July 7, 2019

Role of restriction-modification systems in prokaryotic evolution and ecology

Restriction–modification (R-M) systems methylate or cleave DNA depending on the methylation status of their recognition site. This allows them to protect bacterial cells from invasion by foreign DNA. Comparative analysis of a large number of available bacterial genomes and methylomes clearly demonstrates that the role of R-M systems in bacteria extends beyond defense. R-M systems maintain heterogeneity of a bacterial population and are involved in the adaptation of bacteria to changes in their environmental conditions. R-M systems can be essential for host colonization by pathogenic bacteria. Phase variation and intragenomic recombination are sources of the fast evolution of the specificity of R-M systems. This review focuses on the influence of R-M systems on the evolution and ecology of prokaryotes.


July 7, 2019

Single molecule sequencing of THCA synthase reveals copy number variation in modern drug-type Cannabis sativa L.

Cannabinoid expression is an important genetically determined feature of cannabis that presents clinical and legal implications for patients seeking cannabinoid-specific therapies such as cannabidiol (CBD). Cannabinoid, terpenoid, and flavonoid marker-assisted selection can accelerate breeding efforts by offering genetic tools to select for desired traits at an early stage in growth. To this end, multiple models for chemotype inheritance have been described, suggesting a complex picture for chemical phenotype determination. Here we explore the potential role of copy number variation of THCA synthase using phased single molecule sequencing and demonstrate that copy number and sequence variation of this gene are common, suggesting a more nuanced view of chemotype prediction.


July 7, 2019

Dubowitz syndrome is a complex comprised of multiple, genetically distinct and phenotypically overlapping disorders.

Dubowitz syndrome is a rare disorder characterized by multiple congenital anomalies, cognitive delay, growth failure, an immune defect, and an increased risk of blood dyscrasia and malignancy. There is considerable phenotypic variability, suggesting genetic heterogeneity. We clinically characterized and performed exome sequencing and high-density array SNP genotyping on three individuals with Dubowitz syndrome, including a pair of previously described siblings (Patients 1 and 2, brother and sister) and an unpublished patient (Patient 3). Given the siblings’ history of bone marrow abnormalities, we also evaluated telomere length and performed radiosensitivity assays. In the siblings, exome sequencing identified compound heterozygosity for a known rare nonsense substitution in the nuclear ligase gene LIG4 (rs104894419, NM_002312.3:c.2440C>T) that predicts p.Arg814X (MAF: 0.0002) and an NM_002312.3:c.613delT variant that predicts a p.Ser205Leufs*29 frameshift. The frameshift mutation has not been reported in 1000 Genomes, ESP, or ClinSeq. These LIG4 mutations were previously reported in the sibling sister; her brother had not been previously tested. Western blotting showed an absence of a ligase IV band in both siblings. In the third patient, array SNP genotyping revealed a de novo ~3.89 Mb interstitial deletion at chromosome 17q24.2 (chr17:62,068,463-65,963,102, hg18), which spanned the known Carney complex gene PRKAR1A. In all three patients, a median lymphocyte telomere length of ≤ 1st centile was observed and radiosensitivity assays showed increased sensitivity to ionizing radiation. Our work suggests that, in addition to dyskeratosis congenita, LIG4 and 17q24.2 syndromes also feature shortened telomeres; to confirm this, telomere length testing should be considered in both disorders.
Taken together, our work and other reports on Dubowitz syndrome, as currently recognized, suggest that it is not a unitary entity but instead a collection of phenotypically similar disorders. As a clinical entity, Dubowitz syndrome will need continual re-evaluation and re-definition as its constituent phenotypes are determined.


July 7, 2019

Pseudoautosomal region 1 length polymorphism in the human population.

The human sex chromosomes differ in sequence, except for the pseudoautosomal regions (PAR) at the terminus of the short and the long arms, denoted as PAR1 and PAR2. The boundary between PAR1 and the unique X and Y sequences was established during the divergence of the great apes. During a copy number variation screen, we noted a paternally inherited chromosome X duplication in 15 independent families. Subsequent genomic analysis demonstrated that an insertional translocation of X chromosomal sequence into the Y chromosome generates an extended PAR. The insertion is generated by non-allelic homologous recombination between a 548 bp LTR6B repeat within the Y chromosome PAR1 and a second LTR6B repeat located 105 kb from the PAR boundary on the X chromosome. The identification of the reciprocal deletion on the X chromosome in one family and the occurrence of the variant in different chromosome Y haplogroups demonstrate this is a recurrent genomic rearrangement in the human population. This finding represents a novel mechanism shaping sex chromosomal evolution.


July 7, 2019

The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line.

The HeLa cell line was established in 1951 from cervical cancer cells taken from a patient, Henrietta Lacks. This was the first successful attempt to immortalize human-derived cells in vitro. The robust growth and unrestricted distribution of HeLa cells resulted in its broad adoption, both intentionally and through widespread cross-contamination, and for the past 60 years it has served a role analogous to that of a model organism. The cumulative impact of the HeLa cell line on research is demonstrated by its occurrence in more than 74,000 PubMed abstracts (approximately 0.3%). The genomic architecture of HeLa remains largely unexplored beyond its karyotype, partly because, like many cancers, its extensive aneuploidy renders such analyses challenging. We carried out haplotype-resolved whole-genome sequencing of the HeLa CCL-2 strain, examined point- and indel-mutation variations, mapped copy-number variations and loss of heterozygosity regions, and phased variants across full chromosome arms. We also investigated variation and copy-number profiles for HeLa S3 and eight additional strains. We find that HeLa is relatively stable in terms of point variation, with few new mutations accumulating after early passaging. Haplotype resolution facilitated reconstruction of an amplified, highly rearranged region of chromosome 8q24.21 at which integration of the human papilloma virus type 18 (HPV-18) genome occurred and that is likely to be the event that initiated tumorigenesis. We combined these maps with RNA-seq and ENCODE Project data sets to phase the HeLa epigenome. This revealed strong, haplotype-specific activation of the proto-oncogene MYC by the integrated HPV-18 genome approximately 500 kilobases upstream, and enabled global analyses of the relationship between gene dosage and expression.
These data provide an extensively phased, high-quality reference genome for past and future experiments relying on HeLa, and demonstrate the value of haplotype resolution for characterizing cancer genomes and epigenomes.


July 7, 2019

Strobe sequence design for haplotype assembly.

Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the pair of chromosomes are mostly identical to each other, linking together alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In haplotype assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype. We show that a number of parameters influence haplotype length, with the most significant one being the advance length (distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in a 1-2 order of magnitude improvement in median haplotype length. Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies.
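The linking idea in this abstract can be illustrated with a small sketch. It is not the authors' method: it only assumes that each fragment is given as a set of heterozygous site indices it covers, and it groups sites into haplotype blocks as connected components using a union-find structure. The function name `haplotype_blocks` and the input layout are hypothetical.

```python
def haplotype_blocks(num_sites, fragments):
    """Group heterozygous sites into blocks connected by fragments.

    num_sites: number of heterozygous sites, indexed 0..num_sites-1.
    fragments: iterable of collections of site indices covered by one
               fragment (hypothetical input layout, for illustration).
    Returns a sorted list of blocks, each a sorted list of site indices.
    """
    parent = list(range(num_sites))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for frag in fragments:
        sites = sorted(frag)
        # A fragment covering several sites links all of them together.
        for a, b in zip(sites, sites[1:]):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    blocks = {}
    for site in range(num_sites):
        blocks.setdefault(find(site), []).append(site)
    return sorted(blocks.values())


# Short fragments leave site 3 unlinked and the haplotype split in three.
short_only = haplotype_blocks(6, [{0, 1}, {1, 2}, {4, 5}])
# One fragment with a long advance (sites 2 and 4) merges two blocks.
with_long_advance = haplotype_blocks(6, [{0, 1}, {1, 2}, {4, 5}, {2, 4}])
```

In this toy setting a single fragment pair with a long advance length is enough to join two blocks, which is the intuition behind varying advance lengths to extend haplotypes.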


July 7, 2019

Heterogeneous resistance to quizartinib in acute myeloid leukemia revealed by single-cell analysis.

Genomic studies have revealed significant branching heterogeneity in cancer. Studies of resistance to tyrosine kinase inhibitor therapy have not fully reflected this heterogeneity because resistance in individual patients has been ascribed to largely mutually exclusive on-target or off-target mechanisms in which tumors either retain dependency on the target oncogene or subvert it through a parallel pathway. Using targeted sequencing from single cells and colonies from patient samples, we demonstrate tremendous clonal diversity in the majority of acute myeloid leukemia (AML) patients with activating FLT3 internal tandem duplication mutations at the time of acquired resistance to the FLT3 inhibitor quizartinib. These findings establish that clinical resistance to quizartinib is highly complex and reflects the underlying clonal heterogeneity of AML. © 2017 by The American Society of Hematology.


July 7, 2019

Resolving multicopy duplications de novo using polyploid phasing

While the rise of single-molecule sequencing systems has enabled an unprecedented rise in the ability to assemble complex regions of the genome, long segmental duplications in the genome still remain a challenging frontier in assembly. Segmental duplications are at the same time both gene rich and prone to large structural rearrangements, making the resolution of their sequences important in medical and evolutionary studies. Duplicated sequences that are collapsed in mammalian de novo assemblies are rarely identical; after a sequence is duplicated, it begins to acquire paralog-specific variants. In this paper, we study the problem of resolving the variations in multicopy, long segmental duplications by developing and utilizing algorithms for polyploid phasing. We develop two algorithms: the first one is targeted at maximizing the likelihood of observing the reads given the underlying haplotypes using discrete matrix completion. The second algorithm is based on correlation clustering and exploits an assumption, which is often satisfied in these duplications, that each paralog has a sizable number of paralog-specific variants. We develop a detailed simulation methodology and demonstrate the superior performance of the proposed algorithms on an array of simulated datasets. We measure the likelihood score as well as reconstruction accuracy, i.e., what fraction of the reads are clustered correctly. In both the performance metrics, we find that our algorithms dominate existing algorithms on more than 93% of the datasets. While the discrete matrix completion performs better on likelihood score, the correlation-clustering algorithm performs better on reconstruction accuracy due to the stronger regularization inherent in the algorithm. We also show that our correlation-clustering algorithm can reconstruct on average 7.0 haplotypes in 10-copy duplication datasets whereas existing algorithms reconstruct less than one copy on average.
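The reconstruction accuracy described here (the fraction of reads clustered correctly) can be sketched as follows. The abstract does not say how predicted clusters are matched to true paralogs, so this sketch assumes the best one-to-one relabeling is used and finds it by brute force over permutations, which is only practical for a small number of copies. The function name and input layout are hypothetical.

```python
from itertools import permutations


def reconstruction_accuracy(true_labels, predicted_labels, num_copies):
    """Fraction of reads assigned to the correct paralog.

    true_labels, predicted_labels: equal-length lists of integers in
        range(num_copies), one entry per read (hypothetical layout).
    Cluster names are arbitrary, so every one-to-one relabeling of the
    predicted clusters is tried and the best score is kept (assumption).
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    best_correct = 0
    for relabel in permutations(range(num_copies)):
        correct = sum(
            1
            for truth, predicted in zip(true_labels, predicted_labels)
            if relabel[predicted] == truth
        )
        best_correct = max(best_correct, correct)
    return best_correct / len(true_labels)


# Six reads from three paralogs; one read is placed in the wrong cluster.
accuracy = reconstruction_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0], 3)
```

For larger copy numbers the permutation search would need to be replaced by an assignment solver; the brute-force version is kept here only because it is easy to check by hand.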


July 7, 2019

Morphological and genetic analyses of the invasive forest pathogen Phytophthora austrocedri reveal two clonal lineages colonised Britain and Argentina from a common ancestral population.

Phytophthora austrocedri is causing widespread mortality of Austrocedrus chilensis in Argentina and Juniperus communis in Britain. The pathogen has also been isolated from J. horizontalis in Germany. Isolates from Britain, Argentina and Germany are homothallic with no clear differences in the dimensions of sporangia, oogonia or oospores. Argentinian and German isolates grew faster than British isolates across a range of media and had a higher temperature tolerance although most isolates regardless of origin grew best at 15°C and all isolates were killed at 25°C. Argentinian and British isolates caused lesions on both hosts when inoculated onto A. chilensis and J. communis; however the Argentinian isolate caused longer lesions on A. chilensis than on J. communis and vice versa for the British isolate. Genetic analyses of nuclear and mitochondrial loci showed that all British isolates are identical. Argentinian isolates and the German isolate are also identical but differ from the British isolates. Single nucleotide polymorphisms are shared between the British and Argentinian isolates. It is concluded that British isolates and Argentinian isolates conform to two distinct clonal lineages of P. austrocedri founded from the same as-yet unidentified source population. These lineages should be recognised and treated as separate risks by international plant health legislation.


July 7, 2019

Archetype JC polyomavirus prevails in a rare case of JC polyomavirus nephropathy and in stable renal transplant recipients with JC polyomavirus viruria.

JC polyomavirus (JCPyV) is reactivated in approximately 20% of renal transplant recipients and it may rarely cause JCPyV-associated nephropathy (JCPyVAN). Whereas progressive multifocal leukoencephalopathy of the brain is caused by rearranged neurotropic JCPyV, little is known about viral sequence variation in JCPyVAN due to the rarity of this condition. Using single-molecule real-time sequencing, characterization of full-length JCPyV genomes from urine and plasma of one JCPyVAN patient and twenty stable renal transplant recipients with JCPyV viruria was attempted. Sequence analysis of JCPyV strains was performed with the emphasis on the NCCR region, the major capsid protein gene VP1 and the large T antigen (LTag) gene. Exclusively archetype strains were identified in urine of the JCPyVAN patient. Full-length JCPyV sequences were not retrieved from plasma. Archetype strains were found in urine of nineteen stable renal transplant recipients, with JCPyV quasispecies detected in five samples. In a patient with minor graft dysfunction, a strain with archetype-like NCCR region was discovered. Individual point mutations were detected in both VP1 and LTag genes. Archetype JCPyV was dominant in the JCPyVAN patient and in stable renal transplant recipients. Archetype rather than rearranged JCPyV seems to drive the pathogenesis of JCPyVAN.


July 7, 2019

Dense and accurate whole-chromosome haplotyping of individual genomes.

The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.


July 7, 2019

Estimating fitness of viral quasispecies from next-generation sequencing data.

The quasispecies model is ubiquitous in the study of viruses. While having led to a number of insights that have stood the test of time, the quasispecies model has mostly been discussed in a theoretical fashion with little support of data. With next-generation sequencing (NGS), this situation is changing and a wealth of data can now be produced in a time- and cost-efficient manner. NGS can, after removal of technical errors, yield an exceedingly detailed picture of the viral population structure. The widespread availability of cross-sectional data can be used to study fitness landscapes of viral populations in the quasispecies model. This chapter highlights methods that estimate the strength of selection in selective sweeps, assesses marginal fitness effects of quasispecies, and finally infers the fitness landscape of a viral quasispecies, all on the basis of NGS data.


July 7, 2019

HapCol: accurate and memory-efficient haplotype assembly from long reads.

Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly highly benefits from the advent of ‘future-generation’ sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods are not able to deal with such data in a fully satisfactory way, either because accuracy or performance degrades as read length and sequencing coverage increase or because they are based on restrictive assumptions. By exploiting a feature of future-generation technologies, the uniform distribution of sequencing errors, we designed an exact algorithm, called HapCol, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis, comparing HapCol with the current state-of-the-art combinatorial methods both on real and simulated data. On a standard benchmark of real data, we show that HapCol is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HapCol requires significantly less computing resources, especially memory. Thanks to its computational efficiency, HapCol can overcome the limits of previous approaches, making it possible to phase datasets with higher coverage and without the traditional all-heterozygous assumption. Our source code is available under the terms of the GNU General Public License at http://hapcol.algolab.eu/. Contact: bonizzoni@disco.unimib.it. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
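The error-correction score mentioned above can be made concrete with a minimal sketch. This is not HapCol's algorithm: it only evaluates the score for one given candidate haplotype rather than searching for the best one, and for simplicity it assumes all sites are heterozygous, so the second haplotype is the complement of the first (HapCol itself does not require this assumption). The function name and fragment layout are hypothetical.

```python
def error_correction_score(fragments, haplotype):
    """Number of allele corrections needed to fit all fragments.

    fragments: list of dicts mapping site index -> observed allele (0 or 1);
               hypothetical layout chosen for illustration.
    haplotype: list of 0/1 alleles for one haplotype; the other haplotype
               is assumed to be its complement (all-heterozygous assumption).
    Each fragment is assigned to whichever haplotype needs fewer corrections.
    """
    total = 0
    for frag in fragments:
        mismatches = sum(
            1 for site, allele in frag.items() if haplotype[site] != allele
        )
        # Mismatches against the complement are the remaining covered sites.
        total += min(mismatches, len(frag) - mismatches)
    return total


candidate = [0, 1, 0, 1]
reads = [
    {0: 0, 1: 1, 2: 0},  # agrees with the candidate haplotype
    {1: 0, 2: 1, 3: 0},  # agrees with its complement
    {0: 0, 1: 1, 2: 1},  # needs one correction at site 2
]
score = error_correction_score(reads, candidate)
```

An exact method would minimize this score over all candidate haplotypes; the sketch shows only what is being minimized.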


July 7, 2019

OxyR-dependent formation of DNA methylation patterns in OpvABOFF and OpvABON cell lineages of Salmonella enterica.

Phase variation of the Salmonella enterica opvAB operon generates a bacterial lineage with standard lipopolysaccharide structure (OpvAB(OFF)) and a lineage with shorter O-antigen chains (OpvAB(ON)). Regulation of OpvAB lineage formation is transcriptional, and is controlled by the LysR-type factor OxyR and by DNA adenine methylation. The opvAB regulatory region contains four sites for OxyR binding (OBSA-D), and four methylatable GATC motifs (GATC1-4). OpvAB(OFF) and OpvAB(ON) cell lineages display opposite DNA methylation patterns in the opvAB regulatory region: (i) in the OpvAB(OFF) state, GATC1 and GATC3 are non-methylated, whereas GATC2 and GATC4 are methylated; (ii) in the OpvAB(ON) state, GATC2 and GATC4 are non-methylated, whereas GATC1 and GATC3 are methylated. We provide evidence that such DNA methylation patterns are generated by OxyR binding. The higher stability of the OpvAB(OFF) lineage may be caused by binding of OxyR to sites that are identical to the consensus (OBSA and OBSC), while the sites bound by OxyR in OpvAB(ON) cells (OBSB and OBSD) are not. In support of this view, amelioration of either OBSB or OBSD locks the system in the ON state. We also show that the GATC-binding protein SeqA and the nucleoid protein HU are ancillary factors in opvAB control. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 7, 2019

Read-based phasing of related individuals.

Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information, reads and pedigree, has the potential to deliver results better than each individually. We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants and at a higher accuracy than when phased separately, both in simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15× coverage per individual. Availability: https://bitbucket.org/whatshap/whatshap. Contact: t.marschall@mpi-inf.mpg.de. © The Author 2016. Published by Oxford University Press.

