July 7, 2019

Strategies for optimizing BioNano and Dovetail explored through a second reference quality assembly for the legume model, Medicago truncatula.

Third generation sequencing technologies, with sequencing reads in the tens of kilobases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner. Here, we present high quality genome assemblies of the model legume plant, Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano technologies. To test these technologies for plant genome assembly, we generated five assemblies using all possible combinations and ordering of these three technologies in the R108 assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, resulted in notable improvements compared to Dovetail or BioNano alone. A combination of PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, a M. truncatula accession widely used in studies of functional genomics. As a test for the usefulness of the resulting genome sequence, the new R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 Mb of novel sequence not present in the earlier A17 reference assembly. Adding Dovetail followed by BioNano data yielded complementary improvements in continuity over the original PacBio assembly.
This strategy proved efficient and cost-effective for developing a quality draft assembly compared to traditional reference assemblies.
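
The count of five assemblies can be reproduced with a short enumeration. The sketch below assumes the PacBio assembly is always the starting point and that Dovetail and BioNano are optional scaffolding layers applied in either order; the names `BASE`, `SCAFFOLDERS` and `assembly_strategies` are illustrative and do not come from the paper.

```python
from itertools import permutations

# Assumption: PacBio contigs are the base; the other two technologies
# are optional scaffolding layers whose order matters.
BASE = "PacBio"
SCAFFOLDERS = ("Dovetail", "BioNano")


def assembly_strategies(base=BASE, scaffolders=SCAFFOLDERS):
    """Return every ordered way to layer zero or more scaffolders on the base."""
    strategies = []
    for k in range(len(scaffolders) + 1):
        # permutations(..., k) yields each ordered selection of k scaffolders
        for order in permutations(scaffolders, k):
            strategies.append((base, *order))
    return strategies


STRATEGIES = assembly_strategies()

if __name__ == "__main__":
    for strategy in STRATEGIES:
        print(" -> ".join(strategy))
```

Under these assumptions the enumeration gives PacBio alone, PacBio with one scaffolder (two options), and PacBio with both scaffolders in either order (two options).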


July 7, 2019

Resolving multicopy duplications de novo using polyploid phasing

While the rise of single-molecule sequencing systems has enabled an unprecedented rise in the ability to assemble complex regions of the genome, long segmental duplications in the genome still remain a challenging frontier in assembly. Segmental duplications are at the same time both gene rich and prone to large structural rearrangements, making the resolution of their sequences important in medical and evolutionary studies. Duplicated sequences that are collapsed in mammalian de novo assemblies are rarely identical; after a sequence is duplicated, it begins to acquire paralog-specific variants. In this paper, we study the problem of resolving the variations in multicopy, long segmental duplications by developing and utilizing algorithms for polyploid phasing. We develop two algorithms: the first one is targeted at maximizing the likelihood of observing the reads given the underlying haplotypes using discrete matrix completion. The second algorithm is based on correlation clustering and exploits an assumption, which is often satisfied in these duplications, that each paralog has a sizable number of paralog-specific variants. We develop a detailed simulation methodology and demonstrate the superior performance of the proposed algorithms on an array of simulated datasets. We measure the likelihood score as well as reconstruction accuracy, i.e., what fraction of the reads are clustered correctly. In both the performance metrics, we find that our algorithms dominate existing algorithms on more than 93% of the datasets. While the discrete matrix completion performs better on likelihood score, the correlation-clustering algorithm performs better on reconstruction accuracy due to the stronger regularization inherent in the algorithm. We also show that our correlation-clustering algorithm can reconstruct on average 7.0 haplotypes in 10-copy duplication datasets whereas existing algorithms reconstruct less than one copy on average.
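
The reconstruction accuracy metric described above (the fraction of reads clustered correctly) can be illustrated with a small sketch. It is not the authors' implementation. It assumes cluster identifiers are arbitrary, so predicted clusters are matched to true paralogs by trying every one-to-one relabeling, which is only practical for small copy numbers. The function name and the toy labels are hypothetical.

```python
from itertools import permutations


def reconstruction_accuracy(true_labels, predicted_labels):
    """Fraction of reads placed in the right paralog cluster.

    Assumption: predicted cluster IDs carry no meaning, so the score is
    taken under the best one-to-one mapping of predicted IDs to true IDs.
    Brute force over mappings; suitable only for a handful of copies.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one read")

    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))
    # Pad with None so surplus predicted clusters can map to "no paralog".
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))

    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(
            1 for t, p in zip(true_labels, predicted_labels) if mapping[p] == t
        )
        best = max(best, correct)
    return best / len(true_labels)


# Toy example: six reads from three paralogs, one read misclustered.
TRUE = [0, 0, 1, 1, 2, 2]
PRED = [2, 2, 0, 0, 1, 0]
ACCURACY = reconstruction_accuracy(TRUE, PRED)
```

For larger copy numbers, such as the 10-copy datasets mentioned in the abstract, an assignment solver would be a more sensible choice than enumerating permutations.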


July 7, 2019

Extraction of high molecular weight DNA from fungal rust spores for long read sequencing.

Wheat rust fungi are complex organisms with a complete life cycle that involves two different host plants and five different spore types. During the asexual infection cycle on wheat, rusts produce massive amounts of dikaryotic urediniospores. These spores are dikaryotic (two nuclei) with each nucleus containing one haploid genome. This dikaryotic state is likely to contribute to their evolutionary success, making them some of the major wheat pathogens globally. Despite this, most published wheat rust genomes are highly fragmented and contain very little haplotype-specific sequence information. Current long-read sequencing technologies hold great promise to provide more contiguous and haplotype-phased genome assemblies. Long reads are able to span repetitive regions and phase structural differences between the haplomes. This increased genome resolution enables the identification of complex loci and the study of genome evolution beyond simple nucleotide polymorphisms. Long-read technologies require pure high molecular weight DNA as an input for sequencing. Here, we describe a DNA extraction protocol for rust spores that yields pure double-stranded DNA molecules with molecular weight of >50 kilo-base pairs (kbp). The isolated DNA is of sufficient purity for PacBio long-read sequencing, but may require additional purification for other sequencing technologies such as Nanopore and 10× Genomics.


July 7, 2019

Long-read sequencing offers path to more accurate drug metabolism profiles

In the complex drug discovery process, one of the looming questions for any new compound is how it will be metabolised in a human body. While there are several methods for evaluating this, one of the most common involves CYP2D6, the enzyme encoded by the cytochrome P450-2D6 gene. This enzyme is involved in metabolising a quarter of all commonly used medications, making it an important target for ADME and pharmacogenomics studies. It is known to activate some drugs and to play a role in the deactivation or excretion of others.


July 7, 2019

Morphological and genetic analyses of the invasive forest pathogen Phytophthora austrocedri reveal two clonal lineages colonised Britain and Argentina from a common ancestral population.

Phytophthora austrocedri is causing widespread mortality of Austrocedrus chilensis in Argentina and Juniperus communis in Britain. The pathogen has also been isolated from J. horizontalis in Germany. Isolates from Britain, Argentina and Germany are homothallic with no clear differences in the dimensions of sporangia, oogonia or oospores. Argentinian and German isolates grew faster than British isolates across a range of media and had a higher temperature tolerance although most isolates regardless of origin grew best at 15°C and all isolates were killed at 25°C. Argentinian and British isolates caused lesions on both hosts when inoculated onto A. chilensis and J. communis; however the Argentinian isolate caused longer lesions on A. chilensis than on J. communis and vice versa for the British isolate. Genetic analyses of nuclear and mitochondrial loci showed that all British isolates are identical. Argentinian isolates and the German isolate are also identical but differ from the British isolates. Single nucleotide polymorphisms are shared between the British and Argentinian isolates. It is concluded that British isolates and Argentinian isolates conform to two distinct clonal lineages of P. austrocedri founded from the same as-yet unidentified source population. These lineages should be recognised and treated as separate risks by international plant health legislation.


July 7, 2019

Regulation of hetDNA length during mitotic double-strand break repair in yeast.

Heteroduplex DNA (hetDNA) is a key molecular intermediate during the repair of mitotic double-strand breaks by homologous recombination, but its relationship to 5′ end resection and/or 3′ end extension is poorly understood. In the current study, we examined how perturbations in these processes affect the hetDNA profile associated with repair of a defined double-strand break (DSB) by the synthesis-dependent strand-annealing (SDSA) pathway. Loss of either the Exo1 or Sgs1 long-range resection pathway significantly shortened hetDNA, suggesting that these pathways normally collaborate during DSB repair. In addition, altering the processivity or proofreading activity of DNA polymerase δ shortened hetDNA length or reduced break-adjacent mismatch removal, respectively, demonstrating that this is the primary polymerase that extends both 3′ ends. Data are most consistent with the extent of DNA synthesis from the invading end being the primary determinant of hetDNA length during SDSA. Copyright © 2017 Elsevier Inc. All rights reserved.


July 7, 2019

Restriction-modification mediated barriers to exogenous DNA uptake and incorporation employed by Prevotella intermedia.

Prevotella intermedia, a major periodontal pathogen, is increasingly implicated in human respiratory tract and cystic fibrosis lung infections. Nevertheless, the specific mechanisms employed by this pathogen remain only partially characterized and poorly understood, largely due to its total lack of genetic accessibility. Here, using Single Molecule, Real-Time (SMRT) genome and methylome sequencing, bisulfite sequencing, in addition to cloning and restriction analysis, we define the specific genetic barriers to exogenous DNA present in two of the most widespread laboratory strains, P. intermedia ATCC 25611 and P. intermedia Strain 17. We identified and characterized multiple restriction-modification (R-M) systems, some of which are considerably divergent between the two strains. We propose that these R-M systems are the root cause of the P. intermedia transformation barrier. Additionally, we note the presence of conserved Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) systems in both strains, which could provide a further barrier to exogenous DNA uptake and incorporation. This work will provide a valuable resource during the development of a genetic system for P. intermedia, which will be required for fundamental investigation of this organism’s physiology, metabolism, and pathogenesis in human disease.


July 7, 2019

The Apostasia genome and the evolution of orchids.

Constituting approximately 10% of flowering plant species, orchids (Orchidaceae) display unique flower morphologies, possess an extraordinary diversity in lifestyle, and have successfully colonized almost every habitat on Earth. Here we report the draft genome sequence of Apostasia shenzhenica, a representative of one of two genera that form a sister lineage to the rest of the Orchidaceae, providing a reference for inferring the genome content and structure of the most recent common ancestor of all extant orchids and improving our understanding of their origins and evolution. In addition, we present transcriptome data for representatives of Vanilloideae, Cypripedioideae and Orchidoideae, and novel third-generation genome data for two species of Epidendroideae, covering all five orchid subfamilies. A. shenzhenica shows clear evidence of a whole-genome duplication, which is shared by all orchids and occurred shortly before their divergence. Comparisons between A. shenzhenica and other orchids and angiosperms also permitted the reconstruction of an ancestral orchid gene toolkit. We identify new gene families, gene family expansions and contractions, and changes within MADS-box gene classes, which control a diverse suite of developmental processes, during orchid evolution. This study sheds new light on the genetic mechanisms underpinning key orchid innovations, including the development of the labellum and gynostemium, pollinia, and seeds without endosperm, as well as the evolution of epiphytism; reveals relationships between the Orchidaceae subfamilies; and helps clarify the evolutionary history of orchids within the angiosperms.


July 7, 2019

Multiple hybrid de novo genome assembly of finger millet, an orphan allotetraploid crop.

Finger millet (Eleusine coracana (L.) Gaertn) is an important crop for food security because of its tolerance to drought, which is expected to be exacerbated by global climate changes. Nevertheless, it is often classified as an orphan/underutilized crop because of the paucity of scientific attention. Among several small millets, finger millet is considered an excellent source of essential nutrient elements, such as iron and zinc; hence, it has potential as an alternate coarse cereal. However, high-quality genome sequence data of finger millet are currently not available. One of the major problems encountered in the genome assembly of this species was its polyploidy, which hampers genome assembly compared with a diploid genome. To overcome this problem, we sequenced its genome using diverse technologies with sufficient coverage and assembled it via a novel multiple hybrid assembly workflow that combines next-generation with single-molecule sequencing, followed by whole-genome optical mapping using the Bionano Irys® system. The total number of scaffolds was 1,897 with an N50 length >2.6 Mb and detection of 96% of the universal single-copy orthologs. The majority of the homeologs were assembled separately. This indicates that the proposed workflow is applicable to the assembly of other allotetraploid genomes. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
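
Scaffold N50, the contiguity statistic quoted above, is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative and are not taken from the finger millet assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Walk the lengths from longest to shortest and stop at the first one
    where the running total reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe "running >= total / 2"
            return length


# Hypothetical toy assembly: total length 20, half is 10.
TOY_LENGTHS = [2, 3, 4, 5, 6]
TOY_N50 = n50(TOY_LENGTHS)  # 6 + 5 = 11 >= 10, so N50 is 5
```

In practice the lengths would be read from a FASTA index or similar; that step is left out here to keep the sketch self-contained.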


July 7, 2019

Archetype JC polyomavirus prevails in a rare case of JC polyomavirus nephropathy and in stable renal transplant recipients with JC polyomavirus viruria.

JC polyomavirus (JCPyV) is reactivated in approximately 20% of renal transplant recipients and it may rarely cause JCPyV-associated nephropathy (JCPyVAN). Whereas progressive multifocal leukoencephalopathy of the brain is caused by rearranged neurotropic JCPyV, little is known about viral sequence variation in JCPyVAN due to the rarity of this condition. Using single-molecule real-time sequencing, characterization of full-length JCPyV genomes from urine and plasma of one JCPyVAN patient and twenty stable renal transplant recipients with JCPyV viruria was attempted. Sequence analysis of JCPyV strains was performed with the emphasis on the NCCR region, the major capsid protein gene VP1 and the large T antigen (LTag) gene. Exclusively archetype strains were identified in urine of the JCPyVAN patient. Full-length JCPyV sequences were not retrieved from plasma. Archetype strains were found in urine of nineteen stable renal transplant recipients, with JCPyV quasispecies detected in five samples. In a patient with minor graft dysfunction, a strain with archetype-like NCCR region was discovered. Individual point mutations were detected in both VP1 and LTag genes. Archetype JCPyV was dominant in the JCPyVAN patient and in stable renal transplant recipients. Archetype rather than rearranged JCPyV seems to drive the pathogenesis of JCPyVAN.


July 7, 2019

Avoidance of APOBEC3B-induced mutation by error-free lesion bypass.

APOBEC cytidine deaminases mutate cancer genomes by converting cytidines into uridines within ssDNA during replication. Although uracil DNA glycosylases limit APOBEC-induced mutation, it is unknown if subsequent base excision repair (BER) steps function on replication-associated ssDNA. Hence, we measured APOBEC3B-induced CAN1 mutation frequencies in yeast deficient in BER endonucleases or DNA damage tolerance proteins. Strains lacking Apn1, Apn2, Ntg1, Ntg2 or Rev3 displayed wild-type frequencies of APOBEC3B-induced canavanine resistance (CanR). However, strains without error-free lesion bypass proteins Ubc13, Mms2 and Mph1 displayed respective 4.9-, 2.8- and 7.8-fold higher frequency of APOBEC3B-induced CanR. These results indicate that mutations resulting from APOBEC activity are avoided by deoxyuridine conversion to abasic sites ahead of nascent lagging strand DNA synthesis and subsequent bypass by error-free template switching. We found this mechanism also functions during telomere re-synthesis, but with a diminished requirement for Ubc13. Interestingly, reduction of G to C substitutions in Ubc13-deficient strains uncovered a previously unknown role of Ubc13 in controlling the activity of the translesion synthesis polymerase, Rev1. Our results highlight a novel mechanism for error-free bypass of deoxyuridines generated within ssDNA and suggest that the APOBEC mutation signature observed in cancer genomes may under-represent the genomic damage these enzymes induce. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 7, 2019

Dense and accurate whole-chromosome haplotyping of individual genomes.

The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.


July 7, 2019

Variant review with the Integrative Genomics Viewer.

Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV’s variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org. Cancer Res; 77(21); e31-34. ©2017 AACR.


July 7, 2019

Hunting structural variants: Population by population

Until recently, most population-scale genome sequencing studies have focused on identifying single nucleotide variants (SNVs) to explore genetic differences between individuals. Like so many SNV-based genome-wide association studies, however, these efforts have had difficulty identifying causative genetic mechanisms underlying most complex functions. More and more, the genomics community has realised that structural variation is likely responsible for many of the traits and phenotypes that scientists have not been able to attribute to SNVs. This class of variants, defined as genetic differences of 50 bp or larger, accounts for most of the DNA sequence differences between any two people. Structural variants (SVs) are also already known to cause many common and rare diseases including ALS, schizophrenia, leukemia, Carney complex, and Huntington’s disease. Despite the importance of SVs, these larger variants have been understudied and underreported compared to their single-nucleotide counterparts. One reason is that they remain difficult to detect. Their length often means they cannot be fully spanned using short sequencing reads. They also often occur in highly repetitive or GC-rich regions of the genome, making them challenging targets. As such, this class of human genetic variation has remained vastly under-explored in global populations and is now ripe for discovery.
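
The 50 bp size threshold mentioned above can be expressed as a simple classification rule. The sketch below is only an illustration for sequence-resolved alleles: the category names, the `variant_size` convention (length difference for indels, allele length for equal-length substitutions) and the example alleles are assumptions, not part of the article.

```python
SV_MIN_SIZE = 50  # bp; threshold taken from the definition in the text


def variant_size(ref, alt):
    """Size of a sequence-resolved variant in bp (assumed convention).

    Insertions and deletions are sized by the allele length difference;
    equal-length substitutions are sized by the allele length.
    """
    if len(ref) != len(alt):
        return abs(len(alt) - len(ref))
    return len(ref)


def classify_variant(ref, alt):
    """Label a variant as 'SNV', 'SV', or 'small variant' by its size."""
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    if variant_size(ref, alt) >= SV_MIN_SIZE:
        return "SV"
    return "small variant"


# Hypothetical alleles for illustration only.
EXAMPLES = {
    "snv": classify_variant("A", "G"),
    "short_insertion": classify_variant("A", "A" + "T" * 10),
    "long_insertion": classify_variant("A", "A" + "T" * 60),
    "long_deletion": classify_variant("A" + "C" * 50, "A"),
}
```

Real variant callers record structural variants symbolically (for example with a type and a length field) rather than as full allele strings, so this rule would need adapting to whatever format the calls arrive in.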


July 7, 2019

Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus).

The de novo assembly of repeat-rich mammalian genomes using only high-throughput short read sequencing data typically results in highly fragmented genome assemblies that limit downstream applications. Here, we present an iterative approach to hybrid de novo genome assembly that incorporates datasets stemming from multiple genomic technologies and methods. We used this approach to improve the gray mouse lemur (Microcebus murinus) genome from early draft status to a near chromosome-scale assembly. We used a combination of advanced genomic technologies to iteratively resolve conflicts and super-scaffold the M. murinus genome. We improved the M. murinus genome assembly to a scaffold N50 of 93.32 Mb. Whole genome alignments between our primary super-scaffolds and 23 human chromosomes revealed patterns that are congruent with historical comparative cytogenetic data, thus demonstrating the accuracy of our de novo scaffolding approach and allowing assignment of scaffolds to M. murinus chromosomes. Moreover, we utilized our independent datasets to discover and characterize sequences associated with centromeres across the mouse lemur genome. Quality assessment of the final assembly found 96% of mouse lemur canonical transcripts nearly complete, comparable to other published high-quality reference genome assemblies. We describe a new assembly of the gray mouse lemur (Microcebus murinus) genome with chromosome-scale scaffolds produced using a hybrid bioinformatic and sequencing approach. The approach is cost effective and produces superior results based on metrics of contiguity and completeness. Our results show that emerging genomic technologies can be used in combination to characterize centromeres of non-model species and to produce accurate de novo chromosome-scale genome assemblies of complex mammalian genomes.

