September 22, 2019

CompStor Novos: a low cost yet fast assembly-based variant calling for personal genomes

Application of assembly methods for personal genome analysis from next generation sequencing data has been limited by the requirement for expensive supercomputer hardware or long computation times when using ordinary resources. We describe CompStor Novos, which achieves supercomputer-class performance in de novo assembly computation time on standard server hardware, based on a tiered-memory algorithm. Run on commercial off-the-shelf servers, Novos assembly is more precise and 10-20 times faster than that of existing assembly algorithms. Furthermore, we integrated Novos into a variant calling pipeline and demonstrate that both compute times and precision of calling point variants and indels compare well with standard alignment-based pipelines. Additionally, assembly eliminates bias in the estimation of allele frequency for indels and naturally enables discovery of breakpoints for structural variants with base pair resolution. Thus, Novos bridges the gap between alignment-based and assembly-based genome analyses. Extension and adaptation of its underlying algorithm will help quickly and fully harvest information in sequencing reads for personal genome reconstruction.


September 21, 2019

Mistranslation drives the evolution of robustness in TEM-1 β-lactamase.

How biological systems such as proteins achieve robustness to ubiquitous perturbations is a fundamental biological question. Such perturbations include errors that introduce phenotypic mutations into nascent proteins during the translation of mRNA. These errors are remarkably frequent. They are also costly, because they reduce protein stability and help create toxic misfolded proteins. Adaptive evolution might reduce these costs of protein mistranslation by two principal mechanisms. The first increases the accuracy of translation via synonymous “high fidelity” codons at especially sensitive sites. The second increases the robustness of proteins to phenotypic errors via amino acids that increase protein stability. To study how these mechanisms are exploited by populations evolving in the laboratory, we evolved the antibiotic resistance gene TEM-1 in Escherichia coli hosts with either normal or high rates of mistranslation. We analyzed TEM-1 populations that evolved under relaxed and stringent selection for antibiotic resistance by single molecule real-time sequencing. Under relaxed selection, mistranslating populations reduce mistranslation costs by reducing TEM-1 expression. Under stringent selection, they efficiently purge destabilizing amino acid changes. More importantly, they accumulate stabilizing amino acid changes rather than synonymous changes that increase translational accuracy. In the large populations we study, and on short evolutionary timescales, the path of least resistance in TEM-1 evolution consists of reducing the consequences of translation errors rather than the errors themselves.


September 21, 2019

Chromulinavorax destructans, a pathogenic TM6 bacterium with an unusual replication strategy targeting protist mitochondrion

Most of the diversity of microbial life is not available in culture, and as such we lack even a fundamental understanding of the biological diversity of several branches on the tree of life. One branch that is highly underrepresented is the candidate phylum TM6, also known as the Dependentiae. Their biology is known only from reduced genomes recovered from metagenomes around the world and two isolates infecting amoebae, all of which suggest that they live highly host-associated lifestyles as parasites or symbionts. Chromulinavorax destructans is an isolate from the TM6/Dependentiae that infects and lyses the abundant heterotrophic flagellate, Spumella elongata. Chromulinavorax destructans is characterized by a high degree of reduction and specialization for infection, so much so that it was discovered in a screen for giant viruses. Its 1.2 Mb genome shows no metabolic potential, and C. destructans instead relies on an extensive transporter system to import nutrients, and even energy in the form of ATP, from the host. Accordingly, it replicates in a viral-like fashion, while extensively reorganizing and expanding the host mitochondrion. 44% of proteins contain signal sequences for secretion; these include many proteins of unknown function as well as 98 copies of ankyrin-repeat domain proteins, known effectors of host modulation, suggesting the presence of an extensive host-manipulation apparatus.


September 21, 2019

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
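The accuracy figure quoted above can be restated on the Phred scale, a widely used convention in which the quality value is QV = -10 * log10(error rate). The short Python sketch below only illustrates that conversion; the function name `accuracy_to_phred` is hypothetical and is not part of HGAP.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


# A consensus accuracy of 99.999% corresponds to roughly QV 50.
print(round(accuracy_to_phred(0.99999), 1))
```

On this scale each additional 10 QV points means a tenfold lower error rate, so 99.999% accuracy amounts to about one error per 100,000 bases.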


September 21, 2019

A Sequel to Sanger: amplicon sequencing that scales.

Although high-throughput sequencers (HTS) have largely displaced their Sanger counterparts, the short read lengths and high error rates of most platforms constrain their utility for amplicon sequencing. The present study tests the capacity of single molecule, real-time (SMRT) sequencing implemented on the SEQUEL platform to overcome these limitations, employing 658 bp amplicons of the mitochondrial cytochrome c oxidase I gene as a model system. By examining templates from more than 5000 species and 20,000 specimens, the performance of SMRT sequencing was tested with amplicons showing wide variation in GC composition and varied sequence attributes. SMRT and Sanger sequences were very similar, but SMRT sequencing provided more complete coverage, especially for amplicons with homopolymer tracts. Because it can characterize amplicon pools from 10,000 DNA extracts in a single run, the SEQUEL can greatly reduce sequencing costs in comparison to first (Sanger) and second generation platforms (Illumina, Ion). SMRT analysis generates high-fidelity sequences from amplicons with varying GC content and is resilient to homopolymer tracts. Analytical costs are low, substantially less than those for first or second generation sequencers. When implemented on the SEQUEL platform, SMRT analysis enables massive amplicon characterization because each instrument can recover sequences from more than 5 million DNA extracts a year.


September 21, 2019

Detecting AGG interruptions in females with a FMR1 premutation by long-read Single-Molecule Sequencing: A 1 year clinical experience.

The fragile X syndrome arises from the FMR1 CGG expansion of a premutation (55-200 repeats) to a full mutation allele (>200 repeats) and is the most frequent cause of inherited X-linked intellectual disability. The risk for a premutation to expand to a full mutation allele depends on the repeat length and AGG triplets interrupting this repeat. In genetic counseling it is important to have information on both these parameters to provide an accurate risk estimate to women carrying a premutation allele and weighing up having children. For example, if the risk is small a woman might opt for a natural pregnancy followed up by prenatal diagnosis, whereas she might choose preimplantation genetic diagnosis (PGD) if the risk is high. Unfortunately, the detection of AGG interruptions was previously hampered by technical difficulties complicating their use in diagnostics. Therefore we recently developed, validated and implemented a new methodology which uses long-read single-molecule sequencing to identify AGG interruptions in females with a FMR1 premutation. Here we report on the assets of AGG interruption detection by sequencing and the impact of implementing the assay on genetic counseling.
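The two parameters named above, repeat length and the position of AGG interruptions, can be read directly from an error-free, in-frame repeat tract. The Python sketch below is a toy illustration under those assumptions, not the authors' assay: the function name `summarize_repeat_tract` is hypothetical, and real long-read data would first need consensus calling and extraction of the repeat region.

```python
def summarize_repeat_tract(tract: str) -> dict:
    """Split an in-frame repeat tract into triplets and locate AGG interruptions.

    Assumes the input contains only the repeat region, starts in frame,
    and is free of sequencing errors (a simplification for illustration).
    """
    seq = tract.upper()
    if len(seq) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # 1-based triplet positions of every AGG interruption
    agg_positions = [i + 1 for i, t in enumerate(triplets) if t == "AGG"]
    return {"total_repeats": len(triplets), "agg_positions": agg_positions}


# Hypothetical allele: 9 CGG, AGG, 9 CGG, AGG, 40 CGG (60 triplets in total)
example = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 40
print(summarize_repeat_tract(example))
```

For this made-up allele the summary reports 60 repeats with interruptions at triplets 10 and 20, which would place it in the premutation range (55-200 repeats) given above.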


September 21, 2019

Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements.

CRISPR-Cas9 is poised to become the gene editing tool of choice in clinical contexts. Thus far, exploration of Cas9-induced genetic alterations has been limited to the immediate vicinity of the target site and distal off-target sequences, leading to the conclusion that CRISPR-Cas9 was reasonably specific. Here we report significant on-target mutagenesis, such as large deletions and more complex genomic rearrangements at the targeted sites in mouse embryonic stem cells, mouse hematopoietic progenitors and a human differentiated cell line. Using long-read sequencing and long-range PCR genotyping, we show that DNA breaks introduced by single-guide RNA/Cas9 frequently resolved into deletions extending over many kilobases. Furthermore, lesions distal to the cut site and crossover events were identified. The observed genomic damage in mitotically active cells caused by CRISPR-Cas9 editing may have pathogenic consequences.


July 19, 2019

In vivo generation of DNA sequence diversity for cellular barcoding.

Heterogeneity is a ubiquitous feature of biological systems. A complete understanding of such systems requires a method for uniquely identifying and tracking individual components and their interactions with each other. We have developed a novel method of uniquely tagging individual cells in vivo with a genetic ‘barcode’ that can be recovered by DNA sequencing. Our method is a two-component system comprised of a genetic barcode cassette whose fragments are shuffled by Rci, a site-specific DNA invertase. The system is highly scalable, with the potential to generate theoretical diversities in the billions. We demonstrate the feasibility of this technique in Escherichia coli. Currently, this method could be employed to track the dynamics of populations of microbes through various bottlenecks. Advances of this method should prove useful in tracking interactions of cells within a network, and/or heterogeneity within complex biological samples. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
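To give a feel for how a shuffled cassette could reach diversities in the billions, the Python sketch below computes an upper bound under a simplifying assumption that is not stated in the abstract: a cassette of n distinct fragments in which every ordering and every orientation is reachable, giving n! * 2^n arrangements. The function name and the fragment counts are hypothetical.

```python
import math


def upper_bound_barcode_diversity(n_fragments: int) -> int:
    """Upper bound on distinct cassettes, assuming all orders and orientations occur.

    n! orderings of the fragments times 2**n orientation patterns.
    This is an illustrative assumption, not a figure from the study.
    """
    if n_fragments < 0:
        raise ValueError("n_fragments must be non-negative")
    return math.factorial(n_fragments) * 2 ** n_fragments


for n in (5, 8, 10):
    print(n, upper_bound_barcode_diversity(n))
```

Under this assumption a ten-fragment cassette would already allow more than three billion arrangements, which matches the order of magnitude described above; the diversity actually reachable depends on the recombination sites and on how evenly the invertase shuffles them.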


July 19, 2019

Error correction and assembly complexity of single molecule sequencing reads.

Third generation single molecule sequencing technology is poised to revolutionize genomics by enabling the sequencing of long, individual molecules of DNA and RNA. These technologies now routinely produce reads exceeding 5,000 basepairs, and can achieve reads as long as 50,000 basepairs. Here we evaluate the limits of single molecule sequencing by assessing the impact of long read sequencing in the assembly of the human genome and 25 other important genomes across the tree of life. From this, we develop a new data-driven model using support vector regression that can accurately predict assembly performance. We also present a novel hybrid error correction algorithm for long PacBio sequencing reads that uses pre-assembled Illumina sequences for the error correction. We apply it to several prokaryotic and eukaryotic genomes, and show it can achieve near-perfect assemblies of small genomes (< 100Mbp) and substantially improved assemblies of larger ones. All source code and the assembly model are available open-source.


July 19, 2019

The utility of PacBio circular consensus sequencing for characterizing complex gene families in non-model organisms.

Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes. Here we investigate the utility of Pacific Biosciences single molecule real-time (SMRT) circular consensus sequencing (CCS) as an alternative to traditional cloning and Sanger sequencing of PCR amplicons for gene family characterization. We target vomeronasal gene receptors, one of the most diverse gene families in mammals, with the goal of better understanding intra-specific V1R diversity of the gray mouse lemur (Microcebus murinus). Our study compares intragenomic variation for two V1R subfamilies found in the mouse lemur. Specifically, we compare gene copy variation within and between two individuals of M. murinus as characterized by different methods for nucleotide sequencing. By including the same individual animal from which the M. murinus draft genome was derived, we are able to cross-validate gene copy estimates from Sanger sequencing versus CCS methods. We generated 34,088 high quality circular consensus sequences of two diverse V1R subfamilies (here referred to as V1RI and V1RIX) from two individuals of Microcebus murinus. Using a minimum threshold of 7× coverage, we recovered approximately 90% of V1RI sequences previously identified in the draft M. murinus genome (59% being identical at all nucleotide positions). When low coverage sequences were considered (i.e. < 7× coverage) 100% of V1RI sequences identified in the draft genome were recovered. At least 13 putatively novel V1R loci were also identified using CCS technology. Recent upgrades to the Pacific Biosciences RS instrument have improved the CCS technology and offer an alternative to traditional sequencing approaches. Our results suggest that the Microcebus murinus V1R repertoire has been underestimated in the draft genome. In addition to providing an improved understanding of V1R diversity in the mouse lemur, this study demonstrates the utility of CCS technology for characterizing complex regions of the genome. We anticipate that long-read sequencing technologies such as PacBio SMRT will allow for the assembly of multigene family clusters and serve to more accurately characterize patterns of gene copy variation in large gene families, thus revealing novel micro-evolutionary patterns within non-model organisms.


July 19, 2019

Amplification and thrifty single-molecule sequencing of recurrent somatic structural variations.

Deletion of tumor-suppressor genes as well as other genomic rearrangements pervade cancer genomes across numerous types of solid tumor and hematologic malignancies. However, even for a specific rearrangement, the breakpoints may vary between individuals, such as the recurrent CDKN2A deletion. Characterizing the exact breakpoints for structural variants (SVs) is useful for designating patient-specific tumor biomarkers. We propose AmBre (Amplification of Breakpoints), a method to target SV breakpoints occurring in samples composed of heterogeneous tumor and germline DNA. Additionally, AmBre validates SVs called by whole-exome/genome sequencing and hybridization arrays. AmBre involves a PCR-based approach to amplify the DNA segment containing an SV’s breakpoint and then confirms breakpoints using sequencing by Pacific Biosciences RS. To amplify breakpoints with PCR, primers tiling specified target regions are carefully selected with a simulated annealing algorithm to minimize off-target amplification and maximize efficiency at capturing all possible breakpoints within the target regions. To confirm correct amplification and obtain breakpoints, PCR amplicons are combined without barcoding and simultaneously long-read sequenced using a single SMRT cell. Our algorithm efficiently separates reads based on breakpoints. Each read group supporting the same breakpoint corresponds to an amplicon, and a consensus amplicon sequence is called. AmBre was used to discover CDKN2A deletion breakpoints in cancer cell lines: A549, CEM, Detroit562, MOLT4, MCF7, and T98G. Also, we successfully assayed RUNX1-RUNX1T1 reciprocal translocations by finding both breakpoints in the Kasumi-1 cell line. AmBre successfully targets SVs where DNA harboring the breakpoints is present in 1:1000 mixtures.


July 19, 2019

Reconstructing complex regions of genomes using long-read sequencing technology.

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.


July 19, 2019

Evolution of mosquito preference for humans linked to an odorant receptor.

Female mosquitoes are major vectors of human disease and the most dangerous are those that preferentially bite humans. A ‘domestic’ form of the mosquito Aedes aegypti has evolved to specialize in biting humans and is the main worldwide vector of dengue, yellow fever, and chikungunya viruses. The domestic form coexists with an ancestral, ‘forest’ form that prefers to bite non-human animals and is found along the coast of Kenya. We collected the two forms, established laboratory colonies, and document striking divergence in preference for human versus non-human animal odour. We further show that the evolution of preference for human odour in domestic mosquitoes is tightly linked to increases in the expression and ligand-sensitivity of the odorant receptor AaegOr4, which we found recognizes a compound present at high levels in human odour. Our results provide a rare example of a gene contributing to behavioural evolution and provide insight into how disease-vectoring mosquitoes came to specialize on humans.


July 19, 2019

PacBio-LITS: a large-insert targeted sequencing method for characterization of human disease-associated chromosomal structural variations.

Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high. We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS is illustrated by delineating the breakpoint junctions of low copy repeat (LCR)-associated complex structural rearrangements on chr17p11.2 in patients diagnosed with Potocki-Lupski syndrome (PTLS; MIM#610883). We successfully identified previously determined breakpoint junctions in three PTLS cases, and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints. The new information has enabled us to propose mechanisms for formation of these structural variants. The new method leverages the cost efficiency of targeted capture-sequencing as well as the mappability and scaffolding capabilities of long sequencing reads generated by the PacBio platform. It is therefore suitable for studying complex SVs, especially those involving LCRs, inversions, and the generation of chimeric Alu elements at the breakpoints. Other genomic research applications, such as haplotype phasing and small insertion and deletion validation could also benefit from this technology.


July 19, 2019

Intrahost dynamics of antiviral resistance in influenza A virus reflect complex patterns of segment linkage, reassortment, and natural selection.

Resistance following antiviral therapy is commonly observed in human influenza viruses. Although this evolutionary process is initiated within individual hosts, little is known about the pattern, dynamics, and drivers of antiviral resistance at this scale, including the role played by reassortment. In addition, the short duration of human influenza virus infections limits the available time window in which to examine intrahost evolution. Using single-molecule sequencing, we mapped, in detail, the mutational spectrum of an H3N2 influenza A virus population sampled from an immunocompromised patient who shed virus over a 21-month period. In this unique natural experiment, we were able to document the complex dynamics underlying the evolution of antiviral resistance. Individual resistance mutations appeared weeks before they became dominant, evolved independently on cocirculating lineages, led to a genome-wide reduction in genetic diversity through a selective sweep, and were placed into new combinations by reassortment. Notably, despite frequent reassortment, phylogenetic analysis also provided evidence for specific patterns of segment linkage, with a strong association between the hemagglutinin (HA)- and matrix (M)-encoding segments that matches that previously observed at the epidemiological scale. In sum, we were able to reveal, for the first time, the complex interaction between multiple evolutionary processes as they occur within an individual host. Understanding the evolutionary forces that shape the genetic diversity of influenza virus is crucial for predicting the emergence of drug-resistant strains but remains challenging because multiple processes occur concurrently. We characterized the evolution of antiviral resistance in a single persistent influenza virus infection, representing the first case in which reassortment and the complex patterns of drug resistance emergence and evolution have been determined within an individual host. Deep-sequence data from multiple time points revealed that the evolution of antiviral resistance reflects a combination of frequent mutation, natural selection, and a complex pattern of segment linkage and reassortment. In sum, these data show how immunocompromised hosts may help reveal the drivers of strain emergence. Copyright © 2015 Rogers et al.

