June 1, 2021

Sequencing complex mixtures of HIV-1 genomes with single-base resolution.

A large number of distinct HIV-1 genomes can be present in a single clinical sample from a patient chronically infected with HIV-1. We examined samples containing complex mixtures of near-full-length HIV-1 genomes. Single molecules were sequenced as near-full-length (9.6 kb) amplicons directly from PCR products without shearing. Mathematical analysis techniques deconvolved the complex mixture of reads into estimates of distinct near-full-length viral genomes with their relative abundances. We correctly estimated the originating genomes to single-base resolution, along with their relative abundances, for mixtures where the truth was known exactly by independent sequencing methods. Correct estimates were made even when genomes diverged by a single base. Minor abundances of 5% were reliably detected. SMRT Sequencing data contained near-full-length continuous reads for each sample, including some runs with greater than 10,000 near-full-length-genome reads in a three-hour collection time. SMRT Sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. The single-molecule, full-length nature of the sequencing method allows us to estimate variant subspecies and relative abundances even from samples containing complex mixtures of genomes that differ by single bases. These results open the possibility of cost-effective full-genome sequencing of HIV-1 in mixed populations for applications such as incorporated-HIV-1 screening. In screening, genomes can differ by one to many thousands of bases, and the ability to measure them can help scientifically inform treatment strategies.
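The abstract does not describe the mathematical analysis itself. As a rough illustration only, the idea of turning full-length reads into genome estimates with relative abundances can be sketched by grouping identical reads and reporting each group's share. This toy version assumes error-free reads, which real data are not, and the function name, the toy sequences, and the 5% cutoff used as a filter are assumptions made for the example.

```python
from collections import Counter


def estimate_abundances(reads, min_fraction=0.05):
    """Group identical full-length reads and return {sequence: fraction}.

    Toy assumption: reads are error-free, so every distinct sequence that
    reaches min_fraction of the reads is treated as a distinct genome.
    Fractions are relative to all reads, so they need not sum to 1 after
    low-abundance sequences are filtered out.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return {
        seq: n / total
        for seq, n in counts.items()
        if n / total >= min_fraction
    }


# Hypothetical mixture: two genomes differing by one base, plus one stray read.
reads = ["ACGTACGT"] * 90 + ["ACGTACGA"] * 9 + ["ACGTTCGT"] * 1
abundances = estimate_abundances(reads)
for seq, fraction in sorted(abundances.items(), key=lambda kv: -kv[1]):
    print(seq, round(fraction, 2))
```

A realistic pipeline would need to model sequencing errors before counting; this sketch only shows why single-molecule, full-length reads make per-genome counting possible at all.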


June 1, 2021

High-accuracy, single-base resolution of near-full-length HIV genomes.

Background: The HIV-1 proviral reservoir is incredibly stable, even during antiretroviral therapy, and is seen as the major barrier to HIV-1 eradication. Identifying and comprehensively characterizing this reservoir will be critical to achieving an HIV cure. Historically, this has been a tedious and labor-intensive process, requiring high-replicate single-genome amplification reactions, or overlapping amplicons that are then reconstructed into full-length genomes by algorithmic imputation. Here, we present a deep sequencing and analysis method able to determine the exact identity and relative abundances of near-full-length HIV genomes from samples containing mixtures of genomes without shearing or complex bioinformatic reconstruction. Methods: We generated clonal near-full-length (~9 kb) amplicons derived from single genome amplification (SGA) of primary proviral isolates or PCR of well-documented control strains. These clonal products were mixed at various abundances and sequenced as near-full-length (~9 kb) amplicons without shearing. Each mixture yielded many near-full-length HIV-1 reads. Mathematical analysis techniques resolved the complex mixture of reads into estimates of distinct near-full-length viral genomes with their relative abundances. Results: Single Molecule, Real-Time (SMRT) Sequencing data contained near-full-length (~9 kb) continuous reads for each sample, including some runs with greater than 10,000 near-full-length-genome reads in a three-hour sequencing run. Our methods exactly recapitulated the originating genomes at single-base resolution and their relative abundances in both mixtures of clonal controls and SGAs, and these results were validated using independent sequencing methods. Correct resolution was achieved even when genomes differed only by a single base. Minor abundances of 5% were reliably detected. Conclusions: SMRT Sequencing yields long-read sequencing results from individual DNA molecules with a rapid time-to-result. The single-molecule, full-length nature of this sequencing method allows us to estimate variant subspecies and relative abundances with single-nucleotide resolution. This method allows for reference-agnostic and cost-effective full-genome sequencing of HIV-1, which could both further our understanding of latent infection and help develop novel and improved tools for quantifying HIV provirus, which will be critical to curing HIV.


June 1, 2021

SMRT-Cappable-seq reveals the complex operome of bacteria

SMRT-Cappable-seq combines the isolation of full-length prokaryotic primary transcripts with long-read sequencing technology. It is the first experimental methodology to sequence entire prokaryotic transcripts. It identifies the transcription start site and termination site, thereby directly defining operon structures genome-wide in prokaryotes. Applied to E. coli, SMRT-Cappable-seq identifies a total of ~2300 operons, among which ~900 are novel. Importantly, our results reveal pervasive read-through of previously experimentally validated transcription termination sites. Termination read-through represents a powerful strategy to control gene expression. Taken together, these data provide a first glance at the complexity of the ‘operome’ in bacteria and present an invaluable resource for understanding gene regulation and function in bacteria.
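To make concrete how a transcription start site and a termination site from one full-length read can define an operon, here is a minimal sketch. It is not the SMRT-Cappable-seq analysis pipeline: the `Gene` record, the gene names, the coordinates, and the rule "a gene belongs to the operon if it lies entirely between the two sites" are all assumptions made for illustration, and strand is ignored.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # leftmost coordinate of the gene
    end: int    # rightmost coordinate of the gene


def genes_in_transcript(tss, tts, genes):
    """Return names of genes fully contained between a start and end site.

    Hypothetical rule for illustration: the transcript spans [tss, tts]
    (in either order) and a gene is part of the operon only if it lies
    entirely inside that span.
    """
    lo, hi = sorted((tss, tts))
    ordered = sorted(genes, key=lambda g: g.start)
    return [g.name for g in ordered if lo <= g.start and g.end <= hi]


# Hypothetical annotation and one full-length read spanning 80..950.
genes = [
    Gene("geneA", 100, 400),
    Gene("geneB", 450, 900),
    Gene("geneC", 1200, 1500),
]
operon = genes_in_transcript(tss=80, tts=950, genes=genes)
print(operon)
```

Under this rule, a second read with the same start site that extends past geneC would define a longer operon, which is one simple way to picture termination read-through.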


June 1, 2021

Full-length cDNA sequencing of prokaryotic transcriptome and metatranscriptome samples

Next-generation sequencing has become a useful tool for studying transcriptomes. However, these methods typically rely on sequencing short fragments of cDNA, then attempting to assemble the pieces into full-length transcripts. Here, we describe a method that uses PacBio long reads to sequence full-length cDNAs from individual transcriptomes and metatranscriptome samples. We have adapted the PacBio Iso-Seq protocol for use with prokaryotic samples by incorporating RNA polyadenylation and rRNA-depletion steps. With SMRT Sequencing, which has average read lengths of 10-15 kb, we are able to sequence entire transcripts, including polycistronic RNAs, in a single read. Here, we show full-length bacterial transcriptomes with the ability to visualize transcription of operons. In the area of metatranscriptomics, long reads reveal unambiguous gene sequences without the need for post-sequencing transcript assembly. We also show full-length bacterial transcripts sequenced after being treated with NEB’s Cappable-Seq, which is an alternative method for depleting rRNA and enriching for full-length transcripts with intact 5’ ends. Combining Cappable-Seq with PacBio long reads allows for the detection of transcription start sites, with the additional benefit of sequencing entire transcripts.


April 21, 2020

RNA sequencing: the teenage years.

Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.


April 21, 2020

Dnase1l3 deletion causes aberrations in length and end-motif frequencies in plasma DNA.

Circulating DNA in plasma consists of short DNA fragments. The biological processes generating such fragments are not well understood. DNASE1L3 is a secreted DNASE1-like nuclease capable of digesting DNA in chromatin, and its absence causes anti-DNA responses and autoimmunity in humans and mice. We found that the deletion of Dnase1l3 in mice resulted in aberrations in the fragmentation of plasma DNA. Such aberrations included an increase in short DNA molecules below 120 bp, which was positively correlated with anti-DNA antibody levels. We also observed an increase in long, multinucleosomal DNA molecules and decreased frequencies of the most common end motifs found in plasma DNA. These aberrations were independent of anti-DNA response, suggesting that they represented a primary effect of DNASE1L3 loss. Pregnant Dnase1l3-/- mice carrying Dnase1l3+/- fetuses showed a partial restoration of normal frequencies of plasma DNA end motifs, suggesting that DNASE1L3 from Dnase1l3-proficient fetuses could enter maternal systemic circulation and affect both fetal and maternal DNA fragmentation in a systemic as well as local manner. However, the observed shortening of circulating fetal DNA relative to maternal DNA was not affected by the deletion of Dnase1l3. Collectively, our findings demonstrate that DNASE1L3 plays a role in circulating plasma DNA homeostasis by enhancing fragmentation and influencing end-motif frequencies. These results support a distinct role of DNASE1L3 as a regulator of the physical form and availability of cell-free DNA and may have important implications for the mechanism whereby this enzyme prevents autoimmunity. Copyright © 2019 the Author(s). Published by PNAS.
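The two fragment-level measurements named above, the share of molecules below 120 bp and end-motif frequencies, can be illustrated with a short sketch. The abstract does not define the motif length or which end is used; taking the first 4 bases of each fragment is an assumption here, as are the function names and the toy fragments.

```python
from collections import Counter


def end_motif_frequencies(fragments, k=4):
    """Frequency of each k-base motif at the start of the fragments.

    Assumption for illustration: the end motif is the first k bases of a
    fragment; fragments shorter than k are skipped.
    """
    motifs = Counter(f[:k].upper() for f in fragments if len(f) >= k)
    total = sum(motifs.values())
    return {motif: count / total for motif, count in motifs.items()}


def short_fraction(lengths, cutoff=120):
    """Fraction of fragments strictly shorter than cutoff (in bp)."""
    return sum(1 for n in lengths if n < cutoff) / len(lengths)


# Hypothetical fragment sequences and fragment lengths.
fragments = ["CCCAGGTT", "CCCATTGA", "CCTGAAAC", "AAAAGGCT"]
freqs = end_motif_frequencies(fragments)
short = short_fraction([90, 110, 166, 170])
print(freqs, short)
```

Comparing such frequency tables between genotypes is one straightforward way to express "decreased frequencies of the most common end motifs" as numbers.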


April 21, 2020

Genetic map-guided genome assembly reveals a virulence-governing minichromosome in the lentil anthracnose pathogen Colletotrichum lentis.

Colletotrichum lentis causes anthracnose, which is a serious disease on lentil and can account for up to 70% crop loss. Two pathogenic races, 0 and 1, have been described in the C. lentis population from lentil. To unravel the genetic control of virulence, an isolate of the virulent race 0 was sequenced at 1481-fold genomic coverage. The 56.10-Mb genome assembly consists of 50 scaffolds with N50 scaffold length of 4.89 Mb. A total of 11 436 protein-coding gene models was predicted in the genome with 237 coding candidate effectors, 43 secondary metabolite biosynthetic enzymes and 229 carbohydrate-active enzymes (CAZymes), suggesting a contraction of the virulence gene repertoire in C. lentis. Scaffolds were assigned to 10 core and two minichromosomes using a population (race 0 × race 1, n = 94 progeny isolates) sequencing-based, high-density (14 312 single nucleotide polymorphisms) genetic map. Composite interval mapping revealed a single quantitative trait locus (QTL), qClVIR-11, located on minichromosome 11, explaining 85% of the variability in virulence of the C. lentis population. The QTL covers a physical distance of 0.84 Mb with 98 genes, including seven candidate effector and two secondary metabolite genes. Taken together, the study provides genetic and physical evidence for the existence of a minichromosome controlling the C. lentis virulence on lentil. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.
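For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the usual definition follows: sort scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the assembly size. The scaffold lengths below are made-up values, not those of the C. lentis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("n50 requires at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if running * 2 >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), total = 32.
scaffold_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
result = n50(scaffold_lengths)
print(result)
```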


April 21, 2020

Computational aspects underlying genome to phenome analysis in plants.

Recent advances in genomics technologies have greatly accelerated the progress in both fundamental plant science and applied breeding research. Concurrently, high-throughput plant phenotyping is becoming widely adopted in the plant community, promising to alleviate the phenotypic bottleneck. While these technological breakthroughs are significantly accelerating quantitative trait locus (QTL) and causal gene identification, challenges to enable even more sophisticated analyses remain. In particular, care needs to be taken to standardize, describe and conduct experiments robustly while relying on plant physiology expertise. In this article, we review the state of the art regarding genome assembly and the future potential of pangenomics in plant research. We also describe the necessity of standardizing and describing phenotypic studies using the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard to enable the reuse and integration of phenotypic data. In addition, we show how deep phenotypic data might yield novel trait-trait correlations and review how to link phenotypic data to genomic data. Finally, we provide perspectives on the golden future of machine learning and its potential in linking phenotypes to genomic features. © 2018 The Authors The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.


April 21, 2020

DNA methylation analysis.

DNA methylation is a process by which methyl groups are added to cytosine or adenine. DNA methylation can change the activity of the DNA molecule without changing the sequence. Methylation of cytosine to 5-methylcytosine (5mC) is widespread in both eukaryotes and prokaryotes, and it is a very important epigenetic modification event, which can regulate gene activity and influence a number of key processes such as genomic imprinting, cell differentiation, transcriptional regulation, and chromatin remodeling. Profiling DNA methylation across the genome is critical to understanding the influence of methylation in normal biology and diseases including cancer. Recent discoveries of 5mC oxidation derivatives, including 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxycytosine (5caC), in the mammalian genome further expand our understanding of methylation regulation. Genome-wide analyses such as microarrays and next-generation sequencing technologies have been used to assess large fractions of the methylome. A number of different quantitative approaches have also been established to map DNA epigenomes with single-base resolution, as represented by bisulfite-based methods such as classical bisulfite sequencing and pyrosequencing. These methods have been used to generate base-resolution maps of 5mC and its oxidation derivatives in genomic samples. The focus of this chapter is to provide the methodologies that have been developed to detect the cytosine derivatives in genomic DNA.


April 21, 2020

Deciphering bacterial epigenomes using modern sequencing technologies.

Prokaryotic DNA contains three types of methylation: N6-methyladenine, N4-methylcytosine and 5-methylcytosine. The lack of tools to analyse the frequency and distribution of methylated residues in bacterial genomes has prevented a full understanding of their functions. Now, advances in DNA sequencing technology, including single-molecule, real-time sequencing and nanopore-based sequencing, have provided new opportunities for systematic detection of all three forms of methylated DNA at a genome-wide scale and offer unprecedented opportunities for achieving a more complete understanding of bacterial epigenomes. Indeed, as the number of mapped bacterial methylomes approaches 2,000, increasing evidence supports roles for methylation in regulation of gene expression, virulence and pathogen-host interactions.


April 21, 2020

Single-molecule sequencing detection of N6-methyladenine in microbial reference materials.

The DNA base modification N6-methyladenine (m6A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network can improve m6A detection at trained sequence contexts compared to previously published methods using deviations between measured and expected current values as each adenine travels through a pore. The model, implemented as the mCaller software package, can be extended to detect known or confirm suspected methyltransferase target motifs based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.

