Single-Molecule Real-Time (SMRT) DNA sequencing is unique in that nucleotide incorporation events are monitored in real time, leading to a wealth of kinetic information in addition to the extraction of the primary DNA sequence. The dynamics of the DNA polymerase that is observed adds an additional dimension of sequence-dependent information, and can be used to learn more about the molecule under study. First, the primary sequence itself can be determined more accurately. The kinetic data can be used to corroborate or overturn consensus calls and even enable calling bases in problematic sequence contexts. Second, using the kinetic information, we can detect and discriminate numerous chemical base modifications as a by-product of ordinary sequencing. Examples of applying these capabilities include (i) the characterization of the epigenome of microorganisms by directly sequencing the three common prokaryotic epigenetic base modifications of 4-methylcytosine, 5- methylcytosine and 6-methyladenine; (ii) the characterization of known and novel methyltransferase activities; (iii) the direct sequencing and differentiation of the four eukaryotic epigenetic forms of cytosine (5-methyl, 5-hydroxymethyl, 5-formyl, and 5-carboxylcytosine) with first applications to map them with single base-pair and DNA strand resolution across mammalian genomes; (iv) the direct sequencing and identification of numerous modified DNA bases arising from DNA damage; and (v) an exploration of the mitochondrial genome for known and novel base modifications. We will show our progress towards a generic, open-source algorithm for exploiting kinetic information for any of these purposes.
Single Molecule, Real-Time Sequencing for base modification detection in eukaryotic organisms: Coprinopsis cinerea.
Single Molecule Real-Time (SMRT) DNA sequencing provides a wealth of kinetic information beyond the extraction of the primary DNA sequence, and this kinetic information can provide for the direct detection of modified bases present in genomic DNA. This method has been demonstrated for base modification detection in prokaryotes at base and strand resolutions. In eukaryotes, the common base modifications known to exist are the cytosine variants including methyl, hydroxymethyl, formyl and carboxyl forms. Each of these modifications exhibits different signatures in SMRT kinetic data, allowing for unprecedented possibilities to differentiate between them in direct sequencing data. We present early results of directly sequencing different base modifications in eukaryotic genomic DNA using this method.
Mitochondrial DNA (mtDNA) is a compact, double-stranded circular genome of 16,569 bp with a cytosine-rich light (L) chain and a guanine-rich heavy (H) chain. mtDNA mutations have been increasingly recognized as important contributors to an array of human diseases such as Parkinson’s disease, Alzheimer’s disease, colorectal cancer and Kearns–Sayre syndrome. mtDNA mutations can affect all of the 1000-10,000 copies of the mitochondrial genome present in a cell (homoplasmic mutation) or only a subset of copies (heteroplasmic mutation). The ratio of normal to mutant mtDNAs within cells is a significant factor in whether mutations will result in disease, as well as the clinical presentation, penetrance, and severity of the phenotype. Over time, heteroplasmic mutations can become homoplastic due to differential replication and random assortment. Full characterization of the mitochondrial genome would involve detection of not only homoplastic but heteroplasmic mutations, as well as complete phasing. Previously, we sequenced human mtDNA on the PacBio RS II System with two partially overlapping amplicons. Here, we present amplification-free, full-length sequencing of linearized mtDNA using the Sequel System. Full-length sequencing allows variant phasing along the entire mitochondrial genome, identification of heteroplasmic variants, and detection of epigenetic modifications that are lost in amplicon-based methods.
Tutorial: Base modification detection, base modification and motif analysis application [SMRT Link v5.0.0]
This tutorial provides an overview of the Base Modification and Motif analysis application for identifying common bacterial epigenetic modifications and analyzing methyltransferase recognition motifs. SMRT Analysis software supports epigenetic research…
New drugs with novel mechanisms of resistance are desperately needed to address both community and nosocomial infections due to Gram-negative bacteria. One such potential target is LpxC, an essential enzyme that catalyzes the first committed step of Lipid A biosynthesis. Achaogen conducted an extensive research campaign to discover novel LpxC inhibitors with activity against Pseudomonas aeruginosa We report here the in vitro antibacterial activity and pharmacodynamics of ACHN-975, the only molecule from these efforts and the first ever LpxC inhibitor to be evaluated in Phase 1 clinical trials. In addition, we describe the profile of three additional LpxC inhibitors that were identified as potential lead molecules. These efforts did not produce an additional development candidate with a sufficiently large therapeutic window and the program was subsequently terminated.Copyright © 2019 American Society for Microbiology.
Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.
Complete genome sequence provides insights into the quorum sensing-related spoilage potential of Shewanella baltica 128 isolated from spoiled shrimp.
Shewanella baltica 128 is a specific spoilage organism (SSO) isolated from the refrigerated shrimp that results in shrimp spoilage. This study reported the complete genome sequencing of this strain, with the primary annotations associated with amino acid transport and metabolism (8.66%), indicating that S. baltica 128 has good potential for degrading proteins. In vitro experiments revealed Shewanella baltica 128 could adapt to the stress conditions by regulating its growth and biofilm formation. Genes that related to the spoilage-related metabolic pathways, including trimethylamine metabolism (torT), sulfur metabolism (cysM), putrescine metabolism (speC), biofilm formation (rpoS) and serine protease production (degS), were identified. Genes (LuxS, pfs, LuxR and qseC) that related to the specific QS system were also identified. Complete genome sequence of S. baltica 128 provide insights into the QS-related spoilage potential, which might provide novel information for the development of new approaches for spoilage detection and prevention based on QS target.Copyright © 2019. Published by Elsevier Inc.
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.
Full-length mRNA sequencing and gene expression profiling reveal broad involvement of natural antisense transcript gene pairs in pepper development and response to stresses.
Pepper is an important vegetable with great economic value and unique biological features. In the past few years, significant development has been made towards understanding the huge complex pepper genome; however, pepper functional genomics has not been well studied. To better understand the pepper gene structure and pepper gene regulation, we conducted full-length mRNA sequencing by PacBio sequencing and obtained 57862 high-quality full-length mRNA sequences derived from 18362 previously annotated and 5769 newly detected genes. New gene models were built that combined the full-length mRNA sequences and corrected approximately 500 fragmented gene models from previous annotations. Based on the full-length mRNA, we identified 4114 and 5880 pepper genes forming natural antisense transcript (NAT) genes in-cis and in-trans, respectively. Most of these genes accumulate small RNAs in their overlapping regions. By analyzing these NAT gene expression patterns in our transcriptome data, we identified many NAT pairs responsive to a variety of biological processes in pepper. Pepper formate dehydrogenase 1 (FDH1), which is required for R-gene-mediated disease resistance, may be regulated by nat-siRNAs and participate in a positive feedback loop in salicylic acid biosynthesis during resistance responses. Several cis-NAT pairs and subgroups of trans-NAT genes were responsive to pepper pericarp and placenta development, which may play roles in capsanthin and capsaicin biosynthesis. Using a comparative genomics approach, the evolutionary mechanisms of cis-NATs were investigated, and we found that an increase in intergenic sequences accounted for the loss of most cis-NATs, while transposon insertion contributed to the formation of most new cis-NATs. This article is protected by copyright. All rights reserved.This article is protected by copyright. All rights reserved.
Motivation: Third-generation sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to asses these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. Results: In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research.
The landscape of SNCA transcripts across synucleinopathies: New insights from long reads sequencing analysis
Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinsontextquoterights Disease (PD) and Dementia with Lewy bodies (DLB). Previous studies have shown that the alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain for PD and DLB patients. Similarly, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene was also shown to harbor structural variants that affect transcriptional levels. Here we apply the first study of using long read sequencing with targeted capture of both the gDNA and cDNA of the SNCA gene in brain tissues of PD, DLB, and control samples using the PacBio Sequel system. The targeted full-length cDNA (Iso-Seq) data confirmed complex usage of known alternative start sites and variable 3textquoteright UTR lengths, as well as novel 5textquoteright starts and 3textquoteright ends not previously described. The targeted gDNA data allowed phasing of up to 81% of the ~114kb SNCA region, with the longest phased block excedding 54 kb. We demonstrate that long gDNA and cDNA reads have the potential to reveal long-range information not previously accessible using traditional sequencing methods. This approach has a potential impact in studying disease risk genes such as SNCA, providing new insights into the genetic etiologies, including perturbations to the landscape the gene transcripts, of human complex diseases such as synucleinopathies.
Genome sequence analysis of 91 Salmonella Enteritidis isolates from mice caught on poultry farms in the mid 1990s.
A total of 91 draft genome sequences were used to analyze isolates of Salmonella enterica serovar Enteritidis obtained from feral mice caught on poultry farms in Pennsylvania. One objective was to find mutations disrupting open reading frames (ORFs) and another was to determine if ORF-disruptive mutations were present in isolates obtained from other sources. A total of 83 mice were obtained between 1995-1998. Isolates separated into two genomic clades and 12 subgroups due to 742 mutations. Nineteen ORF-disruptive mutations were found, and in addition, bigA had exceptional heterogeneity requiring additional evaluation. The TRAMS algorithm detected only 6 ORF disruptions. The sefD mutation was the most frequently encountered mutation and it was prevalent in human, poultry, environmental and mouse isolates. These results confirm previous assessments of the mouse as a rich source of Salmonella enterica serovar Enteritidis that varies in genotype and phenotype. Copyright © 2019. Published by Elsevier Inc.
Methylomes of Two Extremely Halophilic Archaea Species, Haloarcula marismortui and Haloferax mediterranei.
The genomes of two extremely halophilic Archaea species, Haloarcula marismortui and Haloferax mediterranei, were sequenced using single-molecule real-time sequencing. The ~4-Mbp genomes are GC rich with multiple large plasmids and two 4-methyl-cytosine patterns. Methyl transferases were incorporated into the Restriction Enzymes Database (REBASE), and gene annotation was incorporated into the Haloarchaeal Genomes Database (HaloWeb).Copyright © 2019 DasSarma et al.
Relative Performance of MinION (Oxford Nanopore Technologies) versus Sequel (Pacific Biosciences) Third-Generation Sequencing Instruments in Identification of Agricultural and Forest Fungal Pathogens.
Culture-based molecular identification methods have revolutionized detection of pathogens, yet these methods are slow and may yield inconclusive results from environmental materials. The second-generation sequencing tools have much-improved precision and sensitivity of detection, but these analyses are costly and may take several days to months. Of the third-generation sequencing techniques, the portable MinION device (Oxford Nanopore Technologies) has received much attention because of its small size and possibility of rapid analysis at reasonable cost. Here, we compare the relative performances of two third-generation sequencing instruments, MinION and Sequel (Pacific Biosciences), in identification and diagnostics of fungal and oomycete pathogens from conifer (Pinaceae) needles and potato (Solanum tuberosum) leaves and tubers. We demonstrate that the Sequel instrument is efficient for metabarcoding of complex samples, whereas MinION is not suited for this purpose due to a high error rate and multiple biases. However, we find that MinION can be utilized for rapid and accurate identification of dominant pathogenic organisms and other associated organisms from plant tissues following both amplicon-based and PCR-free metagenomics approaches. Using the metagenomics approach with shortened DNA extraction and incubation times, we performed the entire MinION workflow, from sample preparation through DNA extraction, sequencing, bioinformatics, and interpretation, in 2.5 h. We advocate the use of MinION for rapid diagnostics of pathogens and potentially other organisms, but care needs to be taken to control or account for multiple potential technical biases.IMPORTANCE Microbial pathogens cause enormous losses to agriculture and forestry, but current combined culturing- and molecular identification-based detection methods are too slow for rapid identification and application of countermeasures. Here, we develop new and rapid protocols for Oxford Nanopore MinION-based third-generation diagnostics of plant pathogens that greatly improve the speed of diagnostics. However, due to high error rate and technical biases in MinION, the Pacific BioSciences Sequel platform is more useful for in-depth amplicon-based biodiversity monitoring (metabarcoding) from complex environmental samples.Copyright © 2019 American Society for Microbiology.
Genome sequence of the Chinese white wax scale insect Ericerus pela: the first draft genome for the Coccidae family of scale insects.
The Chinese white wax scale insect, Ericerus pela, is best known for producing wax, which has been widely used in candle production, casting, Chinese medicine, and wax printing products for thousands of years. The secretion of wax, and other unusual features of scale insects, is thought to be an adaptation to their change from an ancestral ground-dwelling lifestyle to a sedentary lifestyle on the higher parts of plants. As well as helping to improve its economic value, studies of E. pela might also help to explain the adaptation of scale insects. However, no genomic data are currently available for E. pela.To assemble the E. pela genome, 303.92 Gb of data were generated using Illumina and Pacific Biosciences sequencing, producing 277.22 Gb of clean data for assembly. The assembled genome size was 0.66 Gb, with 1,979 scaffolds and a scaffold N50 of 735 kb. The guanine + cytosine content was 33.80%. A total of 12,022 protein-coding genes were predicted, with a mean coding sequence length of 1,370 bp. Twenty-six fatty acyl-CoA reductase genes and 35 acyltransferase genes were identified. Evolutionary analysis revealed that E. pela and aphids formed a sister group and split ~241.1 million years ago. There were 214 expanded gene families and 2,219 contracted gene families in E. pela.We present the first genome sequence from the Coccidae family. These results will help to increase our understanding of the evolution of unique features in scale insects, and provide important genetic information for further research. © The Author(s) 2019. Published by Oxford University Press.