Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers to understand molecular mechanisms in evolution and gain insight into adaptive strategies. With read lengths exceeding 10 kb, we are able to sequence high-quality, closed microbial genomes with associated plasmids, and investigate large genome complexities, such as long, highly repetitive, low-complexity regions and multiple tandem-duplication events. Improved genome quality, observed at 99.9999% (QV60) consensus accuracy, and significant reduction of gap regions in reference genomes (up to and beyond 50%) allow researchers to better understand coding sequences with high confidence, investigate potential regulatory mechanisms in noncoding regions, and make inferences about evolutionary strategies that are otherwise missed by the coverage biases associated with short- read sequencing technologies. Additional benefits afforded by SMRT Sequencing include the simultaneous capability to detect epigenomic modifications and obtain full-length cDNA transcripts that obsolete the need for assembly. With direct sequencing of DNA in real-time, this has resulted in the identification of numerous base modifications and motifs, which genome-wide profiles have linked to specific methyltransferase activities. Our new offering, the Iso-Seq Application, allows for the accurate differentiation between transcript isoforms that are difficult to resolve with short-read technologies. PacBio reads easily span transcripts such that both 5’/3’ primers for cDNA library generation and the poly-A tail are observed. As such, exon configuration and intron retention events can be analyzed without ambiguity. This technological advance is useful for characterizing transcript diversity and improving gene structure annotations in reference genomes. We review solutions available with SMRT Sequencing, from targeted sequencing efforts to obtaining reference genomes (>100 Mb). This includes strategies for identifying microsatellites and conducting phylogenetic comparisons with targeted gene families. We highlight how to best leverage our long reads that have exceeded 20 kb in length for research investigations, as well as currently available bioinformatics strategies for analysis. Benefits for these applications are further realized with consistent use of size selection of input sample using the BluePippin™ device from Sage Science as demonstrated in our genome improvement projects. Using the latest P5-C3 chemistry on model organisms, these efforts have yielded an observed contig N50 of ~6 Mb, with the longest contig exceeding 12.5 Mb and an average base quality of QV50.
In this webinar, Emily Hatas of PacBio shares information about the applications and benefits of SMRT Sequencing in plant and animal biology, agriculture, and industrial research fields. This session contains…
User Group Meeting: Unbiased characterization of metagenome composition and function using HiFi sequencing on the PacBio Sequel II System
In this PacBio User Group Meeting presentation, PacBio scientist Meredith Ashby shared several examples of analysis — from full-length 16S sequencing to shotgun sequencing — showing how SMRT Sequencing enables…
Understanding interactions among plants and the complex communities of organisms living on, in and around them requires more than one experimental approach. A new method for de novo metagenome assembly,…
In this webinar, Kristin Mars, Sequencing Specialist, PacBio, presents an introduction to PacBio’s technology and its applications followed by a panel discussion among sequencing experts. The panel discussion addresses such…
Genome-wide association studies (GWAS) have identified many genomic loci associated with risk for schizophrenia, but unambiguous identification of the relationship between disease-associated variants and specific genes, and in particular their effect on risk conferring transcripts, has proven difficult. To better understand the specific molecular mechanism(s) at the schizophrenia locus in 11q25, we undertook cis expression quantitative trait loci (cis-eQTL) mapping for this 2 megabase genomic region using postmortem human brain samples. To comprehensively assess the effects of genetic risk upon local expression, we evaluated multiple transcript features: genes, exons, and exon-exon junctions in multiple brain regions-dorsolateral prefrontal cortex (DLPFC), hippocampus, and caudate. Genetic risk variants strongly associated with expression of SNX19 transcript features that tag multiple rare classes of SNX19 transcripts, whereas they only weakly affected expression of an exon-exon junction that tags the majority of abundant transcripts. The most prominent class of SNX19 risk-associated transcripts is predicted to be overexpressed, defined by an exon-exon splice junction between exons 8 and 10 (junc8.10) and that is predicted to encode proteins that lack the characteristic nexin C terminal domain. Risk alleles were also associated with either increased or decreased expression of multiple additional classes of transcripts. With RACE, molecular cloning, and long read sequencing, we found a number of novel SNX19 transcripts that further define the set of potential etiological transcripts. We explored epigenetic regulation of SNX19 expression and found that DNA methylation at CpG sites near the primary transcription start site and within exon 2 partially mediate the effects of risk variants on risk-associated expression. ATAC sequencing revealed that some of the most strongly risk-associated SNPs are located within a region of open chromatin, suggesting a nearby regulatory element is involved. These findings indicate a potentially complex molecular etiology, in which risk alleles for schizophrenia generate epigenetic alterations and dysregulation of multiple classes of SNX19 transcripts.
In the past several years, single-molecule sequencing platforms, such as those by Pacific Biosciences and Oxford Nanopore Technologies, have become available to researchers and are currently being tested for clinical applications. They offer exceptionally long reads that permit direct sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms. This includes disease-causing long repetitive elements, extreme GC content regions, and complex gene loci. Similarly, these platforms enable structural variation characterization at previously unparalleled resolution and direct detection of epigenetic marks in native DNA. Here, we review how these technologies are opening up new clinical avenues that are being applied to pathogenic microorganisms and viruses, constitutional disorders, pharmacogenomics, cancer, and more.Copyright © 2018 Elsevier Ltd. All rights reserved.
The Populus shoot undergoes primary growth (longitudinal growth) followed by secondary growth (radial growth), which produces biomass that is an important source of energy worldwide. We adopted joint PacBio Iso-Seq and RNA-seq analysis to identify differentially expressed transcripts along a developmental gradient from the shoot apex to the fifth internode of Populus Nanlin895. We obtained 87 150 full-length transcripts, including 2081 new isoforms and 62 058 new alternatively spliced isoforms, most of which were produced by intron retention, that were used to update the Populus annotation. Among these novel isoforms, there are 1187 long non-coding RNAs and 356 fusion genes. Using this annotation, we found 15 838 differentially expressed transcripts along the shoot developmental gradient, of which 1216 were transcription factors (TFs). Only a few of these genes were reported previously. The differential expression of these TFs suggests that they may play important roles in primary and secondary growth. AP2, ARF, YABBY and GRF TFs are highly expressed in the apex, whereas NAC, bZIP, PLATZ and HSF TFs are likely to be important for secondary growth. Overall, our findings provide evidence that long-read sequencing can complement short-read sequencing for cataloguing and quantifying eukaryotic transcripts and increase our understanding of the vital and dynamic process of shoot development. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Prokaryotic DNA contains three types of methylation: N6-methyladenine, N4-methylcytosine and 5-methylcytosine. The lack of tools to analyse the frequency and distribution of methylated residues in bacterial genomes has prevented a full understanding of their functions. Now, advances in DNA sequencing technology, including single-molecule, real-time sequencing and nanopore-based sequencing, have provided new opportunities for systematic detection of all three forms of methylated DNA at a genome-wide scale and offer unprecedented opportunities for achieving a more complete understanding of bacterial epigenomes. Indeed, as the number of mapped bacterial methylomes approaches 2,000, increasing evidence supports roles for methylation in regulation of gene expression, virulence and pathogen-host interactions.
Infectious disease is both a major force of selection in nature and a prime cause of yield loss in agriculture. In plants, disease resistance is often conferred by nucleotide-binding leucine-rich repeat (NLR) proteins, intracellular immune receptors that recognize pathogen proteins and their effects on the host. Consistent with extensive balancing and positive selection, NLRs are encoded by one of the most variable gene families in plants, but the true extent of intraspecific NLR diversity has been unclear. Here, we define a nearly complete species-wide pan-NLRome in Arabidopsis thaliana based on sequence enrichment and long-read sequencing. The pan-NLRome largely saturates with approximately 40 well-chosen wild strains, with half of the pan-NLRome being present in most accessions. We chart NLR architectural diversity, identify new architectures, and quantify selective forces that act on specific NLRs and NLR domains. Our study provides a blueprint for defining pan-NLRomes.Copyright © 2019 The Author(s). Published by Elsevier Inc. All rights reserved.
The development of clustered regularly interspaced short-palindromic repeat (CRISPR)-Cas systems for genome editing has transformed the way life science research is conducted and holds enormous potential for the treatment of disease as well as for many aspects of biotech- nology. Here, I provide a personal perspective on the development of CRISPR-Cas9 for genome editing within the broader context of the field and discuss our work to discover novel Cas effectors and develop them into additional molecular tools. The initial demonstra- tion of Cas9-mediated genome editing launched the development of many other technologies, enabled new lines of biological inquiry, and motivated a deeper examination of natural CRISPR-Cas systems, including the discovery of new types of CRISPR-Cas systems. These new discoveries in turn spurred further technological developments. I review these exciting discoveries and technologies as well as provide an overview of the broad array of applications of these technologies in basic research and in the improvement of human health. It is clear that we are only just beginning to unravel the potential within microbial diversity, and it is quite likely that we will continue to discover other exciting phenomena, some of which it may be possible to repurpose as molecular technologies. The transformation of mysterious natural phenomena to powerful tools, however, takes a collective effort to discover, characterize, and engineer them, and it has been a privilege to join the numerous researchers who have contributed to this transformation of CRISPR-Cas systems.
SMRT sequencing reveals differential patterns of methylation in two O111:H- STEC isolates from a hemolytic uremic syndrome outbreak in Australia.
In 1995 a severe haemolytic-uremic syndrome (HUS) outbreak in Adelaide occurred. A recent genomic analysis of Shiga toxigenic Escherichia coli (STEC) O111:H- strains 95JB1 and 95NR1 from this outbreak found that the more virulent isolate, 95NR1, harboured two additional copies of the Shiga toxin 2 (Stx2) genes encoded within prophage regions. The structure of the Stx2-converting prophages could not be fully resolved using short-read sequence data alone and it was not clear if there were other genomic differences between 95JB1 and 95NR1. In this study we have used Pacific Biosciences (PacBio) single molecule real-time (SMRT) sequencing to characterise the genome and methylome of 95JB1 and 95NR1. We completely resolved the structure of all prophages including two, tandemly inserted, Stx2-converting prophages in 95NR1 that were absent from 95JB1. Furthermore we defined all insertion sequences and found an additional IS1203 element in the chromosome of 95JB1. Our analysis of the methylome of 95NR1 and 95JB1 identified hemi-methylation of a novel motif (5′-CTGCm6AG-3′) in more than 4000 sites in the 95NR1 genome. These sites were entirely unmethylated in the 95JB1 genome, and included at least 177 potential promoter regions that could contribute to regulatory differences between the strains. IS1203 mediated deactivation of a novel type IIG methyltransferase in 95JB1 is the likely cause of the observed differential patterns of methylation between 95NR1 and 95JB1. This study demonstrates the capability of PacBio SMRT sequencing to resolve complex prophage regions and reveal the genetic and epigenetic heterogeneity within a clonal population of bacteria.
Genome-wide analysis of DNA methylation patterns using single molecule real-time DNA sequencing has boosted the number of publicly available methylomes. However, there is a lack of tools coupling methylation patterns and the corresponding methyltransferase genes. Here we demonstrate a high-throughput method for coupling methyltransferases with their respective motifs, using automated cloning and analysing the methyltransferases in vectors carrying a strain-specific cassette containing all potential target sites. To validate the method, we analyse the genomes of the thermophile Moorella thermoacetica and the mesophile Acetobacterium woodii, two acetogenic bacteria having substantially modified genomes with 12 methylation motifs and a total of 23 methyltransferase genes. Using our method, we characterize the 23 methyltransferases, assign motifs to the respective enzymes and verify activity for 11 of the 12 motifs.
Long-read sequencing unveils IGH-DUX4 translocation into the silenced IGH allele in B-cell acute lymphoblastic leukemia.
[email protected] proto-oncogene translocation is a common oncogenic event in lymphoid lineage cancers such as B-ALL, lymphoma and multiple myeloma. Here, to investigate the interplay between [email protected] proto-oncogene translocation and IGH allelic exclusion, we perform long-read whole-genome and transcriptome sequencing along with epigenetic and 3D genome profiling of Nalm6, an IGH-DUX4 positive B-ALL cell line. We detect significant allelic imbalance on the wild-type over the IGH-DUX4 haplotype in expression and epigenetic data, showing IGH-DUX4 translocation occurs on the silenced IGH allele. In vitro, this reduces the oncogenic stress of DUX4 high-level expression. Moreover, patient samples of IGH-DUX4 B-ALL have similar expression profile and IGH breakpoints as Nalm6, suggesting a common mechanism to allow optimal dosage of non-toxic DUX4 expression.
The DNA base modification N6-methyladenine (m6A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network can improve m6A detection at trained sequence contexts compared to previously published methods using deviations between measured and expected current values as each adenine travels through a pore. The model, implemented as the mCaller software package, can be extended to detect known or confirm suspected methyltransferase target motifs based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.