July 19, 2019

Unlocking the mystery of the hard-to-sequence phage genome: PaP1 methylome and bacterial immunity.

Whole-genome sequencing is an important method to understand the genetic information, gene function, biological characteristics and survival mechanisms of organisms. Sequencing large genomes is very simple at present. However, we encountered a hard-to-sequence genome of Pseudomonas aeruginosa phage PaP1. The shotgun sequencing method failed to complete the sequence of this genome. After persevering for 10 years and going over three generations of sequencing techniques, we successfully completed the sequence of the PaP1 genome with a length of 91,715 bp. Single-molecule real-time sequencing results revealed that this genome contains 51 N-6-methyladenines and 152 N-4-methylcytosines. Three significant modified sequence motifs were predicted, but not all of the sites found in the genome were methylated in these motifs. Further investigations revealed a novel immune mechanism of bacteria, in which host bacteria can recognise and repel inserts containing modified bases on a large scale. This mechanism could account for the failure of the shotgun method in PaP1 genome sequencing. This problem was resolved using the nfi- mutant of Escherichia coli DH5α as a host bacterium to construct a shotgun library. This work provided insights into the hard-to-sequence phage PaP1 genome and discovered a new mechanism of bacterial immunity. The methylome of phage PaP1 is responsible for the failure of shotgun sequencing and for bacterial immunity mediated by enzyme Endo V activity; this methylome also provides a valuable resource for future studies on PaP1 genome replication and modification, as well as on gene regulation and host interaction.


July 19, 2019

Global methylation state at base-pair resolution of the Caulobacter genome throughout the cell cycle.

The Caulobacter DNA methyltransferase CcrM is one of five master cell-cycle regulators. CcrM is transiently present near the end of DNA replication when it rapidly methylates the adenine in hemimethylated GANTC sequences. The timing of transcription of two master regulator genes and two cell division genes is controlled by the methylation state of GANTC sites in their promoters. To explore the global extent of this regulatory mechanism, we determined the methylation state of the entire chromosome at every base pair at five time points in the cell cycle using single-molecule, real-time sequencing. The methylation state of 4,515 GANTC sites, preferentially positioned in intergenic regions, changed progressively from full to hemimethylation as the replication forks advanced. However, 27 GANTC sites remained unmethylated throughout the cell cycle, suggesting that these protected sites could participate in epigenetic regulatory functions. An analysis of the time of activation of every cell-cycle regulatory transcription start site, coupled to both the position of a GANTC site in their promoter regions and the time in the cell cycle when the GANTC site transitions from full to hemimethylation, allowed the identification of 59 genes as candidates for epigenetic regulation. In addition, we identified two previously unidentified N(6)-methyladenine motifs and showed that they maintained a constant methylation state throughout the cell cycle. The cognate methyltransferase was identified for one of these motifs as well as for one of two 5-methylcytosine motifs.
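As a small illustration of what a "GANTC site" is, the sketch below locates candidate sites in a DNA string, where N stands for any of the four bases. It is not taken from the paper: the function name `find_gantc_sites` and the example sequence are hypothetical, and the code only finds sequence matches; it says nothing about methylation state, which the study measured by sequencing.

```python
import re

# GANTC with N = any base. The lookahead makes the scan robust to
# overlapping matches. GANTC is its own reverse complement, so a match
# on one strand marks the same site on the other strand.
GANTC = re.compile(r"(?=(GA[ACGT]TC))")


def find_gantc_sites(sequence: str) -> list[int]:
    """Return 0-based start positions of GANTC sites (hypothetical helper)."""
    return [match.start() for match in GANTC.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up example sequence, not real Caulobacter data.
    example = "ttGACTCaaGAATCggGATTCGAGTC"
    print(find_gantc_sites(example))
```

Because the motif is palindromic, each reported position corresponds to one adenine on each strand, which is why a site can be fully methylated, hemimethylated, or unmethylated.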


July 19, 2019

Reducing assembly complexity of microbial genomes with single-molecule sequencing.

The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem. To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads. Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.


July 19, 2019

Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae.

Public health officials have raised concerns that plasmid transfer between Enterobacteriaceae species may spread resistance to carbapenems, an antibiotic class of last resort, thereby rendering common health care-associated infections nearly impossible to treat. To determine the diversity of carbapenemase-encoding plasmids and assess their mobility among bacterial species, we performed comprehensive surveillance and genomic sequencing of carbapenem-resistant Enterobacteriaceae in the National Institutes of Health (NIH) Clinical Center patient population and hospital environment. We isolated a repertoire of carbapenemase-encoding Enterobacteriaceae, including multiple strains of Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, and Pantoea species. Long-read genome sequencing with full end-to-end assembly revealed that these organisms carry the carbapenem resistance genes on a wide array of plasmids. K. pneumoniae and E. cloacae isolated simultaneously from a single patient harbored two different carbapenemase-encoding plasmids, indicating that plasmid transfer between organisms was unlikely within this patient. We did, however, find evidence of horizontal transfer of carbapenemase-encoding plasmids between K. pneumoniae, E. cloacae, and C. freundii in the hospital environment. Our data, including full plasmid identification, challenge assumptions about horizontal gene transfer events within patients and identify possible connections between patients and the hospital environment. In addition, we identified a new carbapenemase-encoding plasmid of potentially high clinical impact carried by K. pneumoniae, E. coli, E. cloacae, and Pantoea species, in unrelated patients and in the hospital environment. Copyright © 2014, American Association for the Advancement of Science.


July 19, 2019

Reconstructing complex regions of genomes using long-read sequencing technology.

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.


July 19, 2019

qDNAmod: a statistical model-based tool to reveal intercellular heterogeneity of DNA modification from SMRT sequencing data.

In an isogenic cell population, phenotypic heterogeneity among individual cells is common and critical for survival of the population under different environmental conditions. DNA modification is an important epigenetic factor that can regulate phenotypic heterogeneity. The single molecule real-time (SMRT) sequencing technology provides a unique platform for detecting a wide range of DNA modifications, including N6-methyladenine (6-mA), N4-methylcytosine (4-mC) and 5-methylcytosine (5-mC). Here we present qDNAmod, a novel bioinformatic tool for genome-wide quantitative profiling of intercellular heterogeneity of DNA modification from SMRT sequencing data. It is capable of estimating the proportion of isogenic haploid cells in which the same loci of the genome are differentially modified. We tested the reliability of qDNAmod with the SMRT sequencing data of Streptococcus pneumoniae strain ST556. qDNAmod detected extensive intercellular heterogeneity of DNA methylation (6-mA) in a clonal population of ST556. Subsequent biochemical analyses revealed that the recognition sequences of two type I restriction–modification (R-M) systems are responsible for the intercellular heterogeneity of DNA methylation initially identified by qDNAmod. qDNAmod thus represents a valuable tool for studying intercellular phenotypic heterogeneity from genome-wide DNA modification.


July 19, 2019

Going beyond five bases in DNA sequencing.

DNA sequencing has provided a wealth of information about biological systems, but thus far has focused on the four canonical bases, and 5-methylcytosine through comparison of the genomic DNA sequence to a transformed four-base sequence obtained after treatment with bisulfite. However, numerous other chemical modifications to the nucleotides are known to control fundamental life functions, influence virulence of pathogens, and are associated with many diseases. These modifications cannot be accessed with traditional sequencing methods. In this opinion, we highlight several emerging single-molecule sequencing techniques that have the potential to directly detect many types of DNA modifications as an integral part of the sequencing protocol. Copyright © 2012 Elsevier Ltd. All rights reserved.


July 19, 2019

A comparative analysis of methylome profiles of Campylobacter jejuni sheep abortion isolate and gastroenteric strains using PacBio data.

Campylobacter jejuni is a leading cause of human gastrointestinal disease and small ruminant abortions in the United States. The recent emergence of a highly virulent, tetracycline-resistant C. jejuni subsp. jejuni sheep abortion clone (clone SA) in the United States, and that strain’s association with human disease, has resulted in a heightened awareness of the zoonotic potential of this organism. Pacific Biosciences’ Single Molecule, Real-Time sequencing technology was used to explore the variation in the genome-wide methylation patterns of the abortifacient clone SA (IA3902) and phenotypically distinct gastrointestinal-specific C. jejuni strains (NCTC 11168 and 81-176). Several notable differences were discovered that distinguished the methylome of IA3902 from that of 11168 and 81-176: identification of motifs novel to IA3902, genome-specific hypo- and hypermethylated regions, strain level variability in genes methylated, and differences in the types of methylation motifs present in each strain. These observations suggest a possible role of methylation in the contrasting disease presentations of these three C. jejuni strains. In addition, the methylation profiles between IA3902 and a luxS mutant were explored to determine if variations in methylation patterns could be identified that might explain the role of LuxS-dependent methyl recycling in IA3902 abortifacient potential.


July 19, 2019

Long-read, whole-genome shotgun sequence data for five model organisms.

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.


July 19, 2019

PacBio-LITS: a large-insert targeted sequencing method for characterization of human disease-associated chromosomal structural variations.

Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high. We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS is illustrated by delineating the breakpoint junctions of low copy repeat (LCR)-associated complex structural rearrangements on chr17p11.2 in patients diagnosed with Potocki-Lupski syndrome (PTLS; MIM#610883). We successfully identified previously determined breakpoint junctions in three PTLS cases, and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints. The new information has enabled us to propose mechanisms for formation of these structural variants. The new method leverages the cost efficiency of targeted capture-sequencing as well as the mappability and scaffolding capabilities of long sequencing reads generated by the PacBio platform. It is therefore suitable for studying complex SVs, especially those involving LCRs, inversions, and the generation of chimeric Alu elements at the breakpoints. Other genomic research applications, such as haplotype phasing and small insertion and deletion validation, could also benefit from this technology.


July 19, 2019

Targeted single molecule sequencing methodology for ovarian hyperstimulation syndrome.

One of the most significant issues surrounding next generation sequencing is the cost and the difficulty assembling short read lengths. Targeted capture enrichment of longer fragments using single molecule sequencing (SMS) is expected to improve both sequence assembly and base-call accuracy but, at present, there are very few examples of successful application of these technologic advances in translational research and clinical testing. We developed a targeted single molecule sequencing (T-SMS) panel for genes implicated in ovarian response to controlled ovarian hyperstimulation (COH) for infertility. Target enrichment was carried out using droplet-based multiplex polymerase chain reaction (PCR) technology (RainDance®) designed to yield amplicons averaging 1 kb fragment size from 44 candidate loci (99.8% unique base-pair coverage). The total targeted sequence was 3.18 Mb per sample. SMS was carried out using single molecule, real-time DNA sequencing (SMRT®, Pacific Biosciences®; average raw read length = 1178 nucleotides, 5% of the amplicons >6000 nucleotides). After filtering with circular consensus (CCS) reads, the mean read length was 3200 nucleotides (97% CCS accuracy). Primary data analyses, alignment and filtering utilized the Pacific Biosciences® SMRT portal. Secondary analysis was conducted using the Genome Analysis Toolkit for SNP discovery and wANNOVAR for functional analysis of variants. Of the filtered functional variants, 18 of 19 (94.7%) were further confirmed using conventional Sanger sequencing. CCS reads were able to accurately detect zygosity. Coverage within GC rich regions (i.e., VEGFR; 72% GC rich) was achieved by capturing long genomic DNA (gDNA) fragments and reading into regions that flank the capture regions. As proof of concept, a non-synonymous LHCGR variant was captured in two severe OHSS cases and verified by conventional sequencing. Combining emulsion PCR-generated 1 kb amplicons and SMRT DNA sequencing permitted greater depth of coverage for T-SMS and facilitated easier sequence assembly. To the best of our knowledge, this is the first report combining emulsion PCR and T-SMS for long reads using human DNA samples, and an NGS panel designed for biomarker discovery in OHSS.


July 19, 2019

Quantitative and multiplexed DNA methylation analysis using long-read single-molecule real-time bisulfite sequencing (SMRT-BS).

DNA methylation has essential roles in transcriptional regulation, imprinting, X chromosome inactivation and other cellular processes, and aberrant CpG methylation is directly involved in the pathogenesis of human imprinting disorders and many cancers. To address the need for a quantitative and highly multiplexed bisulfite sequencing method with long read lengths for targeted CpG methylation analysis, we developed single-molecule real-time bisulfite sequencing (SMRT-BS). Optimized bisulfite conversion and PCR conditions enabled the amplification of DNA fragments up to ~1.5 kb, and subjecting overlapping 625-1491 bp amplicons to SMRT-BS indicated high reproducibility across all amplicon lengths (r = 0.972) and low standard deviations (≤0.10) between individual CpG sites sequenced in triplicate. Higher variability in CpG methylation quantitation was correlated with reduced sequencing depth, particularly for intermediately methylated regions. SMRT-BS was validated by orthogonal bisulfite-based microarray (r = 0.906; 42 CpG sites) and second generation sequencing (r = 0.933; 174 CpG sites); however, longer SMRT-BS amplicons (>1.0 kb) had reduced, but very acceptable, correlation with both orthogonal methods (r = 0.836-0.897 and r = 0.892-0.927, respectively) compared to amplicons less than ~1.0 kb (r = 0.940-0.951 and r = 0.948-0.963, respectively). Multiplexing utility was assessed by simultaneously subjecting four distinct CpG island amplicons (702-866 bp; 325 CpGs) and 30 hematological malignancy cell lines to SMRT-BS (average depth of 110X), which identified a spectrum of highly quantitative methylation levels across all interrogated CpG sites and cell lines. SMRT-BS is a novel, accurate and cost-effective targeted CpG methylation method that is amenable to a high degree of multiplexing with minimal clonal PCR artifacts. Increased sequencing depth is necessary when interrogating longer amplicons (>1.0 kb) and the previously reported bisulfite sequencing PCR bias towards unmethylated DNA should be considered when measuring intermediately methylated regions. Coupled with an optimized bisulfite PCR protocol, SMRT-BS is capable of interrogating ~1.5 kb amplicons, which theoretically can cover ~91% of CpG islands in the human genome.


July 19, 2019

Population structure of mitochondrial genomes in Saccharomyces cerevisiae.

Rigorous study of mitochondrial functions and cell biology in the budding yeast, Saccharomyces cerevisiae, has advanced our understanding of mitochondrial genetics. This yeast is now a powerful model for population genetics, owing to large genetic diversity and highly structured populations among wild isolates. Comparative mitochondrial genomic analyses between yeast species have revealed broad evolutionary changes in genome organization and architecture. A fine-scale view of recent evolutionary changes within S. cerevisiae has not been possible due to low numbers of complete mitochondrial sequences. To address challenges of sequencing AT-rich and repetitive mitochondrial DNAs (mtDNAs), we sequenced two divergent S. cerevisiae mtDNAs using a single-molecule sequencing platform (PacBio RS). Using de novo assemblies, we generated highly accurate complete mtDNA sequences. These mtDNA sequences were compared with 98 additional mtDNA sequences gathered from various published collections. Phylogenies based on mitochondrial coding sequences and intron profiles revealed that intraspecific diversity in mitochondrial genomes generally recapitulated the population structure of nuclear genomes. Analysis of intergenic sequence indicated a recent expansion of mobile elements in certain populations. Additionally, our analyses revealed that certain populations lacked introns previously believed conserved throughout the species, as well as the presence of introns never before reported in S. cerevisiae. Our results revealed that the extensive variation in S. cerevisiae mtDNAs is often population specific, thus offering a window into the recent evolutionary processes shaping these genomes. In addition, we offer an effective strategy for sequencing these challenging AT-rich mitochondrial genomes for small scale projects.


July 19, 2019

Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing.

Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
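For readers unfamiliar with the ">QV50" figure, the sketch below converts a quality value to a per-base error probability. It assumes that QV here is the standard Phred-scaled quality, Q = -10 log10(p); the abstract does not spell this out, and the function names and the 9,000-base genome length are illustrative choices rather than values from the paper's software.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(p: float) -> float:
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    p = qv_to_error_probability(50)
    genome_length = 9_000  # illustrative length for a ~9 kb genome
    print(f"per-base error probability at QV50: {p:.0e}")
    print(f"expected errors in {genome_length} bases: {p * genome_length:.2f}")
```

Under this assumption, QV50 corresponds to roughly one expected error per 100,000 bases, so a reconstructed ~9 kb genome at that accuracy would be expected to contain fewer than one error on average.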


July 19, 2019

Characterizing and overriding the structural mechanism of the Quizartinib-resistant FLT3 “gatekeeper” F691L mutation with PLX3397.

Tyrosine kinase domain mutations are a common cause of acquired clinical resistance to tyrosine kinase inhibitors (TKI) used to treat cancer, including the FLT3 inhibitor quizartinib. Mutation of kinase “gatekeeper” residues, which control access to an allosteric pocket adjacent to the ATP-binding site, has been frequently implicated in TKI resistance. The molecular underpinnings of gatekeeper mutation-mediated resistance are incompletely understood. We report the first cocrystal structure of FLT3 with the TKI quizartinib, which demonstrates that quizartinib binding relies on essential edge-to-face aromatic interactions with the gatekeeper F691 residue, and F830 within the highly conserved Asp-Phe-Gly motif in the activation loop. This reliance makes quizartinib critically vulnerable to gatekeeper and activation loop substitutions while minimizing the impact of mutations elsewhere. Moreover, we identify PLX3397, a novel FLT3 inhibitor that retains activity against the F691L mutant due to a binding mode that depends less vitally on specific interactions with the gatekeeper position. We report the first cocrystal structure of FLT3 with a kinase inhibitor, elucidating the structural mechanism of resistance due to the gatekeeper F691L mutation. PLX3397 is a novel FLT3 inhibitor with in vitro activity against this mutation but is vulnerable to kinase domain mutations in the FLT3 activation loop. Cancer Discov; 5(6); 668-79. ©2015 AACR. This article is highlighted in the In This Issue feature, p. 565. ©2015 American Association for Cancer Research.

