Menu
July 19, 2019

Advantages of Single-Molecule Real-Time Sequencing in high-GC content genomes.

Next-generation sequencing has become the most widely used sequencing technology in genomics research, but it has inherent drawbacks when dealing with high-GC content genomes. Recently, single-molecule real-time sequencing technology (SMRT) was introduced as a third-generation sequencing strategy to compensate for this drawback. Here, we report that the unbiased and longer read length of SMRT sequencing markedly improved genome assembly with high GC content via gap filling and repeat resolution.


July 19, 2019

Efficient and accurate whole genome assembly and methylome profiling of E. coli.

With the price of next generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioinformatics strategies result in the most accurate assemblies. Here, we sequence three E. coli strains on the Illumina MiSeq, Life Technologies Ion Torrent PGM, and Pacific Biosciences RS. We then perform genome assemblies on all three datasets alone or in combination to determine the best methods for the assembly of bacterial genomes.Three E. coli strains – BL21(DE3), Bal225, and DH5a – were sequenced to a depth of 100× on the MiSeq and Ion Torrent machines and to at least 125× on the PacBio RS. Four assembly methods were examined and compared. The previously published BL21(DE3) genome [GenBank:AM946981.2], allowed us to evaluate the accuracy of each of the BL21(DE3) assemblies. BL21(DE3) PacBio-only assemblies resulted in a 90% reduction in contigs versus short read only assemblies, while N50 numbers increased by over 7-fold. Strikingly, the number of SNPs in PacBio-only assemblies were less than half that seen with short read assemblies (~20 SNPs vs. ~50 SNPs) and indels also saw dramatic reductions (~2 indel >5 bp in PacBio-only assemblies vs. ~12 for short-read only assemblies). Assemblies that used a mixture of PacBio and short read data generally fell in between these two extremes. Use of PacBio sequencing reads also allowed us to call covalent base modifications for the three strains. Each of the strains used here had a known covalent base modification genotype, which was confirmed by PacBio sequencing.Using data generated solely from the Pacific Biosciences RS, we were able to generate the most complete and accurate de novo assemblies of E. coli strains. We found that the addition of other sequencing technology data offered no improvements over use of PacBio data alone. In addition, the sequencing data from the PacBio RS allowed for sensitive and specific calling of covalent base modifications.


July 19, 2019

Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans.

We have used whole genome paired-end Illumina sequence data to identify tandem duplications in 20 isofemale lines of Drosophila yakuba and 20 isofemale lines of D. simulans and performed genome wide validation with PacBio long molecule sequencing. We identify 1,415 tandem duplications that are segregating in D. yakuba as well as 975 duplications in D. simulans, indicating greater variation in D. yakuba. Additionally, we observe high rates of secondary deletions at duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites in D. yakuba modified with deletions. These secondary deletions are consistent with the action of the large loop mismatch repair system acting to remove polymorphic tandem duplication, resulting in rapid dynamics of gain and loss in duplicated alleles and a richer substrate of genetic novelty than has been previously reported. Most duplications are present in only single strains, suggesting that deleterious impacts are common. Drosophila simulans shows larger numbers of whole gene duplications in comparison to larger proportions of gene fragments in D. yakuba. Drosophila simulans displays an excess of high-frequency variants on the X chromosome, consistent with adaptive evolution through duplications on the D. simulans X or demographic forces driving duplicates to high frequency. We identify 78 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans, as well as 143 cases of recruited noncoding sequence in D. yakuba and 96 in D. simulans, in agreement with rates of chimeric gene origination in D. melanogaster. Together, these results suggest that tandem duplications often result in complex variation beyond whole gene duplications that offers a rich substrate of standing variation that is likely to contribute both to detrimental phenotypes and disease, as well as to adaptive evolutionary change. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 19, 2019

Preparation of next-generation DNA sequencing libraries from ultra-low amounts of input DNA: Application to single-molecule, real-time (SMRT) sequencing on the Pacific Biosciences RS II.

We have developed and validated an amplification-free method for generating DNA sequencing libraries from very low amounts of input DNA (500 picograms – 20 nanograms) for single- molecule sequencing on the Pacific Biosciences (PacBio) RS II sequencer. The common challenge of high input requirements for single-molecule sequencing is overcome by using a carrier DNA in conjunction with optimized sequencing preparation conditions and re-use of the MagBead-bound complex. Here we describe how this method can be used to produce sequencing yields comparable to those generated from standard input amounts, but by using 1000-fold less starting material.


July 19, 2019

Error correction and assembly complexity of single molecule sequencing reads.

Third generation single molecule sequencing technology is poised to revolutionize genomics by en- abling the sequencing of long, individual molecules of DNA and RNA. These technologies now routinely produce reads exceeding 5,000 basepairs, and can achieve reads as long as 50,000 basepairs. Here we evaluate the limits of single molecule sequencing by assessing the impact of long read sequencing in the assembly of the human genome and 25 other important genomes across the tree of life. From this, we develop a new data-driven model using support vector regression that can accurately predict assembly performance. We also present a novel hybrid error correction algorithm for long PacBio sequencing reads that uses pre-assembled Illumina sequences for the error correction. We apply it several prokaryotic and eukaryotic genomes, and show it can achieve near-perfect assemblies of small genomes (< 100Mbp) and substantially improved assemblies of larger ones. All source code and the assembly model are available open-source.


July 19, 2019

Genome rearrangements and pervasive meiotic drive cause hybrid infertility in fission yeast.

Hybrid sterility is one of the earliest postzygotic isolating mechanisms to evolve between two recently diverged species. Here we identify causes underlying hybrid infertility of two recently diverged fission yeast species Schizosaccharomyces pombe and S. kambucha, which mate to form viable hybrid diploids that efficiently complete meiosis, but generate few viable gametes. We find that chromosomal rearrangements and related recombination defects are major but not sole causes of hybrid infertility. At least three distinct meiotic drive alleles, one on each S. kambucha chromosome, independently contribute to hybrid infertility by causing nonrandom spore death. Two of these driving loci are linked by a chromosomal translocation and thus constitute a novel type of paired meiotic drive complex. Our study reveals how quickly multiple barriers to fertility can arise. In addition, it provides further support for models in which genetic conflicts, such as those caused by meiotic drive alleles, can drive speciation.DOI: http://dx.doi.org/10.7554/eLife.02630.001. Copyright © 2014, Zanders et al.


July 19, 2019

Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes.

The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of the following three second-generation sequencers: Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq) and a third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, which consists of a 5-Mb genome comprising two circular chromosomes. We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279× and 1,927×, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77× coverage, and the longest contig was 895 kbp in size. Those for MiSeq were 34 contigs, 58×?coverage, and 733 kbp, respectively. These results suggest that higher coverage depth is unnecessary for a better assembly result. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers, whereas PacBio generated two exceptionally long contigs of 3,288,561 and 1,875,537 bps, each of which was from a single chromosome, with 73× coverage and mean read length 3,119 bp, allowing us to determine the absolute positions of all rRNA operons. PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome assembly of “finished grade” because of its long reads. It showed the potential to assemble more complex genomes with multiple chromosomes containing more repetitive sequences.


July 19, 2019

Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany.

A large outbreak of diarrhea and the hemolytic-uremic syndrome caused by an unusual serotype of Shiga-toxin-producing Escherichia coli (O104:H4) began in Germany in May 2011. As of July 22, a large number of cases of diarrhea caused by Shiga-toxin-producing E. coli have been reported–3167 without the hemolytic-uremic syndrome (16 deaths) and 908 with the hemolytic-uremic syndrome (34 deaths)–indicating that this strain is notably more virulent than most of the Shiga-toxin-producing E. coli strains. Preliminary genetic characterization of the outbreak strain suggested that, unlike most of these strains, it should be classified within the enteroaggregative pathotype of E. coli.We used third-generation, single-molecule, real-time DNA sequencing to determine the complete genome sequence of the German outbreak strain, as well as the genome sequences of seven diarrhea-associated enteroaggregative E. coli serotype O104:H4 strains from Africa and four enteroaggregative E. coli reference strains belonging to other serotypes. Genomewide comparisons were performed with the use of these enteroaggregative E. coli genomes, as well as those of 40 previously sequenced E. coli isolates.The enteroaggregative E. coli O104:H4 strains are closely related and form a distinct clade among E. coli and enteroaggregative E. coli strains. However, the genome of the German outbreak strain can be distinguished from those of other O104:H4 strains because it contains a prophage encoding Shiga toxin 2 and a distinct set of additional virulence and antibiotic-resistance factors.Our findings suggest that horizontal genetic exchange allowed for the emergence of the highly virulent Shiga-toxin-producing enteroaggregative E. coli O104:H4 strain that caused the German outbreak. More broadly, these findings highlight the way in which the plasticity of bacterial genomes facilitates the emergence of new pathogens.


July 19, 2019

An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome.

Second generation sequencing has permitted detailed sequence characterisation at the whole genome level of a growing number of non-model organisms, but the data produced have short read-lengths and biased genome coverage leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage and thus the potential to produce genome sequence data of a finished quality containing fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error-rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome.Following error-correction, a total of 28,638 PacBio RS reads were recovered with a mean read length of 1,902 bp totalling 54,492,250 nucleotides and representing an average depth of coverage of 320× the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage) compared to seven contigs (90.59% coverage) recovered from an Illumina data, and revealed no bias in coverage of GC rich regions. Post-assembly the data were largely concordant with the Illumina data generated and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences in the two inverted repeat regions to be assigned unambiguously.This is the first report to our knowledge of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, due to longer read lengths and lower GC bias in the data. The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone.


July 19, 2019

The complete genome sequence of Escherichia coli EC958: a high quality reference sequence for the globally disseminated multidrug resistant E. coli O25b:H4-ST131 clone.

Escherichia coli ST131 is now recognised as a leading contributor to urinary tract and bloodstream infections in both community and clinical settings. Here we present the complete, annotated genome of E. coli EC958, which was isolated from the urine of a patient presenting with a urinary tract infection in the Northwest region of England and represents the most well characterised ST131 strain. Sequencing was carried out using the Pacific Biosciences platform, which provided sufficient depth and read-length to produce a complete genome without the need for other technologies. The discovery of spurious contigs within the assembly that correspond to site-specific inversions in the tail fibre regions of prophages demonstrates the potential for this technology to reveal dynamic evolutionary mechanisms. E. coli EC958 belongs to the major subgroup of ST131 strains that produce the CTX-M-15 extended spectrum ß-lactamase, are fluoroquinolone resistant and encode the fimH30 type 1 fimbrial adhesin. This subgroup includes the Indian strain NA114 and the North American strain JJ1886. A comparison of the genomes of EC958, JJ1886 and NA114 revealed that differences in the arrangement of genomic islands, prophages and other repetitive elements in the NA114 genome are not biologically relevant and are due to misassembly. The availability of a high quality uropathogenic E. coli ST131 genome provides a reference for understanding this multidrug resistant pathogen and will facilitate novel functional, comparative and clinical studies of the E. coli ST131 clonal lineage.


July 19, 2019

New insights into dissemination and variation of the health care-associated pathogen Acinetobacter baumannii from genomic analysis.

Acinetobacter baumannii is a globally important nosocomial pathogen characterized by an increasing incidence of multidrug resistance. Routes of dissemination and gene flow among health care facilities are poorly resolved and are important for understanding the epidemiology of A. baumannii, minimizing disease transmission, and improving patient outcomes. We used whole-genome sequencing to assess diversity and genome dynamics in 49 isolates from one United States hospital system during one year from 2007 to 2008. Core single-nucleotide-variant-based phylogenetic analysis revealed multiple founder strains and multiple independent strains recovered from the same patient yet was insufficient to fully resolve strain relationships, where gene content and insertion sequence patterns added additional discriminatory power. Gene content comparisons illustrated extensive and redundant antibiotic resistance gene carriage and direct evidence of gene transfer, recombination, gene loss, and mutation. Evidence of barriers to gene flow among hospital components was not found, suggesting complex mixing of strains and a large reservoir of A. baumannii strains capable of colonizing patients.Genome sequencing was used to characterize multidrug-resistant Acinetobacter baumannii strains from one United States hospital system during a 1-year period to better understand how A. baumannii strains that cause infection are related to one another. Extensive variation in gene content was found, even among strains that were very closely related phylogenetically and epidemiologically. Several mechanisms contributed to this diversity, including transfer of mobile genetic elements, mobilization of insertion sequences, insertion sequence-mediated deletions, and genome-wide homologous recombination. Variation in gene content, however, lacked clear spatial or temporal patterns, suggesting a diverse pool of circulating strains with considerable interaction between strains and hospital locations. Widespread genetic variation among strains from the same hospital and even the same patient, particularly involving antibiotic resistance genes, reinforces the need for molecular diagnostic testing and genomic analysis to determine resistance profiles, rather than a reliance primarily on strain typing and antimicrobial resistance phenotypes for epidemiological studies.


July 19, 2019

Comprehensive methylome characterization of Mycoplasma genitalium and Mycoplasma pneumoniae at single-base resolution.

In the bacterial world, methylation is most commonly associated with restriction-modification systems that provide a defense mechanism against invading foreign genomes. In addition, it is known that methylation plays functionally important roles, including timing of DNA replication, chromosome partitioning, DNA repair, and regulation of gene expression. However, full DNA methylome analyses are scarce due to a lack of a simple methodology for rapid and sensitive detection of common epigenetic marks (ie N(6)-methyladenine (6 mA) and N(4)-methylcytosine (4 mC)), in these organisms. Here, we use Single-Molecule Real-Time (SMRT) sequencing to determine the methylomes of two related human pathogen species, Mycoplasma genitalium G-37 and Mycoplasma pneumoniae M129, with single-base resolution. Our analysis identified two new methylation motifs not previously described in bacteria: a widespread 6 mA methylation motif common to both bacteria (5′-CTAT-3′), as well as a more complex Type I m6A sequence motif in M. pneumoniae (5′-GAN(7)TAY-3’/3′-CTN(7)ATR-5′). We identify the methyltransferase responsible for the common motif and suggest the one involved in M. pneumoniae only. Analysis of the distribution of methylation sites across the genome of M. pneumoniae suggests a potential role for methylation in regulating the cell cycle, as well as in regulation of gene expression. To our knowledge, this is one of the first direct methylome profiling studies with single-base resolution from a bacterial organism.


July 19, 2019

Resistance determinants and mobile genetic elements of an NDM-1-encoding Klebsiella pneumoniae strain.

Multidrug-resistant Enterobacteriaceae are emerging as a serious infectious disease challenge. These strains can accumulate many antibiotic resistance genes though horizontal transfer of genetic elements, those for ß-lactamases being of particular concern. Some ß-lactamases are active on a broad spectrum of ß-lactams including the last-resort carbapenems. The gene for the broad-spectrum and carbapenem-active metallo-ß-lactamase NDM-1 is rapidly spreading. We present the complete genome of Klebsiella pneumoniae ATCC BAA-2146, the first U.S. isolate found to encode NDM-1, and describe its repertoire of antibiotic-resistance genes and mutations, including genes for eight ß-lactamases and 15 additional antibiotic-resistance enzymes. To elucidate the evolution of this rich repertoire, the mobile elements of the genome were characterized, including four plasmids with varying degrees of conservation and mosaicism and eleven chromosomal genomic islands. One island was identified by a novel phylogenomic approach, that further indicated the cps-lps polysaccharide synthesis locus, where operon translocation and fusion was noted. Unique plasmid segments and mosaic junctions were identified. Plasmid-borne blaCTX-M-15 was transposed recently to the chromosome by ISEcp1. None of the eleven full copies of IS26, the most frequent IS element in the genome, had the expected 8-bp direct repeat of the integration target sequence, suggesting that each copy underwent homologous recombination subsequent to its last transposition event. Comparative analysis likewise indicates IS26 as a frequent recombinational junction between plasmid ancestors, and also indicates a resolvase site. In one novel use of high-throughput sequencing, homologously recombinant subpopulations of the bacterial culture were detected. In a second novel use, circular transposition intermediates were detected for the novel insertion sequence ISKpn21 of the ISNCY family, suggesting that it uses the two-step transposition mechanism of IS3. Robust genome-based phylogeny showed that a unified Klebsiella cluster contains Enterobacter aerogenes and Raoultella, suggesting the latter genus should be abandoned.


July 19, 2019

Shared signatures of parasitism and phylogenomics unite Cryptomycota and microsporidia.

Fungi grow within their food, externally digesting it and absorbing nutrients across a semirigid chitinous cell wall. Members of the new phylum Cryptomycota were proposed to represent intermediate fungal forms, lacking a chitinous cell wall during feeding and known almost exclusively from ubiquitous environmental ribosomal RNA sequences that cluster at the base of the fungal tree [1, 2]. Here, we sequence the first Cryptomycotan genome (the water mold endoparasite Rozella allomycis) and unite the Cryptomycota with another group of endoparasites, the microsporidia, based on phylogenomics and shared genomic traits. We propose that Cryptomycota and microsporidia share a common endoparasitic ancestor, with the clade unified by a chitinous cell wall used to develop turgor pressure in the infection process [3, 4]. Shared genomic elements include a nucleotide transporter that is used by microsporidia for stealing energy in the form of ATP from their hosts [5]. Rozella harbors a mitochondrion that contains a very rapidly evolving genome and lacks complex I of the respiratory chain. These degenerate features are offset by the presence of nuclear genes for alternative respiratory pathways. The Rozella proteome has not undergone major contraction like microsporidia; instead, several classes have undergone expansion, such as host-effector, signal-transduction, and folding proteins. Copyright © 2013 Elsevier Ltd. All rights reserved.


July 19, 2019

Characterizing and measuring bias in sequence data.

DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias.We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using data from human and from a set of microbes with diverse base compositions. As in previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we therefore catalog 1,000 that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or likely due to sample-reference variations left only 0.045% of the autosomal genome with unexplained poor coverage.The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.