July 7, 2019

Next generation sequencing technologies and the changing landscape of phage genomics.

The dawn of next generation sequencing technologies has opened up exciting possibilities for whole genome sequencing of a plethora of organisms. The 2nd and 3rd generation sequencing technologies, based on cloning-free, massively parallel sequencing, have enabled the generation of a deluge of genomic sequences of both prokaryotic and eukaryotic origin in the last seven years. However, whole genome sequencing of bacterial viruses has not kept pace with this revolution, despite the fact that their genomes are orders of magnitude smaller than those of bacteria and other organisms. Sequencing phage genomes poses several challenges: (1) obtaining pure phage genomic material, (2) PCR amplification biases, and (3) the complex nature of their genetic material, owing to features such as methylated bases and repeats that are inherently difficult to sequence and assemble. Here we describe conclusions drawn from our efforts in sequencing hundreds of bacteriophage genomes from a variety of Gram-positive and Gram-negative bacteria using Sanger, 454, Illumina and PacBio technologies. Based on our experience we propose several general considerations regarding sample quality, the choice of technology and a "blended approach" for generating reliable whole genome sequences of phages.


July 7, 2019

Analysis of a unique Clostridium botulinum strain from the Southern hemisphere producing a novel type E botulinum neurotoxin subtype.

Clostridium botulinum strains that produce botulinum neurotoxin type E (BoNT/E) are most commonly isolated from botulism cases, marine environments, and animals in regions of high latitude in the Northern hemisphere. A strain of C. botulinum type E (CDC66177) was isolated from soil in Chubut, Argentina. Previous studies showed that the amino acid sequences of BoNT/E produced by various strains differ by less than 6% and that the type E neurotoxin gene cluster inserts into the rarA operon. Genetic and mass spectral analysis demonstrated that the BoNT/E produced by CDC66177 is a novel toxin subtype (E9). Toxin gene sequencing indicated that BoNT/E9 differed by nearly 11% at the amino acid level from BoNT/E1. Mass spectrometric analysis of BoNT/E9 revealed that its endopeptidase substrate cleavage site was identical to that of other BoNT/E subtypes. Further analysis of this strain demonstrated that its 16S rRNA sequence clustered with other Group II C. botulinum strains (producing BoNT types B, E, and F). Genomic DNA isolated from strain CDC66177 hybridized with fewer probes on a Group II C. botulinum subtyping microarray than did the other type E strains examined. Whole genome shotgun sequencing of strain CDC66177 revealed that while the toxin gene cluster inserted into the rarA operon, as in other type E strains, its overall genome content shared greater similarity with a Group II C. botulinum type B strain (17B). These results expand our understanding of the global distribution of C. botulinum type E strains and suggest that the type E toxin gene cluster may be able to insert into C. botulinum strains with a more diverse genetic background than previously recognized.
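
Subtype differences such as "less than 6%" or "nearly 11% at the amino acid level" are typically percent differences over an alignment of the two proteins. The sketch below computes that figure for two sequences that are assumed to be already aligned; excluding gap columns from the denominator is an assumption (the abstract does not say how gaps were handled), and the function name and toy sequences are hypothetical, not BoNT data.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, gap-free columns at which two sequences differ.

    Assumes both sequences come from the same alignment (equal length, '-' for gaps).
    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gap columns entirely
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * mismatches / compared

# Toy peptides: 2 of 8 aligned residues differ.
print(percent_difference("MKTAYIAK", "MKTGYIAR"))  # 25.0
```

Under this definition, a difference of "nearly 11%" would mean roughly 11 mismatched residues per 100 compared positions.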


July 7, 2019

The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

The emergence of next generation sequencing (NGS) has provided the means for rapid, high-throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the sequencing technology used but also the analysis software employed for assembly and annotation. In this work, we have explored the quality of microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although on average of high quality, need to be viewed critically, and their sources of error should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role, to assemblies with multiple small gaps (Illumina), and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
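
The abstract does not name its assembly metrics. Two widely used ones are contig N50 and the number and size of gaps (runs of N) within scaffolds, which map directly onto the "large gaps" versus "multiple small gaps" contrast above. The sketch below computes both; whether these were among the study's metrics is an assumption, and the function names and toy inputs are hypothetical.

```python
import re

def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total assembly size
            return length
    raise AssertionError("not reached for non-empty input")

def gap_lengths(scaffold: str) -> list[int]:
    """Lengths of runs of N, the usual placeholder for gaps inside a scaffold."""
    return [m.end() - m.start() for m in re.finditer(r"[Nn]+", scaffold)]

# Hypothetical toy assembly, not data from the study.
print(n50([100, 200, 300, 400]))          # 300
print(gap_lengths("ACGTNNNNACGTNNACGT"))  # [4, 2]
```

N50 alone can hide gaps that fall inside genes, which is one reason gap counts and annotation-level checks are worth reporting alongside it.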


July 7, 2019

Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation.

We have developed a sequencing method on the Pacific Biosciences RS sequencer (the PacBio) for small DNA molecules that avoids the need for a standard library preparation. To date this approach has been applied toward sequencing single-stranded and double-stranded viral genomes, bacterial plasmids, plasmid vector models for DNA-modification analysis, and linear DNA fragments covering an entire bacterial genome. Using direct sequencing, it is possible to generate sequence data from as little as 1 ng of DNA, offering a significant advantage over current protocols, which typically require 400-500 ng of sheared DNA for library preparation.
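
A short calculation shows why 1 ng can be enough for small molecules: it still contains a very large number of copies. The sketch converts mass to molecule count using the common approximation of about 650 g/mol per double-stranded base pair; the approximation, the function name and the example lengths are assumptions for illustration, not values from the abstract.

```python
AVOGADRO = 6.02214076e23        # molecules per mole (exact SI value)
MEAN_BP_MASS_G_PER_MOL = 650.0  # approximate mass of one dsDNA base pair (assumption)

def copies_in_sample(mass_ng: float, length_bp: int) -> float:
    """Approximate number of dsDNA molecules of a given length in a sample mass."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * MEAN_BP_MASS_G_PER_MOL)
    return moles * AVOGADRO

for length in (5_000, 50_000):  # hypothetical plasmid- and phage-sized molecules
    print(f"{length:>6} bp: {copies_in_sample(1.0, length):.2e} copies in 1 ng")
```

For a 5 kb molecule this gives roughly 1.9 × 10^8 copies in 1 ng; for single-stranded DNA the per-nucleotide mass is roughly half, so the constant would need adjusting.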


July 7, 2019

Sensitive and specific single-molecule sequencing of 5-hydroxymethylcytosine.

We describe strand-specific, base-resolution detection of 5-hydroxymethylcytosine (5-hmC) in genomic DNA with single-molecule sensitivity, combining a bioorthogonal, selective chemical labeling method of 5-hmC with single-molecule, real-time (SMRT) DNA sequencing. The chemical labeling not only allows affinity enrichment of 5-hmC-containing DNA fragments but also enhances the kinetic signal of 5-hmC during SMRT sequencing. We applied the approach to sequence 5-hmC in a genomic DNA sample with high confidence.


July 7, 2019

Next-generation sequencing and large genome assemblies.

The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether such data must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.
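
Many short-read assemblers are built on de Bruijn graphs, in which (k-1)-mers are nodes and every k-mer observed in the reads is an edge. The toy sketch below builds such a graph and spells out a contig by following an unbranched path. It ignores sequencing errors, reverse complements and coverage, so it illustrates the idea rather than any particular package the review compares; all names and reads are hypothetical.

```python
from collections import defaultdict

def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph: dict[str, list[str]] = defaultdict(list)
    seen: set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue  # one edge per distinct k-mer; coverage is ignored
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

def walk_unitig(graph: dict[str, list[str]], start: str) -> str:
    """Extend a contig from start while the path is unbranched (out-degree 1, in-degree 1)."""
    indegree: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    contig, node, visited = start, start, {start}
    while len(graph.get(node, [])) == 1:
        nxt = graph[node][0]
        if nxt in visited or indegree[nxt] != 1:
            break  # stop at a cycle or where two paths merge
        contig += nxt[-1]  # each step adds one new base
        visited.add(nxt)
        node = nxt
    return contig

# Three overlapping toy reads from the hypothetical sequence ATGGCGTGCA.
graph = de_bruijn_graph(["ATGGCG", "GGCGTG", "CGTGCA"], k=4)
print(walk_unitig(graph, "ATG"))  # ATGGCGTGCA
```

Repeats longer than k create branches where the walk must stop, which is one concrete reason short reads alone struggle with mammalian-sized, repeat-rich genomes.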


July 7, 2019

Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations.

Medulloblastomas are the most common malignant brain tumours in children. Identifying and understanding the genetic events that drive these tumours is critical for the development of more effective diagnostic, prognostic and therapeutic strategies. Recently, our group and others described distinct molecular subtypes of medulloblastoma on the basis of transcriptional and copy number profiles. Here we use whole-exome hybrid capture and deep sequencing to identify somatic mutations across the coding regions of 92 primary medulloblastoma/normal pairs. Overall, medulloblastomas have low mutation rates consistent with other paediatric tumours, with a median of 0.35 non-silent mutations per megabase. We identified twelve genes mutated at statistically significant frequencies, including previously known mutated genes in medulloblastoma such as CTNNB1, PTCH1, MLL2, SMARCA4 and TP53. Recurrent somatic mutations were newly identified in an RNA helicase gene, DDX3X, often concurrent with CTNNB1 mutations, and in the nuclear co-repressor (N-CoR) complex genes GPS2, BCOR and LDB1. We show that mutant DDX3X potentiates transactivation of a TCF promoter and enhances cell viability in combination with mutant, but not wild-type, β-catenin. Together, our study reveals the alteration of WNT, hedgehog, histone methyltransferase and now N-CoR pathways across medulloblastomas and within specific subtypes of this disease, and nominates the RNA helicase DDX3X as a component of pathogenic β-catenin signalling in medulloblastoma.
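
A figure such as "a median of 0.35 non-silent mutations per megabase" is a per-sample rate (mutation count divided by sequenced megabases) summarised across tumour/normal pairs. The sketch below shows that arithmetic; using "callable" bases as the denominator is an assumption, since the abstract does not define it, and the sample values are hypothetical, not the study's data.

```python
from statistics import median

def mutations_per_mb(non_silent: int, callable_bases: int) -> float:
    """Non-silent somatic mutations per megabase of sequence where calls could be made."""
    return non_silent / (callable_bases / 1_000_000)

# Hypothetical tumour/normal pairs: (non-silent mutations, callable bases).
samples = [(9, 30_000_000), (12, 30_000_000), (6, 30_000_000)]
rates = [mutations_per_mb(n, bases) for n, bases in samples]
print(median(rates))  # 0.3
```

Reporting the median rather than the mean keeps a few hypermutated samples from dominating the summary.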


July 7, 2019

Bacteriophage P70: unique morphology and unrelatedness to other Listeria bacteriophages.

Listeria monocytogenes is an important food-borne pathogen, and its bacteriophages find many uses in detection and biocontrol of its host. The novel broad-host-range virulent phage P70 has a unique morphology with an elongated capsid. Its genome sequence was determined by a hybrid sequencing strategy employing Sanger and PacBio techniques. The P70 genome contains 67,170 bp and 119 open reading frames (ORFs). Our analyses suggest that P70 represents an archetype of virus unrelated to other known Listeria bacteriophages.
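
Counts such as "119 open reading frames" come from scanning the genome for stretches that start with a start codon and run to an in-frame stop. The sketch below is a deliberately naive six-frame scan using only ATG starts and a minimum-length cut-off. Real phage annotation tools also consider GTG/TTG starts, ribosome binding sites and coding statistics, so this is not how the P70 ORFs were necessarily called; the function names and the `min_codons` default are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, str]]:
    """Naive six-frame scan for ATG...stop ORFs with at least min_codons sense codons.

    Returns (strand, frame, orf_sequence); for the '-' strand, frame and sequence
    refer to the reverse complement. Nested in-frame ATGs are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None  # look for the next ATG downstream
    return orfs

print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
# [('+', 2, 'ATGAAATTTGGGTAA')]
```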


July 7, 2019

An Inv(16)(p13.3q24.3)-encoded CBFA2T3-GLIS2 fusion protein defines an aggressive subtype of pediatric acute megakaryoblastic leukemia.

To define the mutation spectrum in non-Down syndrome acute megakaryoblastic leukemia (non-DS-AMKL), we performed transcriptome sequencing on diagnostic blasts from 14 pediatric patients and validated our findings in a recurrency/validation cohort consisting of 34 pediatric and 28 adult AMKL samples. Our analysis identified a cryptic chromosome 16 inversion (inv(16)(p13.3q24.3)) in 27% of pediatric cases, which encodes a CBFA2T3-GLIS2 fusion protein. Expression of CBFA2T3-GLIS2 in Drosophila and murine hematopoietic cells induced bone morphogenic protein (BMP) signaling and resulted in a marked increase in the self-renewal capacity of hematopoietic progenitors. These data suggest that expression of CBFA2T3-GLIS2 directly contributes to leukemogenesis. Copyright © 2012 Elsevier Inc. All rights reserved.


July 7, 2019

Strobe sequence design for haplotype assembly.

Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the two chromosomes of a pair are mostly identical to each other, linking together alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In Haplotype Assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype. We show that a number of parameters influence haplotype length, with the most significant one being the advance length (distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in a 1-2 order of magnitude improvement in median haplotype length. Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies.
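
The linking step can be pictured as a graph problem: heterozygous sites are nodes, and any read whose subreads cover two or more sites joins them into one haplotype block. The sketch below uses union-find to group sites and reports each block's span. It does not model sequencing errors, allele conflicts, or the simulated annealing search over advance-length distributions; the function name, the half-open interval convention and the toy coordinates are assumptions.

```python
def haplotype_blocks(het_sites: list[int],
                     strobe_reads: list[list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """Group heterozygous sites linked by reads; return (first, last) position of each block.

    Each strobe read is a list of subreads given as half-open [start, end) intervals
    on the same molecule; every het site covered by any subread of one read is linked.
    """
    parent = list(range(len(het_sites)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for read in strobe_reads:
        covered = [idx for idx, pos in enumerate(het_sites)
                   if any(start <= pos < end for start, end in read)]
        for a, b in zip(covered, covered[1:]):
            parent[find(b)] = find(a)  # union the two components

    spans: dict[int, tuple[int, int]] = {}
    for idx, pos in enumerate(het_sites):
        root = find(idx)
        lo, hi = spans.get(root, (pos, pos))
        spans[root] = (min(lo, pos), max(hi, pos))
    return sorted(spans.values())

sites = [100, 900, 2000, 5000, 5600]
reads = [
    [(50, 150), (850, 950)],    # two subreads about 700 bp apart link sites 100 and 900
    [(880, 920), (1950, 2050)], # links 900 and 2000
    [(5550, 5650)],             # covers a single site, so it links nothing
]
print(haplotype_blocks(sites, reads))
# [(100, 2000), (5000, 5000), (5600, 5600)]
```

In this picture, a longer advance length lets one read bridge sites that are far apart, which is why it has such a strong effect on block length.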


July 7, 2019

Direct detection and sequencing of damaged DNA bases.

Products of various forms of DNA damage have been implicated in a variety of important biological processes, such as aging, neurodegenerative diseases, and cancer. Therefore, there is great interest in developing methods for interrogating damaged DNA in the context of sequencing. Here, we demonstrate that single-molecule, real-time (SMRT®) DNA sequencing can directly detect damaged DNA bases in the DNA template, as a by-product of the sequencing method, through an analysis of the DNA polymerase kinetics that are altered by the presence of a modified base. We demonstrate the sequencing of several DNA templates containing products of DNA damage, including 8-oxoguanine, 8-oxoadenine, O6-methylguanine, 1-methyladenine, O4-methylthymine, 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, or thymine dimers, and show that these base modifications can be readily detected with single-modification resolution and DNA strand specificity. We characterize the distinct kinetic signatures generated by these DNA base modifications.
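
One common way to express the altered polymerase kinetics is the interpulse duration (IPD) ratio: at each template position on a given strand, the mean IPD in the sample of interest divided by the mean IPD in an unmodified control. The sketch below computes that ratio and flags positions above a cut-off. The fixed threshold of 2.0 and all names and values are hypothetical; published analyses use statistical tests and per-modification signatures rather than a single cut-off.

```python
from statistics import mean

def ipd_ratios(native: dict[int, list[float]],
               control: dict[int, list[float]]) -> dict[int, float]:
    """Mean native IPD divided by mean control IPD at each template position (one strand)."""
    ratios = {}
    for pos, native_ipds in native.items():
        control_ipds = control.get(pos)
        if not native_ipds or not control_ipds:
            continue  # need observations from both samples
        ratios[pos] = mean(native_ipds) / mean(control_ipds)
    return ratios

def flag_positions(ratios: dict[int, float], threshold: float = 2.0) -> list[int]:
    """Positions whose IPD ratio reaches a (hypothetical) threshold."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)

# Hypothetical per-position IPDs (arbitrary units) for one strand of a template.
native = {10: [1.0, 1.2, 0.8], 11: [3.0, 3.6, 3.3], 12: [0.5]}
control = {10: [1.0, 1.0, 1.0], 11: [1.1, 1.0, 0.9], 12: []}
print(flag_positions(ipd_ratios(native, control)))  # [11]
```

Computing the ratios separately for each strand keeps the strand-specific information the abstract highlights.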


July 7, 2019

Real-time sequencing.

This month’s Genome Watch describes the impact of next-generation sequencing on the ‘real-time’ analysis of pathogen genomes during outbreaks.


July 7, 2019

Structural variation analysis with strobe reads.

Structural variants, including deletions, duplications and rearrangements of DNA sequence, are important contributors to genome variation in many organisms. In humans, many structural variants are found in complex and highly repetitive regions of the genome, making their identification difficult. A new sequencing technology called strobe sequencing generates strobe reads containing multiple subreads from a single contiguous fragment of DNA. Strobe reads thus generalize the concept of paired reads, or mate pairs, that have been routinely used for structural variant detection. Strobe sequencing holds promise for unraveling complex variants that have been difficult to characterize with current sequencing technologies. We introduce an algorithm for identification of structural variants using strobe sequencing data. We consider strobe reads from a test genome that have multiple possible alignments to a reference genome due to sequencing errors and/or repetitive sequences in the reference. We formulate the combinatorial optimization problem of finding the minimum number of structural variants in the test genome that are consistent with these alignments. We solve this problem using an integer linear program. Using simulated strobe sequencing data, we show that our algorithm has better sensitivity and specificity than paired read approaches for structural variation identification.
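
The optimization problem can be stated compactly: choose the smallest set of variants such that every read has at least one candidate alignment whose implied variants are all in the set. The authors solve it with an integer linear program. The sketch below swaps in plain exhaustive search over variant subsets, which finds the same minimum but scales exponentially and is only suitable for tiny examples. The read-to-variant encoding, variant IDs and function name are hypothetical.

```python
from itertools import combinations
from typing import Optional

def min_consistent_variants(reads: list[list[frozenset]]) -> Optional[set]:
    """Smallest set of variants such that every read has at least one candidate
    alignment whose required variants are all in the set.

    Each read is a list of candidate alignments; each alignment is the frozenset
    of variant IDs it implies (an empty frozenset means a concordant alignment).
    Exhaustive search, exponential in the number of candidate variants.
    """
    candidates = sorted(set().union(*(aln for read in reads for aln in read)))
    for size in range(len(candidates) + 1):  # try smaller sets first
        for chosen in combinations(candidates, size):
            chosen_set = set(chosen)
            if all(any(aln <= chosen_set for aln in read) for read in reads):
                return chosen_set
    return None  # some read has no candidate alignment at all

reads = [
    [frozenset({"del1"}), frozenset({"inv1"})],  # ambiguous: deletion or inversion
    [frozenset({"del1"})],                       # only explained by the deletion
    [frozenset(), frozenset({"dup1"})],          # concordant, or a duplication
]
print(min_consistent_variants(reads))  # {'del1'}
```

An ILP encodes the same constraints with one binary variable per candidate variant and per alignment, which lets a solver handle instances far too large for enumeration.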

