July 7, 2019

Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired-end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples. Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow a proper trade-off between sensitivity and precision to be chosen.
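
The thresholding idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It estimates a tail area-based FDR, Fdr(t) = pi0 * F0(t) / F(t), for each candidate distance threshold t and returns the largest t whose estimate stays at or below a target level. Everything named here is an assumption made for illustration: the use of Hamming distance, the conservative default `pi0 = 1.0`, the function names, and the availability of a null sample of distances measured on sequence known to carry no barcode (for example, reference DNA).

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_barcode_distance(read_prefix: str, barcodes: Sequence[str]) -> int:
    """Smallest distance from a read prefix to any barcode in the set."""
    return min(hamming(read_prefix, bc) for bc in barcodes)


def tail_fdr(threshold: int, observed: Sequence[int],
             null: Sequence[int], pi0: float = 1.0) -> float:
    """Estimate Fdr(t) = pi0 * F0(t) / F(t) for the tail d <= t.

    F is the empirical CDF of observed minimal distances and F0 the
    empirical CDF of distances from unbarcoded (null) sequence.
    """
    f_obs = sum(d <= threshold for d in observed) / len(observed)
    f_null = sum(d <= threshold for d in null) / len(null)
    if f_obs == 0:
        return 0.0  # no read is called at this threshold
    return min(1.0, pi0 * f_null / f_obs)


def max_acceptable_distance(observed: Sequence[int], null: Sequence[int],
                            alpha: float = 0.05,
                            pi0: float = 1.0) -> Optional[int]:
    """Largest threshold that calls at least one read with Fdr <= alpha."""
    best = None
    for t in range(0, max(observed) + 1):
        called = sum(d <= t for d in observed)
        if called > 0 and tail_fdr(t, observed, null, pi0) <= alpha:
            best = t
    return best
```

In use, `observed` would hold one minimal distance per read and `null` one per unbarcoded control sequence. Reads whose minimal distance is at or below the returned threshold would be treated as barcoded. Lowering `alpha` favours precision over sensitivity.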


July 7, 2019

Characterization of structural variants with single molecule and hybrid sequencing approaches.

Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent ‘third-generation’ sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Sequence-dependent elongation dynamics on macrolide-bound ribosomes.

The traditional view of macrolide antibiotics as plugs inside the ribosomal nascent peptide exit tunnel (NPET) has lately been challenged in favor of a more complex, heterogeneous mechanism, where drug-peptide interactions determine the fate of a translating ribosome. To investigate these highly dynamic processes, we applied single-molecule tracking of elongating ribosomes during inhibition of elongation by erythromycin of several nascent chains, including ErmCL and H-NS, which were shown to be, respectively, sensitive and resistant to erythromycin. Peptide sequence-specific changes were observed in translation elongation dynamics in the presence of a macrolide-obstructed NPET. Elongation rates were not severely inhibited in general by the presence of the drug; instead, stalls or pauses were observed as abrupt events. The dynamic pathways of nascent-chain-dependent elongation pausing in the presence of macrolides determine the fate of the translating ribosome: stalling or readthrough. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.


July 7, 2019

Replication of the Escherichia coli chromosome in RNase HI-deficient cells: multiple initiation regions and fork dynamics.

DNA replication in Escherichia coli is normally initiated at a single origin, oriC, dependent on initiation protein DnaA. However, replication can be initiated elsewhere on the chromosome at multiple ectopic oriK sites. Genetic evidence indicates that initiation from oriK depends on RNA-DNA hybrids (R-loops), which are normally removed by enzymes such as RNase HI to prevent oriK from misfiring during normal growth. Initiation from oriK sites occurs in RNase HI-deficient mutants, and possibly in wild-type cells under certain unusual conditions. Despite previous work, the locations of oriK and their impact on genome stability remain unclear. We combined 2D gel electrophoresis and whole genome approaches to map genome-wide oriK locations. The DNA copy number profiles of various RNase HI-deficient strains contained multiple peaks, often in consistent locations, identifying candidate oriK sites. Removal of RNase HI protein also leads to global alterations of replication fork migration patterns, often opposite to normal replication directions, and presumably eukaryote-like replication fork merging. Our results have implications for genome stability, offering a new understanding of how RNase HI deficiency results in R-loop-mediated transcription-replication conflict, as well as inappropriate replication stalling or blockage at Ter sites outside of the terminus trap region and at ribosomal operons. © 2013 John Wiley & Sons Ltd.


July 7, 2019

The impact of aminoglycosides on the dynamics of translation elongation.

Inferring antibiotic mechanisms on translation through static structures has been challenging, as biological systems are highly dynamic. Dynamic single-molecule methods are also limited to few simultaneously measurable parameters. We have circumvented these limitations with a multifaceted approach to investigate three structurally distinct aminoglycosides that bind to the aminoacyl-transfer RNA site (A site) in the prokaryotic 30S ribosomal subunit: apramycin, paromomycin, and gentamicin. Using several single-molecule fluorescence measurements combined with structural and biochemical techniques, we observed distinct changes to translational dynamics for each aminoglycoside. While all three drugs effectively inhibit translation elongation, their actions are structurally and mechanistically distinct. Apramycin does not displace A1492 and A1493 at the decoding center, as demonstrated by a solution nuclear magnetic resonance structure, causing only limited miscoding; instead, it primarily blocks translocation. Paromomycin and gentamicin, which displace A1492 and A1493, cause significant miscoding, block intersubunit rotation, and inhibit translocation. Our results show the power of combined dynamics, structural, and biochemical approaches to elucidate the complex mechanisms underlying translation and its inhibition. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.


July 7, 2019

Single-molecule fluorescence imaging of processive myosin with enhanced background suppression using linear zero-mode waveguides (ZMWs) and convex lens induced confinement (CLIC).

Resolving single fluorescent molecules in the presence of high fluorophore concentrations remains a challenge in single-molecule biophysics that limits our understanding of weak molecular interactions. Total internal reflection fluorescence (TIRF) imaging, the workhorse of single-molecule fluorescence microscopy, enables experiments at concentrations up to about 100 nM, but many biological interactions have considerably weaker affinities, and thus require at least one species to be at micromolar or higher concentration. Current alternatives to TIRF often require three-dimensional confinement, and thus can be problematic for extended substrates, such as cytoskeletal filaments. To address this challenge, we have demonstrated and applied two new single-molecule fluorescence microscopy techniques, linear zero-mode waveguides (ZMWs) and convex lens induced confinement (CLIC), for imaging the processive motion of molecular motors myosin V and VI along actin filaments. Both technologies will allow imaging in the presence of higher fluorophore concentrations than TIRF microscopy. They will enable new biophysical measurements of a wide range of processive molecular motors that move along filamentous tracks, such as other myosins, dynein, and kinesin. A particularly salient application of these technologies will be to examine chemomechanical coupling by directly imaging fluorescent nucleotide molecules interacting with processive motors as they traverse their actin or microtubule tracks.


July 7, 2019

Coordinated conformational and compositional dynamics drive ribosome translocation.

During translation elongation, the ribosome compositional factors elongation factor G (EF-G; encoded by fusA) and tRNA alternately bind to the ribosome to direct protein synthesis and regulate the conformation of the ribosome. Here, we use single-molecule fluorescence with zero-mode waveguides to directly correlate ribosome conformation and composition during multiple rounds of elongation at high factor concentrations in Escherichia coli. Our results show that EF-G bound to GTP (EF-G-GTP) continuously samples both rotational states of the ribosome, binding with higher affinity to the rotated state. Upon successful accommodation into the rotated ribosome, the EF-G-ribosome complex evolves through several rate-limiting conformational changes and the hydrolysis of GTP, which results in a transition back to the nonrotated state and in turn drives translocation and facilitates release of both EF-G-GDP and E-site tRNA. These experiments highlight the power of tracking single-molecule conformation and composition simultaneously in real time.


July 7, 2019

Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations.

Medulloblastomas are the most common malignant brain tumours in children. Identifying and understanding the genetic events that drive these tumours is critical for the development of more effective diagnostic, prognostic and therapeutic strategies. Recently, our group and others described distinct molecular subtypes of medulloblastoma on the basis of transcriptional and copy number profiles. Here we use whole-exome hybrid capture and deep sequencing to identify somatic mutations across the coding regions of 92 primary medulloblastoma/normal pairs. Overall, medulloblastomas have low mutation rates consistent with other paediatric tumours, with a median of 0.35 non-silent mutations per megabase. We identified twelve genes mutated at statistically significant frequencies, including previously known mutated genes in medulloblastoma such as CTNNB1, PTCH1, MLL2, SMARCA4 and TP53. Recurrent somatic mutations were newly identified in an RNA helicase gene, DDX3X, often concurrent with CTNNB1 mutations, and in the nuclear co-repressor (N-CoR) complex genes GPS2, BCOR and LDB1. We show that mutant DDX3X potentiates transactivation of a TCF promoter and enhances cell viability in combination with mutant, but not wild-type, β-catenin. Together, our study reveals the alteration of WNT, hedgehog, histone methyltransferase and now N-CoR pathways across medulloblastomas and within specific subtypes of this disease, and nominates the RNA helicase DDX3X as a component of pathogenic β-catenin signalling in medulloblastoma.


July 7, 2019

Real-time tRNA transit on single translating ribosomes at codon resolution.

Translation by the ribosome occurs by a complex mechanism involving the coordinated interaction of multiple nucleic acid and protein ligands. Here we use zero-mode waveguides (ZMWs) and sophisticated detection instrumentation to allow real-time observation of translation at physiologically relevant micromolar ligand concentrations. Translation at each codon is monitored by stable binding of transfer RNAs (tRNAs), labelled with distinct fluorophores, to translating ribosomes, which allows direct detection of the identity of tRNA molecules bound to the ribosome and therefore the underlying messenger RNA (mRNA) sequence. We observe the transit of tRNAs on single translating ribosomes and determine the number of tRNA molecules simultaneously bound to the ribosome, at each codon of an mRNA molecule. Our results show that ribosomes are only briefly occupied by two tRNA molecules and that release of deacylated tRNA from the exit (E) site is uncoupled from binding of aminoacyl-tRNA site (A-site) tRNA and occurs rapidly after translocation. The methods outlined here have broad application to the study of mRNA sequences, and the mechanism and regulation of translation.


July 7, 2019

Computational solutions to large-scale data management and analysis.

Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist – such as cloud and heterogeneous computing – to successfully tackle our big data problems.


July 7, 2019

Long, processive enzymatic DNA synthesis using 100% dye-labeled terminal phosphate-linked nucleotides.

We demonstrate the efficient synthesis of DNA with complete replacement of the four deoxyribonucleoside triphosphate (dNTP) substrates with nucleotides carrying fluorescent labels. A different, spectrally separable fluorescent dye suitable for single molecule fluorescence detection was conjugated to each of the four dNTPs via linkage to the terminal phosphate. Using these modified nucleotides, DNA synthesis by phi 29 DNA polymerase was observed to be processive for products thousands of bases in length, with labeled nucleotide affinities and DNA polymerization rates approaching unmodified dNTP levels. Results presented here show the compatibility of these nucleotides for single-molecule, real-time DNA sequencing applications.


July 7, 2019

Genomic analysis of 495 vancomycin-resistant Enterococcus faecium reveals broad dissemination of a vanA plasmid in more than 19 clones from Copenhagen, Denmark.

From 2012 to 2014, there has been a huge increase in vancomycin-resistant (vanA) Enterococcus faecium (VREfm) in Copenhagen, Denmark, with 602 patients infected or colonized with VREfm in 2014 compared with just 22 in 2012. The objective of this study was to describe the genetic epidemiology of VREfm to assess the contribution of clonal spread and horizontal transfer of the vanA transposon (Tn1546) and plasmid in the dissemination of VREfm in hospitals. VREfm from Copenhagen, Denmark (2012-14) were whole-genome sequenced. The clonal structure was determined and the structure of Tn1546-like transposons was characterized. One VREfm isolate belonging to the largest clonal group was sequenced using long-read technology to close a 37 kb vanA plasmid. Phylogeny revealed a polyclonal structure where 495 VREfm isolates were divided into 13 main groups and 7 small groups. The majority of the isolates were located in three groups (n = 44, 100 and 218) and clonal spread of VREfm between wards and hospitals was identified. Five Tn1546-like transposon types were identified. A dominant truncated transposon (type 4, 92%) was spread across all but one VREfm group. The closed vanA plasmid was highly covered by reads from isolates containing the type 4 transposon. This study suggests that it was the dissemination of the type 4 Tn1546-like transposon and plasmid via horizontal transfer to multiple populations of E. faecium, followed by clonal spread of new VREfm clones, that contributed to the increase in and diversity of VREfm in Danish hospitals. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Fallacy of the unique genome: sequence diversity within single Helicobacter pylori strains.

Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains.
IMPORTANCE Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish “the genome” of a bacterial strain. Variability is usually reduced (“only sequence from a single colony”), ignored (“just publish the consensus”), or placed in the “too-hard” basket (“analysis of raw read data is more robust”). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as “the genome” of a bacterial strain may be misleading. Copyright © 2017 Draper et al.


July 7, 2019

HINGE: long-read assembly achieves optimal repeat resolution.

Long-read sequencing technologies have the potential to produce gold-standard de novo genome assemblies, but fully exploiting error-prone reads to resolve repeats remains a challenge. Aggressive approaches to repeat resolution often produce misassemblies, and conservative approaches lead to unnecessary fragmentation. We present HINGE, an assembler that seeks to achieve optimal repeat resolution by distinguishing repeats that can be resolved given the data from those that cannot. This is accomplished by adding “hinges” to reads for constructing an overlap graph where only unresolvable repeats are merged. As a result, HINGE combines the error resilience of overlap-based assemblers with repeat-resolution capabilities of de Bruijn graph assemblers. HINGE was evaluated on the long-read bacterial data sets from the NCTC project. HINGE produces more finished assemblies than Miniasm and the manual pipeline of NCTC based on the HGAP assembler and Circlator. HINGE also allows us to identify 40 data sets where unresolvable repeats prevent the reliable construction of a unique finished assembly. In these cases, HINGE outputs a visually interpretable assembly graph that encodes all possible finished assemblies consistent with the reads, while other approaches such as the NCTC pipeline and FALCON either fragment the assembly or resolve the ambiguity arbitrarily. © 2017 Kamath et al.; Published by Cold Spring Harbor Laboratory Press.

