Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses and provide future perspectives. Copyright © 2018 Elsevier Ltd. All rights reserved.
Genome-wide identification and analysis of the ALTERNATIVE OXIDASE gene family in diploid and hexaploid wheat.
A comprehensive understanding of wheat responses to environmental stress will contribute to the long-term goal of feeding the planet. ALTERNATIVE OXIDASE (AOX) genes encode proteins involved in a bypass of the electron transport chain and are also known to be involved in stress tolerance in multiple species. Here, we report the identification and characterization of the AOX gene family in diploid and hexaploid wheat. Four genes each were found in the diploid ancestors Triticum urartu and Aegilops tauschii, and three in Aegilops speltoides. In hexaploid wheat (Triticum aestivum), 20 genes were identified, some with multiple splice variants, corresponding to a total of 24 proteins for those with observed transcription and translation. These proteins were classified as AOX1a, AOX1c, AOX1e or AOX1d via phylogenetic analysis. Proteins lacking most or all signature AOX motifs were assigned putative regulatory roles. Analysis of protein-targeting sequences suggests mixed localization to the mitochondria and other organelles. In comparison with the most studied AOX, from Trypanosoma brucei, there were amino acid substitutions at critical functional domains, indicating possible functional divergence in wheat or in grasses in general. In hexaploid wheat, AOX genes were expressed at specific developmental stages as well as in response to both biotic and abiotic stresses such as fungal pathogens, heat and drought. These expression patterns suggest a highly regulated and diverse transcription and expression system. The insights gained provide a framework for the continued and expanded study of AOX genes in wheat for stress tolerance through breeding new varieties, as well as resistance to AOX-targeted herbicides, all of which can ultimately be used synergistically to improve crop yield.
Single Molecule Sequencing: new outlooks for solving genome assembly and transcript identification challenges
In this review, we introduce a novel sequencing technology named Single Molecule Real Time (SMRT) sequencing. Also called Single Molecule Sequencing because it does not require any amplification, this technology can produce much longer reads than previous NGS technologies such as Illumina. This increase in read length, which can reach 150-fold, will address many challenges raised by current NGS technologies. Short NGS reads, with a maximum length of 300 bp, make it hard to reconstruct a whole genome and invariably lead to fragmented genome assemblies. They also make it difficult to correctly identify and quantify transcripts when isoform diversity is high. Despite their higher error rate, long reads have shown very promising results for these issues. We show that longer reads can produce less fragmented, higher-quality assemblies, and can also sequence mRNAs from start to end, making correct transcript quantification much easier and even enabling the discovery of new intron structures and therefore new isoforms.
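Why longer reads yield less fragmented assemblies can be illustrated with a back-of-the-envelope model. This is our illustration, not the authors' analysis. The idea is that a repeat can only be resolved when at least one read spans it entirely. The sketch below assumes fixed-length reads with uniformly random start positions and ignores sequencing errors. The 30x coverage and the 6 kb repeat length are hypothetical values, and 45,000 bp is simply 300 bp × 150, the upper read-length gain quoted above.

```python
def expected_spanning_reads(coverage: float, read_len: int, repeat_len: int,
                            flank: int = 0) -> float:
    """Expected number of reads that contain a repeat plus `flank` bp of unique
    sequence on each side, assuming fixed-length reads with uniformly random
    start positions (an idealized model that ignores sequencing errors)."""
    span = repeat_len + 2 * flank
    if read_len < span:
        return 0.0  # no single read can bridge the repeat
    # Read starts occur at a density of coverage / read_len per base; a read
    # bridges the span only if it starts in one of (read_len - span + 1) positions.
    return coverage * (read_len - span + 1) / read_len


# 300 bp short reads vs. hypothetical 150-fold longer reads (45,000 bp),
# both at 30x coverage, against a hypothetical 6 kb repeat.
for read_len in (300, 300 * 150):
    print(read_len, round(expected_spanning_reads(30, read_len, 6_000), 1))
```

At equal coverage, the short reads are expected to produce no bridging read for the repeat, so an assembler must break the contig there. The long reads are expected to produce about 26 bridging reads.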
Mixed fibrolamellar hepatocellular carcinoma (mFL-HCC) is a rare liver tumor defined by the presence of both pure FL-HCC and conventional HCC components, represents up to 25% of cases of FL-HCC, and has been associated with worse prognosis. Recent genomic characterization of pure FL-HCC identified a highly recurrent transcript fusion (DNAJB1:PRKACA) not found in conventional HCC. We performed exome and transcriptome sequencing of a case of mFL-HCC. A novel BAC-capture approach was developed to identify a 400 kb deletion as the underlying genomic mechanism for a DNAJB1:PRKACA fusion in this case. A sensitive Nanostring Elements assay was used to screen for this transcript fusion in a second case of mFL-HCC, 112 additional HCC samples and 44 adjacent non-tumor liver samples. We report the first comprehensive genomic analysis of a case of mFL-HCC. No common HCC-associated mutations were identified. The very low mutation rate of this case, large number of mostly single-copy, long-range copy number variants, and high expression of ERBB2 were more consistent with previous reports of pure FL-HCC than conventional HCC. In particular, the DNAJB1:PRKACA fusion transcript specifically associated with pure FL-HCC was detected at very high expression levels. Subsequent analysis revealed the presence of this fusion in all primary and metastatic samples, including those with mixed or conventional HCC pathology. A second case of mFL-HCC confirmed our finding that the fusion was detectable in conventional components. An expanded screen identified a third case of fusion-positive HCC, which upon review, also had both conventional and fibrolamellar features. This screen confirmed the absence of the fusion in all conventional HCC and adjacent non-tumor liver samples. These results indicate that mFL-HCC is similar to pure FL-HCC at the genomic level and the DNAJB1:PRKACA fusion can be used as a diagnostic tool for both pure and mFL-HCC. © The Author 2016. Published by Oxford University Press on behalf of the European Society for Medical Oncology.
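In transcriptome data, a fusion such as DNAJB1:PRKACA is commonly supported by reads that span the breakpoint. The sketch below counts these junction-spanning reads by exact substring matching on either strand. It is a generic illustration only. It is not the Nanostring Elements assay or the pipeline used in this study. The partner sequences, the reads, and the `anchor` length are all hypothetical placeholders.

```python
# Complement table for uppercase DNA; reads are assumed to be uppercase ACGT.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_junction_reads(reads, five_prime_seq: str, three_prime_seq: str,
                         anchor: int = 10) -> int:
    """Count reads containing the fusion breakpoint with at least `anchor`
    bases from each partner, on either strand (exact matching only)."""
    junction = five_prime_seq[-anchor:] + three_prime_seq[:anchor]
    junction_rc = reverse_complement(junction)
    return sum(1 for read in reads if junction in read or junction_rc in read)


# Hypothetical placeholder sequences, NOT the real DNAJB1 or PRKACA exons.
partner_5p = "AAAAACCCCCGGGGG"
partner_3p = "TTTTTAGAGAGCTCT"
reads = ["CCGGGGGTTTTTAG", "CCCCCGGGGGAAAA", "GGGTTTTTAGAG", "TTAAAAACCCCCGG"]
print(count_junction_reads(reads, partner_5p, partner_3p, anchor=5))  # 2
```

Requiring an anchor on both sides of the junction guards against reads that merely touch one partner. Real fusion callers additionally tolerate mismatches, which this sketch does not.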
The human microbiome plays an important and increasingly recognized role in human health. Studies of the microbiome typically use targeted sequencing of the 16S rRNA gene, whole metagenome shotgun sequencing, or other meta-omic technologies to characterize the microbiome’s composition, activity, and dynamics. Processing, analyzing, and interpreting these data involve numerous computational tools that aim to filter, cluster, annotate, and quantify the obtained data and ultimately provide an accurate and interpretable profile of the microbiome’s taxonomy, functional capacity, and behavior. These tools, however, are often limited in resolution and accuracy and may fail to capture many biologically and clinically relevant microbiome features, such as strain-level variation or nuanced functional response to perturbation. Over the past few years, extensive efforts have been invested toward addressing these challenges and developing novel computational methods for accurate and high-resolution characterization of microbiome data. These methods aim to quantify strain-level composition and variation, detect and characterize rare microbiome species, link specific genes to individual taxa, and more accurately characterize the functional capacity and dynamics of the microbiome. These methods and the ability to produce detailed and precise microbiome information are clearly essential for informing microbiome-based personalized therapies. In this review, we survey these methods, highlighting the challenges each method sets out to address and briefly describing methodological approaches. Copyright © 2016 Elsevier Inc. All rights reserved.
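Clustering is one of the processing steps listed above. The following is a minimal sketch of greedy centroid clustering at a fixed identity threshold; 97% is a conventional OTU cutoff for 16S data. It assumes the amplicon reads are already aligned to equal length. It is illustrative only and does not correspond to any specific tool surveyed in the review.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold: float = 0.97):
    """Greedy centroid clustering: each sequence joins the first centroid it
    matches at >= threshold identity; otherwise it founds a new cluster.
    Input order matters and is assumed to be sorted by decreasing abundance."""
    clusters = {}  # centroid -> list of member sequences
    for seq in seqs:
        for centroid, members in clusters.items():
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


print(greedy_cluster(["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"], threshold=0.9))
```

A fixed threshold like this is exactly what limits resolution: two strains differing at fewer positions than the cutoff allows end up in the same cluster. This is one reason for the strain-level methods discussed above.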
Molecular monitoring plays an essential role in the clinical management of chronic myeloid leukemia (CML) patients, and now guides clinical decision making. Quantitative reverse transcription polymerase chain reaction (qRT-PCR) assessment of BCR-ABL1 transcript levels has become the standard-of-care protocol in CML. However, further developments are required to assess leukemic burden more efficiently, monitor minimal residual disease (MRD), detect mutations that drive resistance to tyrosine kinase inhibitor (TKI) therapy and identify predictors of response to TKI therapy. Cartridge-based BCR-ABL1 quantitation, digital PCR and next-generation sequencing are examples of technologies that are currently being explored, evaluated and translated into the clinic. Here we review the emerging molecular methods and technologies currently being developed to advance molecular monitoring in CML.
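For context, qRT-PCR monitoring typically reports BCR-ABL1 transcripts as a ratio to a control gene (ABL1 is commonly used). This ratio is converted to the International Scale (IS) with a laboratory-specific conversion factor. Major molecular response corresponds to BCR-ABL1 ≤ 0.1% IS, a 3-log reduction from the standardized 100% baseline. The sketch below assumes these conventions. The copy numbers and the default conversion factor of 1.0 are hypothetical.

```python
import math


def bcr_abl1_is_percent(bcr_abl1_copies: float, control_copies: float,
                        conversion_factor: float = 1.0) -> float:
    """BCR-ABL1 / control-gene transcript ratio as a percentage, scaled by a
    laboratory-specific conversion factor to the International Scale (IS)."""
    if control_copies <= 0:
        raise ValueError("control gene copies must be positive")
    return bcr_abl1_copies / control_copies * 100 * conversion_factor


def log_reduction(is_percent: float) -> float:
    """Log10 reduction relative to the standardized 100% IS baseline."""
    return math.log10(100 / is_percent)


value = bcr_abl1_is_percent(50, 100_000)  # hypothetical qRT-PCR copy numbers
print(round(value, 3), round(log_reduction(value), 2), value <= 0.1)
```

Deeper response levels and MRD monitoring push these ratios toward the assay's detection limit. That limit is where the more sensitive methods reviewed here, such as digital PCR, become relevant.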
Haplotypes are fundamental to fully characterize the diploid genome of an individual, yet methods to directly chart the unique genetic makeup of each parental chromosome are lacking. Here we introduce single-cell DNA template strand sequencing (Strand-seq) as a novel approach to phasing diploid genomes along the entire length of all chromosomes. We demonstrate this by building a complete haplotype for a HapMap individual (NA12878) at high accuracy (concordance 99.3%), without using generational information or statistical inference. By use of this approach, we mapped all meiotic recombination events in a family trio with high resolution (median range ~14 kb) and phased larger structural variants like deletions, indels, and balanced rearrangements like inversions. Lastly, the single-cell resolution of Strand-seq allowed us to observe loss of heterozygosity regions in a small number of cells, a significant advantage for studies of heterogeneous cell populations, such as cancer cells. We conclude that Strand-seq is a unique and powerful approach to completely phase individual genomes and map inheritance patterns in families, while preserving haplotype differences between single cells. © 2016 Porubský et al.; Published by Cold Spring Harbor Laboratory Press.
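A concordance figure such as the 99.3% above compares phased heterozygous sites against an independent reference phasing. Below is a minimal sketch of that comparison. It assumes biallelic sites encoded as 0/1 and a single phased block per chromosome. It is our illustration and not necessarily the exact metric computed by the authors.

```python
def phase_concordance(predicted, reference):
    """Fraction of heterozygous sites whose phase agrees with the reference.

    Each argument lists, per site, the allele (0 or 1) assigned to haplotype A.
    Haplotype labels are arbitrary, so the better of the two label orientations
    is used (one phased block per chromosome is assumed)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    agree = sum(p == r for p, r in zip(predicted, reference))
    return max(agree, len(predicted) - agree) / len(predicted)


print(phase_concordance([0, 1, 1, 0, 1], [1, 0, 0, 1, 1]))  # 0.8
```

Taking the better orientation matters because a chromosome-length haplotype that is perfectly phased but labeled the other way round would otherwise score 0%.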
Using experimental evolution to identify druggable targets that could inhibit the evolution of antimicrobial resistance.
With multi-drug and pan-drug-resistant bacteria becoming increasingly common in hospitals, antibiotic resistance has threatened to return us to a pre-antibiotic era that would completely undermine modern medicine. There is an urgent need to develop new antibiotics and strategies to combat resistance that are substantially different from earlier drug discovery efforts. One such strategy that would complement current and future antibiotics would be a class of co-drugs that target the evolution of resistance and thereby extend the efficacy of specific classes of antibiotics. A critical step in the development of such strategies lies in understanding the key evolutionary trajectories responsible for resistance and which proteins or biochemical pathways within those trajectories would be good candidates for co-drug discovery. We identify the most important steps in the evolution of resistance for a specific pathogen and antibiotic combination by evolving highly polymorphic populations of pathogens to resistance in a novel bioreactor that favors biofilm development. As the populations evolve to increasing drug concentrations, we use deep sequencing to elucidate the network of genetic changes responsible for resistance, followed by in vitro biochemistry and, often, structure determination to establish how the adaptive mutations produce resistance. Importantly, identifying the molecular steps, their frequency within the populations and their chronology within the evolutionary trajectory toward resistance is critical to assessing their relative importance. In this work, we discuss findings from the evolution of the ESKAPE pathogen Pseudomonas aeruginosa to the drug of last resort, colistin, to illustrate the power of this approach.
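The frequency and chronology of adaptive mutations can be read off deep-sequencing time courses. The code below is a hedged sketch. It assumes per-time-point read counts for each candidate mutation, computes allele frequencies, and orders mutations by when they first reach a detection threshold. The 5% threshold and the mutation labels are hypothetical, and this is not the authors' analysis pipeline.

```python
def allele_frequencies(alt_counts, depths):
    """Per-time-point allele frequency from read counts supporting the mutation."""
    return [alt / depth if depth else 0.0 for alt, depth in zip(alt_counts, depths)]


def first_detection(trajectories, threshold=0.05):
    """Order mutations by the first time point at which their frequency reaches
    `threshold`; mutations that never reach it are omitted."""
    onset = {}
    for mutation, freqs in trajectories.items():
        for t, f in enumerate(freqs):
            if f >= threshold:
                onset[mutation] = t
                break
    return sorted(onset, key=onset.get)


depths = [100, 100, 100, 100]  # hypothetical sequencing depth at each time point
trajectories = {
    "mut_A": allele_frequencies([0, 2, 30, 80], depths),
    "mut_B": allele_frequencies([10, 40, 60, 90], depths),
    "mut_C": allele_frequencies([0, 0, 1, 2], depths),
}
print(first_detection(trajectories))  # ['mut_B', 'mut_A']
```

In this toy example, mut_B appears before mut_A and mut_C never rises above background. Ordering of this kind suggests which mutations may be early, enabling steps and therefore candidate co-drug targets.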
DNA methylation is an important mechanism for controlling biological processes. PacBio SMRT technology has recently provided a new way to identify base methylation in the genome. MotifMaker is a tool developed by PacBio for discovering DNA methylation motifs from methylated DNA sequences. However, MotifMaker is single-threaded and computationally expensive when identifying methylation motifs in large genomes. Here, we present an efficient motif-finding algorithm, MultiMotifMaker, implemented as a multithreaded version of MotifMaker. MultiMotifMaker speeds up the motif search about 8 to 9 times on a 32-core computer compared with MotifMaker, making it possible to identify methylation motifs from PacBio reads for large genomes.
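MotifMaker's own search algorithm is not described here, so the following is only a generic sketch of the parallelization idea. The sequence contexts around modified bases are split into chunks, and each chunk is counted on a separate worker. The worker counts candidate k-mers that cover the modified position, and the partial counts are then merged. The function names, the k-mer counting used as a stand-in for motif scoring, and the assumption that each context is centered on the modified base are ours.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(contexts, k):
    """Count k-mers overlapping the center base of each context
    (each context is assumed to be centered on a modified base)."""
    counts = Counter()
    for ctx in contexts:
        center = len(ctx) // 2
        for start in range(max(0, center - k + 1), min(center, len(ctx) - k) + 1):
            counts[ctx[start:start + k]] += 1
    return counts


def parallel_motif_counts(contexts, k=4, workers=4):
    """Split contexts into roughly equal chunks, count each chunk on a
    separate worker, and merge the partial counts."""
    chunk = max(1, -(-len(contexts) // workers))  # ceiling division
    chunks = [contexts[i:i + chunk] for i in range(0, len(contexts), chunk)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks, [k] * len(chunks)):
            total.update(partial)
    return total


print(parallel_motif_counts(["TTGATCA", "CCGATCG"], k=4, workers=2).most_common(1))
```

Counting is independent per context, so the work splits cleanly and only the final merge is serial. In CPython, threads will not speed up this CPU-bound loop because of the global interpreter lock. Swapping in `ProcessPoolExecutor`, which has the same `map` interface, is the usual way to use multiple cores.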
Report from the Killer-cell Immunoglobulin-like Receptors (KIR) component of the 17th International HLA and Immunogenetics Workshop.
The goals of the KIR component of the 17th International HLA and Immunogenetics Workshop (IHIW) were to encourage and educate researchers to begin analyzing KIR at allelic resolution, and to survey the nature and extent of KIR allelic diversity across human populations. To represent worldwide diversity, we analyzed 1269 individuals from ten populations, focusing on the most polymorphic KIR genes, which express receptors having three immunoglobulin (Ig)-like domains (KIR3DL1/S1, KIR3DL2 and KIR3DL3). We identified 13 novel alleles of KIR3DL1/S1, 13 of KIR3DL2 and 18 of KIR3DL3. Previously identified alleles, corresponding to 33 alleles of KIR3DL1/S1, 38 of KIR3DL2, and 43 of KIR3DL3, represented over 90% of the observed allele frequencies for these genes. In total we observed 37 KIR3DL1/S1 allotypes, 40 for KIR3DL2 and 44 for KIR3DL3. As KIR allotype diversity can affect NK cell function, this demonstrates potential for high functional diversity worldwide. Allelic variation further diversifies KIR haplotypes. We determined KIR3DL3~KIR3DL1/S1~KIR3DL2 haplotypes from five of the studied populations, and observed multiple population-specific haplotypes in each. This included 234 distinct haplotypes in European Americans, 191 in Ugandans, 35 in Papuans, 95 in Egyptians and 86 in Spanish populations. For another 35 populations, encompassing 642,105 individuals, we focused on KIR3DL2 and identified another 375 novel alleles, with approximately half of them observed in more than one individual. The allele-level KIR data gathered in this project represent the most comprehensive summary of global KIR allelic diversity to date, and continued analysis will improve understanding of KIR allelic polymorphism in global populations. Further, the wealth of new data gathered in the course of this workshop component highlights the value of collaborative, community-based efforts in immunogenetics research, exemplified by the IHIW. Copyright © 2018. Published by Elsevier Inc.
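Allele counts exceed allotype counts above because alleles that differ only by synonymous or non-coding changes encode the same protein. Under the KIR nomenclature, the first three digits after the asterisk identify the protein-level allotype. Collapsing allele names to that field therefore gives allotype frequencies. The allele names below are illustrative only and are not data from the workshop.

```python
from collections import Counter


def allotype(allele: str) -> str:
    """Truncate a KIR allele name to the first three digits after '*',
    the field that distinguishes protein-level (allotype) differences."""
    gene, digits = allele.split("*")
    return f"{gene}*{digits[:3]}"


def allotype_frequencies(observed_alleles):
    """Frequencies of allotypes among all observed allele copies."""
    counts = Counter(allotype(a) for a in observed_alleles)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Illustrative allele names only; not workshop data.
observed = ["KIR3DL1*00101", "KIR3DL1*00102", "KIR3DL1*002", "KIR3DL1*00101"]
print(allotype_frequencies(observed))
```

In the example, the two alleles KIR3DL1*00101 and KIR3DL1*00102 collapse to a single allotype, KIR3DL1*001. Functional diversity is therefore better summarized at the allotype level than by raw allele counts.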
Towards Personalized Medicine: An Improved De Novo Assembly Procedure for Early Detection of Drug Resistant HIV Minor Quasispecies in Patient Samples.
The third-generation sequencing technology PacBio has shown the ability to sequence HIV amplicons in their full length. Compared with short reads from Illumina shotgun sequencing, the long reads of PacBio offer a distinct advantage for comprehensively understanding the complexity of virus evolution at the quasispecies level (i.e. maintaining linkage information between variants). However, because PacBio reads are highly noisy, it is still a challenge to build accurate contigs at high sensitivity. Most previously developed NGS assembly tools assume that the input reads are fairly accurate, which is largely true for data derived from Sanger or Illumina technologies. When these tools are applied to high-noise PacBio reads, they are largely driven by noise rather than true signal, eventually leading to poor results in most cases. In this study, we propose a de novo assembly procedure, comprising a positive-focused strategy and linkage-frequency noise reduction, that is better suited to high-noise PacBio reads. We further tested this de novo assembly procedure on HIV PacBio benchmark data and clinical samples, and it accurately assembled dominant and minor populations of HIV quasispecies as expected. The improved de novo assembly procedure shows potential to promote PacBio technology in HIV drug-resistance clinical detection, as well as in broad HIV phylogenetic studies.
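The details of the positive-focused strategy are not given here, so the following is only a generic, hedged sketch of frequency-based noise reduction on linked variants. It assumes reads are already aligned to a common reference coordinate without indels. It tabulates the combination of bases each read carries at several variant positions and discards combinations below a frequency cutoff. The function names, the positions, and the cutoff are hypothetical.

```python
from collections import Counter


def linked_patterns(reads, positions):
    """Extract the bases each read carries at the given variant positions,
    keeping them together so that linkage between variants is preserved.
    Reads too short to cover every position are skipped."""
    return Counter("".join(read[p] for p in positions) for read in reads
                   if len(read) > max(positions))


def filter_by_frequency(patterns: Counter, min_freq: float = 0.1):
    """Keep linked variant patterns seen in at least `min_freq` of reads;
    rarer patterns are treated as likely sequencing noise."""
    total = sum(patterns.values())
    return {pat: n / total for pat, n in patterns.items() if n / total >= min_freq}


reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ATGTGC", "ATGTGC", "ACGTGC"]
patterns = linked_patterns(reads, positions=[1, 4])  # {'CA': 3, 'TG': 2, 'CG': 1}
print(filter_by_frequency(patterns, min_freq=0.25))
```

A fixed cutoff trades sensitivity to minor quasispecies against noise. A real minor variant below the cutoff would be discarded along with the errors, which is precisely the tension the procedure above aims to manage.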