Despite amazing progress over the past quarter century in the technology to detect genetic variants, intermediate-sized structural variants (50 bp to 50 kb) have remained difficult to identify. Such variants are too small to detect with array comparative genomic hybridization, but too large to reliably discover with short-read DNA sequencing. Recent de novo assemblies of human genomes have demonstrated the power of PacBio Single Molecule, Real-Time (SMRT) Sequencing to fill this technology gap and sensitively identify structural variants in the human genome. While de novo assembly is the ideal method to identify variants in a genome, it requires high depth of coverage. A structural variant discovery approach that utilizes lower coverage would facilitate evaluation of large patient and population cohorts. Here we introduce such an approach and apply it to 10-fold coverage of several human genomes generated on the PacBio Sequel System. To identify structural variants in low-fold coverage whole genome sequencing data, we apply a reference-based, re-sequencing workflow. First, reads are mapped to the human reference genome with a local aligner. The local alignments often end at structural variant loci. To connect co-linear local alignments across structural variants, we apply a novel algorithm that merges alignments into “chains” and refines the alignment edges. Then, the chained alignments are scanned for windows with an excess of insertions or deletions to identify candidate structural variant loci. Finally, the read support at each putative variant locus is evaluated to produce a variant call. Single nucleotide information is incorporated to phase and evaluate the zygosity of each structural variant. In 10-fold coverage human genome sequence, we identify the vast majority of the structural variants found by de novo assembly, thus demonstrating the power of low-fold coverage SMRT Sequencing to affordably and effectively detect structural variants.
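The chaining and window-scanning steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm. The `LocalAlignment` record, the greedy co-linearity test, and the `max_gap`, `window`, and `min_indel_bp` thresholds are hypothetical. All alignments of a read are assumed to lie on the same strand and chromosome. Only the gaps between chained alignments are counted: indels inside each local alignment and the edge-refinement step are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalAlignment:
    """Hypothetical minimal record of one local alignment of a read.

    Coordinates are 0-based, half-open intervals on the read (query) and
    on the reference; same strand and chromosome are assumed.
    """
    q_start: int
    q_end: int
    r_start: int
    r_end: int


def chain_alignments(alns, max_gap=10_000):
    """Greedily merge co-linear local alignments of ONE read into chains."""
    chains = []
    for aln in sorted(alns, key=lambda a: a.q_start):
        if chains:
            prev = chains[-1][-1]
            q_gap = aln.q_start - prev.q_end
            r_gap = aln.r_start - prev.r_end
            # Co-linear: the next alignment follows the previous one on both
            # the read and the reference, within a bounded gap.
            if 0 <= q_gap <= max_gap and 0 <= r_gap <= max_gap:
                chains[-1].append(aln)
                continue
        chains.append([aln])
    return chains


def chain_gaps(chain):
    """Yield (reference position, net indel length) between consecutive links.

    Positive length = extra read bases (insertion);
    negative length = skipped reference bases (deletion).
    """
    for prev, nxt in zip(chain, chain[1:]):
        q_gap = nxt.q_start - prev.q_end
        r_gap = nxt.r_start - prev.r_end
        yield prev.r_end, q_gap - r_gap


def candidate_windows(chains, window=100, min_indel_bp=50):
    """Return (window start, inserted bp, deleted bp) for windows with excess indels."""
    ins = defaultdict(int)
    dels = defaultdict(int)
    for chain in chains:
        for pos, delta in chain_gaps(chain):
            w = pos // window
            if delta > 0:
                ins[w] += delta
            elif delta < 0:
                dels[w] += -delta
    hits = []
    for w in sorted(set(ins) | set(dels)):
        if ins[w] >= min_indel_bp or dels[w] >= min_indel_bp:
            hits.append((w * window, ins[w], dels[w]))
    return hits


# Toy read whose two local alignments flank a 300 bp deletion.
read_alns = [
    LocalAlignment(0, 1000, 5000, 6000),
    LocalAlignment(1000, 2000, 6300, 7300),
]
chains = chain_alignments(read_alns)
print(candidate_windows(chains))  # [(6000, 0, 300)]
```

Chains from many reads can be pooled before scanning. A fuller caller would count supporting reads per window rather than summing bases, and would then evaluate read support and zygosity at each candidate locus.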
Though a role for structural variants in human disease has long been recognized, it has remained difficult to identify intermediate-sized variants (50 bp to 5 kb), which are too small to detect with array comparative genomic hybridization but too large to reliably discover with short-read DNA sequencing. Recent studies have demonstrated that PacBio Single Molecule, Real-Time (SMRT) sequencing fills this technology gap. SMRT sequencing detects tens of thousands of structural variants in the human genome, with approximately five times the sensitivity of short-read DNA sequencing.
Structural variants (genomic differences ≥50 base pairs) contribute to the evolution of traits in organisms and to human disease. Most structural variants (SVs) are too small to detect with array comparative genomic hybridization but too large to reliably discover with short-read DNA sequencing. Recent studies in human genomes show that PacBio SMRT Sequencing sensitively detects structural variants.
Most of the base pairs that differ between two human genomes are in intermediate-sized structural variants (50 bp to 5 kb), which are too small to detect with array comparative genomic hybridization or optical mapping but too large to reliably discover with short-read DNA sequencing. Long-read sequencing with PacBio Single Molecule, Real-Time (SMRT) Sequencing platforms fills this technology gap. PacBio SMRT Sequencing detects tens of thousands of structural variants in a human genome with approximately five times the sensitivity of short-read DNA sequencing. Effective application of PacBio SMRT Sequencing to detect structural variants requires quality bioinformatics tools that account for the characteristics of PacBio reads. To provide such a solution, we developed pbsv, a structural variant caller for PacBio reads that works as a chain of simple stages: 1) map reads to the reference genome, 2) identify reads with signatures of structural variation, 3) cluster nearby reads with similar signatures, 4) summarize each cluster into a consensus variant, and 5) filter for variants with sufficient read support. To evaluate the baseline performance of pbsv, we generated high coverage of a diploid human genome on the PacBio Sequel System, established a target set of structural variants, and then titrated to lower coverage levels. The false discovery rate for pbsv is low at all coverage levels. Sensitivity is high even at modest coverage: above 85% at 10-fold coverage and above 95% at 20-fold coverage. To assess the potential for PacBio SMRT Sequencing to identify pathogenic variants, we evaluated an individual with clinical symptoms suggestive of Carney complex for whom short-read whole genome sequencing was uninformative. The individual was sequenced to 9-fold coverage on the PacBio Sequel System, and structural variants were called with pbsv. 
Filtering for rare, genic structural variants left six candidates, including a heterozygous 2,184 bp deletion that removes the first coding exon of PRKAR1A. Null mutations in PRKAR1A cause autosomal dominant Carney complex, type 1. The variant was determined to be de novo, and it was classified as likely pathogenic based on ACMG standards and guidelines for variant interpretation. These case studies demonstrate the ability of pbsv to detect structural variants in low-coverage PacBio SMRT Sequencing and suggest the importance of considering structural variants in any study of human genetic variation.
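Stages 2 to 5 of the pbsv workflow described above (identify signatures, cluster, summarize, filter) can be illustrated with a toy Python sketch. It is not pbsv's code or interface. The `Signature` and `Call` records, the greedy clustering rule, and the thresholds `max_dist`, `min_len_ratio`, and `min_support` are hypothetical choices for illustration. Stage 1 (read mapping) is assumed to have happened upstream, and SV lengths are assumed to be positive.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Signature:
    """Hypothetical SV signature extracted from one aligned read (stage 2)."""
    read_id: str
    chrom: str
    pos: int
    svtype: str  # "INS" or "DEL" in this sketch
    length: int


@dataclass(frozen=True)
class Call:
    chrom: str
    svtype: str
    pos: int
    length: int
    support: int


def _compatible(a, b, max_dist, min_len_ratio):
    """Same chromosome and type, nearby, and of similar length."""
    if (a.chrom, a.svtype) != (b.chrom, b.svtype):
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    return min(a.length, b.length) / max(a.length, b.length) >= min_len_ratio


def cluster_signatures(sigs, max_dist=500, min_len_ratio=0.7):
    """Stage 3: greedily cluster signatures sorted by chromosome, type, position."""
    clusters = []
    for sig in sorted(sigs, key=lambda s: (s.chrom, s.svtype, s.pos)):
        if clusters and _compatible(clusters[-1][-1], sig, max_dist, min_len_ratio):
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


def summarize(cluster):
    """Stage 4: collapse one cluster into a consensus variant using medians."""
    first = cluster[0]
    return Call(
        chrom=first.chrom,
        svtype=first.svtype,
        pos=round(median(s.pos for s in cluster)),
        length=round(median(s.length for s in cluster)),
        support=len({s.read_id for s in cluster}),  # count distinct reads
    )


def call_variants(sigs, min_support=2, **cluster_kwargs):
    """Stages 3-5: cluster, summarize, and keep variants with enough read support."""
    calls = [summarize(c) for c in cluster_signatures(sigs, **cluster_kwargs)]
    return [c for c in calls if c.support >= min_support]


# Toy signatures (invented values): three reads support one deletion,
# one read reports an unrelated insertion.
toy = [
    Signature("r1", "chr1", 1000, "DEL", 2000),
    Signature("r2", "chr1", 1010, "DEL", 1990),
    Signature("r3", "chr1", 1005, "DEL", 2010),
    Signature("r4", "chr1", 50000, "INS", 300),
]
print(call_variants(toy))
# [Call(chrom='chr1', svtype='DEL', pos=1005, length=2000, support=3)]
```

Comparing each signature only with the last member of the current cluster keeps the sketch linear after sorting. A production caller would also handle other SV types, genotyping, and repeat-aware merging.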
Structural variants (genomic differences ≥50 base pairs) contribute to the evolution of traits and disease. Most structural variants (SVs) are too small to detect with array comparative genomic hybridization and too large to reliably discover with short-read DNA sequencing.
Genomics studies have shown that the insertions, deletions, duplications, translocations, inversions, and tandem repeat expansions in the structural variant (SV) size range (>50 bp) contribute to the evolution of traits and often have significant associations with agronomically important phenotypes. However, most SVs are too small to detect with array comparative genomic hybridization and too large to reliably discover with short-read DNA sequencing. While de novo assembly is the most comprehensive way to identify variants in a genome, recent studies in human genomes show that PacBio SMRT Sequencing sensitively detects structural variants at low coverage. Here we present SV characterization in the major crop species Oryza sativa subsp. indica (rice) with low-fold coverage of long reads. In addition, we provide recommendations for sequencing and analysis for the application of this workflow to other important agricultural species.
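When choosing a coverage level for a low-coverage SV workflow, a back-of-the-envelope model helps show why modest coverage can still be informative. The sketch below is an assumption-laden illustration, not a method from these studies. It treats per-site depth as Poisson-distributed, assumes each read carries the variant allele independently with probability `allele_fraction`, and asks for at least `k` supporting reads. The function name `prob_at_least_k` and the choice of `k` are hypothetical. For a largely homozygous inbred line, `allele_fraction=1.0` may be more appropriate than the heterozygous value of 0.5.

```python
import math


def prob_at_least_k(coverage, k, allele_fraction=0.5):
    """P(at least k reads carry the variant allele) under a Poisson model.

    Assumes total depth at a site is Poisson(coverage) and each read carries
    the variant allele independently with probability allele_fraction, so
    the number of supporting reads is Poisson(coverage * allele_fraction).
    """
    lam = coverage * allele_fraction
    # Sum the Poisson probabilities of observing 0 .. k-1 supporting reads.
    p_below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - p_below


for cov in (5, 10, 20):
    print(cov, round(prob_at_least_k(cov, k=2), 3))
```

For a heterozygous variant at 10-fold coverage with `k = 2`, this idealized model gives about 0.96. It ignores mapping errors, repeats, and caller thresholds, so it should be read as an optimistic estimate rather than a prediction of sensitivity.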
PacBio SMRT Sequencing is fast changing the genomics space with its long reads and high consensus sequence accuracy, providing the most comprehensive view of the genome and transcriptome. In this…
Tracking short-term changes in the genetic diversity and antimicrobial resistance of OXA-232-producing Klebsiella pneumoniae ST14 in clinical settings.
We aimed to track stepwise changes in genetic diversity and antimicrobial resistance in rapidly evolving OXA-232-producing Klebsiella pneumoniae ST14, an emerging carbapenem-resistant high-risk clone, in clinical settings. Twenty-six K. pneumoniae ST14 isolates were collected by the Korean Nationwide Surveillance of Antimicrobial Resistance system over the course of 1 year. Isolates were subjected to whole-genome sequencing and MIC determinations using 33 antibiotics from 14 classes. Single-nucleotide polymorphism (SNP) typing identified 72 unique SNP sites spanning the chromosomes of the isolates, dividing them into three clusters (I, II and III). The initial isolate possessed two plasmids with 18 antibiotic-resistance genes, including blaOXA-232, and exhibited resistance to 11 antibiotic classes. Four other plasmids containing 12 different resistance genes, including blaCTX-M-15 and strA/B, were introduced over time, providing additional resistance to aztreonam and streptomycin. Moreover, chromosomal integration of insertion sequence Ecp1-blaCTX-M-15 mediated the inactivation of mgrB responsible for colistin resistance in four isolates from cluster III. To the best of our knowledge, this is the first description of K. pneumoniae ST14 resistant to both carbapenem and colistin in South Korea. Furthermore, although some acquired genes were lost over time, the retention of 12 resistance genes and inactivation of mgrB provided resistance to 13 classes of antibiotics. We describe stepwise changes in OXA-232-producing K. pneumoniae ST14 in vivo over time in terms of antimicrobial resistance. Our findings contribute to our understanding of the evolution of emerging high-risk K. pneumoniae clones and provide reference data for future outbreaks. Copyright © 2019 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Comparative genomics reveals unique wood-decay strategies and fruiting body development in the Schizophyllaceae.
Agaricomycetes are fruiting body-forming fungi that produce some of the most efficient enzyme systems to degrade wood. Despite decades-long interest in their biology, the evolution and functional diversity of both wood-decay and fruiting body formation are incompletely known. We performed comparative genomic and transcriptomic analyses of wood-decay and fruiting body development in Auriculariopsis ampla and Schizophyllum commune (Schizophyllaceae), species with secondarily simplified morphologies, an enigmatic wood-decay strategy and weak pathogenicity to woody plants. The plant cell wall-degrading enzyme repertoires of Schizophyllaceae are transitional between those of white rot species and less efficient wood-degraders such as brown rot or mycorrhizal fungi. Rich repertoires of suberinase and tannase genes were found in both species, with tannases restricted to Agaricomycetes that preferentially colonize bark-covered wood, suggesting potential complementation of their weaker wood-decaying abilities and adaptations to wood colonization through the bark. Fruiting body transcriptomes revealed a high rate of divergence in developmental gene expression, but also several genes with conserved expression patterns, including novel transcription factors and small secreted proteins, some of which might represent fruiting body effectors. Taken together, our analyses highlighted novel aspects of wood-decay and fruiting body development in an important family of mushroom-forming fungi. © 2019 The Authors. New Phytologist © 2019 New Phytologist Trust.
Updated assembly resource of Phytophthora ramorum Pr102 isolate incorporating long reads from PacBio sequencing.
The NA1 clonal lineage of Phytophthora ramorum is responsible for Sudden Oak Death, an epidemic that has devastated California’s coastal forest ecosystems. An NA1 isolate, Pr102, derived from coast live oak in California was previously sequenced and reported as a 65 Mb assembly containing 12 Mb of gaps in 2576 scaffolds. Here we report an improved 70 Mb genome in 1512 scaffolds with 6752 bp of gaps after incorporating PacBio P5-C3 long reads. This assembly contains 19494 gene models (average gene length 2515 bp) compared to 16134 genes (average gene length 1673 bp) in the previous version. We predicted 29 new RXLRs and 76 new paralogs out of a total of 392 RXLRs in this assembly. We predicted 35 CRNs, compared to 19 in the earlier version, with six paralogs. Our lncRNA prediction identified 255 candidates. This new resource will be invaluable for future evolutionary studies of this invasive plant pathogen.
Comparative Genomic Analysis of Virulence, Antimicrobial Resistance, and Plasmid Profiles of Salmonella Dublin Isolated from Sick Cattle, Retail Beef, and Humans in the United States.
Salmonella enterica serovar Dublin is a host-adapted serotype associated with typhoidal disease in cattle. While rare in humans, it usually causes severe illness, including bacteremia. In the United States, Salmonella Dublin has become one of the most multidrug-resistant (MDR) serotypes. To understand the genetic elements that are associated with virulence and resistance, we sequenced 61 isolates of Salmonella Dublin (49 from sick cattle and 12 from retail beef) using the Illumina MiSeq and closed 5 genomes using the PacBio sequencing platform. Genomic data of eight human isolates were also downloaded from NCBI (National Center for Biotechnology Information) for comparative analysis. Fifteen Salmonella pathogenicity islands (SPIs) and an spv operon (spvRABCD), which encodes important virulence factors, were identified in all 69 (100%) isolates. The 15 SPIs were located on the chromosome of the 5 closed genomes, with each of these isolates also carrying 1 or 2 plasmids with sizes between 36 and 329 kb. Multiple antimicrobial resistance genes (ARGs), including blaCMY-2, blaTEM-1B, aadA12, aph(3′)-Ia, aph(3′)-Ic, strA, strB, floR, sul1, sul2, and tet(A), along with spv operons were identified on these plasmids. Comprehensive antimicrobial resistance genotypes were determined, including 17 genes encoding resistance to 5 different classes of antimicrobials, and mutations in the housekeeping gene (gyrA) associated with resistance or decreased susceptibility to fluoroquinolones. Together these data revealed that this panel of Salmonella Dublin commonly carried 15 SPIs, MDR/virulence plasmids, and ARGs against several classes of antimicrobials. Such genomic elements may make important contributions to the severity of disease and treatment failures in Salmonella Dublin infections in both humans and cattle.
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
Insights into the bacterial species and communities of a full-scale anaerobic/anoxic/oxic wastewater treatment plant by using third-generation sequencing.
For the first time, a full-length 16S rRNA sequencing method was applied to characterize the bacterial species and communities of a full-scale wastewater treatment plant using an anaerobic/anoxic/oxic (A/A/O) process in Wuhan, China. The bacterial compositions at the phylum and class levels in the activated sludge were similar to those revealed by Illumina MiSeq sequencing. At the genus and species levels, third-generation sequencing showed clear advantages, including greater accuracy. Typical functional taxa were identified for ammonia-oxidizing bacteria (AOB), nitrite-oxidizing bacteria (NOB), denitrifying bacteria (DB), anaerobic ammonium oxidation bacteria (ANAMMOXB) and polyphosphate-accumulating organisms (PAOs), represented by Nitrosomonas (1.11%), Nitrospira (3.56%), Pseudomonas (3.88%), Planctomycetes (13.80%) and Comamonadaceae (1.83%), respectively. Pseudomonas (3.88%) and Nitrospira (3.56%) were the two most predominant genera, mainly represented by Pseudomonas extremaustralis (1.69%) and Nitrospira defluvii (3.13%), respectively. Bacteria involved in nitrogen and phosphorus removal were identified at the species level. The predicted functions showed that the A/A/O process was efficient for nitrogen and organics removal. Copyright © 2019 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Listeria monocytogenes is an opportunistic human foodborne pathogen that causes severe infections with high hospitalization and fatality rates. Clonal complex 9 (CC9) contains a large number of sequence types (STs) and is one of the predominant clones distributed worldwide. However, the genetic characteristics of ST477 isolates, which also belong to CC9, have never been examined, and little is known about the detailed genomic traits of this food-associated clone. In this study, we sequenced and assembled the whole genome of an ST477 isolate from a frozen food sample in China and compared it with 58 previously sequenced genomes of 25 human-associated, 5 animal, and 27 food isolates consisting of 6 CC9 and 52 other clones. Phylogenetic analysis revealed that the ST477 isolate clustered with three Canadian ST9 isolates. The phylogeny also revealed that all CC9 isolates in this study consistently possessed the invasion-related gene vip. Mobile genetic elements (MGEs), resistance genes, and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas systems were characterized among CC9 isolates. Our ST477 isolate contained a Tn554-like transposon carrying five arsenical-resistance genes (arsA-arsD, arsR), which was identified exclusively in the CC9 background. Compared with the ST477 genome, the three Canadian ST9 isolates shared nonsynonymous nucleotide substitutions in the condensin complex gene smc and the cell surface protein genes ftsA and essC. Our findings preliminarily indicate that the extraordinary success of the CC9 clone in colonizing different geographical regions is likely due to conserved features such as MGEs and functional virulence and resistance genes. ST477 and the three ST9 genomes are closely related, and the distinct differences between them consist primarily of changes in genes involved in multiplication and invasion, which may contribute to the prevalence of ST9 isolates in food and food-processing environments.
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomic analyses showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.
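One metric widely used to evaluate the contiguity of draft assemblies is N50. The review does not say which metrics it covers, so this is only an illustration of the standard definition. N50 is the length L such that contigs (or scaffolds) of length at least L together contain at least half of the total assembly length. The function name `n50` below is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0  # empty assembly


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Sorting in descending order and stopping at the first contig that brings the running sum to half the total gives the standard N50. Because the metric rewards contiguity rather than correctness, it is usually reported alongside completeness measures.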