During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.
Tracking short-term changes in the genetic diversity and antimicrobial resistance of OXA-232-producing Klebsiella pneumoniae ST14 in clinical settings.
To track stepwise changes in genetic diversity and antimicrobial resistance in rapidly evolving OXA-232-producing Klebsiella pneumoniae ST14, an emerging carbapenem-resistant high-risk clone, in clinical settings. Twenty-six K. pneumoniae ST14 isolates were collected by the Korean Nationwide Surveillance of Antimicrobial Resistance system over the course of 1 year. Isolates were subjected to whole-genome sequencing and MIC determinations using 33 antibiotics from 14 classes. Single-nucleotide polymorphism (SNP) typing identified 72 unique SNP sites spanning the chromosomes of the isolates, dividing them into three clusters (I, II and III). The initial isolate possessed two plasmids with 18 antibiotic-resistance genes, including blaOXA-232, and exhibited resistance to 11 antibiotic classes. Four other plasmids containing 12 different resistance genes, including blaCTX-M-15 and strA/B, were introduced over time, providing additional resistance to aztreonam and streptomycin. Moreover, chromosomal integration of insertion sequence Ecp1-blaCTX-M-15 mediated the inactivation of mgrB responsible for colistin resistance in four isolates from cluster III. To the best of our knowledge, this is the first description of K. pneumoniae ST14 resistant to both carbapenem and colistin in South Korea. Furthermore, although some acquired genes were lost over time, the retention of 12 resistance genes and inactivation of mgrB provided resistance to 13 classes of antibiotics. We describe stepwise changes in OXA-232-producing K. pneumoniae ST14 in vivo over time in terms of antimicrobial resistance. Our findings contribute to our understanding of the evolution of emerging high-risk K. pneumoniae clones and provide reference data for future outbreaks. Copyright © 2019 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Comparative genomics reveals unique wood-decay strategies and fruiting body development in the Schizophyllaceae.
Agaricomycetes are fruiting body-forming fungi that produce some of the most efficient enzyme systems to degrade wood. Despite decades-long interest in their biology, the evolution and functional diversity of both wood-decay and fruiting body formation are incompletely known. We performed comparative genomic and transcriptomic analyses of wood-decay and fruiting body development in Auriculariopsis ampla and Schizophyllum commune (Schizophyllaceae), species with secondarily simplified morphologies, an enigmatic wood-decay strategy and weak pathogenicity to woody plants. The plant cell wall-degrading enzyme repertoires of Schizophyllaceae are transitional between those of white rot species and less efficient wood-degraders such as brown rot or mycorrhizal fungi. Rich repertoires of suberinase and tannase genes were found in both species, with tannases restricted to Agaricomycetes that preferentially colonize bark-covered wood, suggesting potential complementation of their weaker wood-decaying abilities and adaptations to wood colonization through the bark. Fruiting body transcriptomes revealed a high rate of divergence in developmental gene expression, but also several genes with conserved expression patterns, including novel transcription factors and small-secreted proteins, some of which might represent fruiting body effectors. Taken together, our analyses highlighted novel aspects of wood-decay and fruiting body development in an important family of mushroom-forming fungi. © 2019 The Authors. New Phytologist © 2019 New Phytologist Trust.
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.
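As an illustration of the assembly-quality metrics mentioned above, the following is a minimal sketch of N50, one widely reported contiguity statistic. The abstract does not list which metrics the review covers, so the choice of N50 and the function name `n50` are assumptions made here for illustration only.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that pushes the running sum to half the total.
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Toy contig lengths (illustrative values, not from any real assembly).
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness or gene completeness, so it is usually read alongside other measures.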
Satellite repeats are a structural component of centromeres and telomeres, and in some instances their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: (1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and (2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males vs. females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
African cichlid fishes are well known for their rapid radiations and are a model system for studying evolutionary processes. Here we compare multiple, high-quality, chromosome-scale genome assemblies to elucidate the genetic mechanisms underlying cichlid diversification and study how genome structure evolves in rapidly radiating lineages. We re-anchored our recent assembly of the Nile tilapia (Oreochromis niloticus) genome using a new high-density genetic map. We also developed a new de novo genome assembly of the Lake Malawi cichlid, Metriaclima zebra, using high-coverage Pacific Biosciences sequencing, and anchored contigs to linkage groups (LGs) using 4 different genetic maps. These new anchored assemblies allow the first chromosome-scale comparisons of African cichlid genomes. Large intra-chromosomal structural differences (~2-28 megabase pairs) among species are common, while inter-chromosomal differences are rare (<10 megabase pairs total). Placement of the centromeres within the chromosome-scale assemblies identifies large structural differences that explain many of the karyotype differences among species. Structural differences are also associated with unique patterns of recombination on sex chromosomes. Structural differences on LG9, LG11, and LG20 are associated with reduced recombination, indicative of inversions between the rock- and sand-dwelling clades of Lake Malawi cichlids. M. zebra has a larger number of recent transposable element insertions compared with O. niloticus, suggesting that several transposable element families have a higher rate of insertion in the haplochromine cichlid lineage. This study identifies novel structural variation among East African cichlid genomes and provides a new set of genomic resources to support research on the mechanisms driving cichlid adaptation and speciation. © The Author(s) 2019. Published by Oxford University Press.
In the past several years, single-molecule sequencing platforms, such as those by Pacific Biosciences and Oxford Nanopore Technologies, have become available to researchers and are currently being tested for clinical applications. They offer exceptionally long reads that permit direct sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms. This includes disease-causing long repetitive elements, extreme GC content regions, and complex gene loci. Similarly, these platforms enable structural variation characterization at previously unparalleled resolution and direct detection of epigenetic marks in native DNA. Here, we review how these technologies are opening up new clinical avenues that are being applied to pathogenic microorganisms and viruses, constitutional disorders, pharmacogenomics, cancer, and more. Copyright © 2018 Elsevier Ltd. All rights reserved.
Full-length transcriptome analysis of Litopenaeus vannamei reveals transcript variants involved in the innate immune system.
To better understand the immune system of shrimp, this study combined PacBio isoform sequencing (Iso-Seq) and Illumina paired-end short-read sequencing to discover full-length immune-related molecules of the Pacific white shrimp, Litopenaeus vannamei. A total of 72,648 nonredundant full-length transcripts (unigenes) were generated with an average length of 2545 bp from five main tissues, including the hepatopancreas, cardiac stomach, heart, muscle, and pyloric stomach. These unigenes exhibited a high annotation rate (62,164, 85.57%) when compared against NR, NT, Swiss-Prot, Pfam, GO, KEGG and COG databases. A total of 7544 putative long noncoding RNAs (lncRNAs) were detected and 1164 nonredundant full-length transcripts (449 UniTransModels) participated in alternative splicing (AS) events. Importantly, a total of 5279 nonredundant full-length unigenes involved in the innate immune system were successfully identified, covering 9 immune-related processes, 19 immune-related pathways and 10 other immune-related systems. We also found a wide range of transcript variants, which increased the number and functional complexity of immune molecules; for example, toll-like receptors (TLRs) and interferon regulatory factors (IRFs). A total of 480 differentially expressed genes (DEGs) showed significantly higher or tissue-specific expression in the hepatopancreas compared with the other four tested tissues (FDR <0.05). Furthermore, the expression levels of six selected immune-related DEGs and putative IRFs were validated using real-time PCR technology, substantiating the reliability of the PacBio Iso-Seq results. In conclusion, our results provide new genetic resources of long-read full-length transcript data and information for identifying immune-related genes, which are an invaluable transcriptomic resource as a genomic reference, especially for further exploration of the innate immune and defense mechanisms of shrimp. Copyright © 2019 Elsevier Ltd. All rights reserved.
Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases.
Long-read sequencing technology is now capable of reading single-molecule DNA with an average read length of more than 10 kb, fully enabling the coverage of large structural variations (SVs). This advantage may pave the way for the detection of unprecedented SVs as well as repeat expansions. Previously, pathogenic SVs were analyzed selectively, only in known genes and based on prior knowledge of the target DNA sequence. The unbiased application of long-read whole-genome sequencing (WGS) for the detection of pathogenic SVs has just begun. Here, we apply PacBio SMRT sequencing in a Japanese family with benign adult familial myoclonus epilepsy (BAFME). Our SV selection of low-coverage WGS data (7×) narrowed down the candidates to only six SVs in a 7.16-Mb region of the BAFME1 locus and correctly determined an approximately 4.6-kb SAMD12 intronic repeat insertion, which is causal for BAFME1. These results indicate that long-read WGS is potentially useful for evaluating all of the known SVs in a genome and identifying new disease-causing SVs in combination with other genetic methods to resolve the genetic causes of currently unexplained diseases.
Scylla paramamosain is an important aquaculture crab, which has great economic and nutritional value. To the best of our knowledge, few full-length crab transcriptomes are available. In this study, a library composed of 12 different tissues including gill, hepatopancreas, muscle, cerebral ganglion, eyestalk, thoracic ganglia, intestine, heart, testis, ovary, sperm reservoir, and hemocyte was constructed and sequenced using Pacific Biosciences single-molecule real-time (SMRT) long-read sequencing technology. A total of 284803 full-length non-chimeric reads were obtained, from which 79005 high-quality unique transcripts were obtained after error correction, sequence clustering and redundancy removal. Additionally, a total of 52544 transcripts were annotated against protein databases (NCBI nonredundant, Swiss-Prot, KOG, and KEGG). A total of 23644 long non-coding RNAs (lncRNAs) and 131561 simple sequence repeats (SSRs) were identified. Meanwhile, the isoforms of many genes were also identified in this study. Our study provides a rich set of full-length cDNA sequences for S. paramamosain, which will greatly facilitate S. paramamosain research.
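To make the notion of a simple sequence repeat (SSR) concrete, the sketch below scans a nucleotide sequence for short motifs repeated in tandem. It is not the tool or the thresholds used in the study, which the abstract does not name; the function `find_ssrs` and its default parameters (motifs of 2 to 6 bp, at least 4 copies) are assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats of short motifs in a DNA sequence.

    Returns a list of (start, unit, copies) tuples with 0-based starts.
    The thresholds are illustrative defaults, not those of any
    particular published pipeline.
    """
    # Lazy quantifier: the shortest motif that satisfies the repeat
    # requirement is reported, e.g. "AC" rather than "ACAC".
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Toy sequence containing four tandem copies of "AC".
    print(find_ssrs("TTGACACACACAGG"))
```

Only perfect repeats are reported here; dedicated SSR tools typically also handle mononucleotide runs, compound repeats and motif-specific copy thresholds.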
In-Depth Genomic and Phenotypic Characterization of the Antarctic Psychrotolerant Strain Pseudomonas sp. MPC6 Reveals Unique Metabolic Features, Plasticity, and Biotechnological Potential.
We obtained the complete genome sequence of the psychrotolerant extremophile Pseudomonas sp. MPC6, a natural polyhydroxyalkanoate (PHA)-producing bacterium able to rapidly grow at low temperatures. Genomic and phenotypic analyses allowed us to situate this isolate inside the Pseudomonas fluorescens phylogroup of pseudomonads as well as to reveal its metabolic versatility and plasticity. The isolate possesses the gene machinery for metabolizing a variety of toxic aromatic compounds such as toluene, phenol, chloroaromatics, and TNT. In addition, it can use both C6- and C5-carbon sugars like xylose and arabinose as carbon substrates, an uncommon feature for bacteria of this genus. Furthermore, Pseudomonas sp. MPC6 exhibits a high copy number of genes encoding enzymes involved in oxidative and cold-stress response that allows it to cope with high concentrations of heavy metals (As, Cd, Cu) and low temperatures, a finding that was further validated experimentally. We then assessed the growth performance of MPC6 on glycerol over a temperature range from 0 to 45°C, the latter temperature corresponding to the limit at which this Antarctic isolate was no longer able to propagate. On the other hand, the MPC6 genome comprised considerably fewer virulence and drug resistance factors as compared to pathogenic Pseudomonas strains, thus supporting its safety. Unexpectedly, we found five PHA synthases within the genome of MPC6, one of which clustered separately from the other four. This PHA synthase shared only 40% sequence identity at the amino acid level with the only PHA polymerase described for Pseudomonas (63-1 strain) able to produce copolymers of short- and medium-chain length PHAs. Batch cultures for PHA synthesis in Pseudomonas sp. MPC6 using sugars, decanoate, ethylene glycol, and organic acids as carbon substrates resulted in biopolymers with different monomer compositions. This indicates that the PHA synthases play a critical role in defining not only the final chemical structure of the biosynthesized PHA, but also the employed biosynthetic pathways. Based on the results obtained, we conclude that Pseudomonas sp. MPC6 can be exploited as a bioremediator and biopolymer factory, as well as a model strain to unveil molecular mechanisms behind adaptation to cold and extreme environments.
The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches pose a limitation for identification of structural variants, sequencing repetitive regions, phasing alleles and distinguishing highly homologous genomic regions. These limitations may significantly contribute to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, like whole exome or even genome sequencing. Now, the emerging long-read sequencing (LRS) technologies may offer improvements in the characterization of genetic variation and regions that are difficult to assess with the currently prevailing NGS approaches. LRS has so far mainly been used to investigate genetic disorders with previously known or strongly suspected disease loci. While these targeted approaches already show the potential of LRS, it remains to be seen whether LRS technologies can soon enable true whole genome sequencing routinely. Ultimately, this could allow the de novo assembly of individual whole genomes used as a generic test for genetic disorders. In this article, we summarize the current LRS-based research on human genetic disorders and discuss the potential of these technologies to facilitate the next major advancements in medical genetics.
Tandemly repeated DNA is highly mutable and causes at least 31 diseases, but it is hard to detect pathogenic repeat expansions genome-wide. Here, we report robust detection of human repeat expansions from careful alignments of long but error-prone (PacBio and nanopore) reads to a reference genome. Our method is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing to healthy controls, we prioritize pathogenic expansions within the top 10 out of 700,000 tandem repeats in whole genome sequencing data. This may help to elucidate the many genetic diseases whose causes remain unknown.
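The idea of prioritizing expansions by comparison with healthy controls can be sketched as follows. This is not the authors' scoring scheme, which the abstract does not describe; the function `rank_expansions`, the input layout (per-locus lists of read-level size changes in bp relative to the reference) and the simple "patient maximum minus control maximum" score are all assumptions made for illustration.

```python
def rank_expansions(patient, controls):
    """Rank tandem-repeat loci by how far the patient exceeds all controls.

    patient:  dict mapping locus name -> list of per-read size changes (bp)
    controls: list of dicts with the same layout, one per control individual
    Returns locus names, most strongly expanded first.
    """
    scored = []
    for locus, changes in patient.items():
        if not changes:
            continue  # no reads cover this locus in the patient
        patient_max = max(changes)
        # Largest change seen at this locus in any control (0 if never covered).
        control_max = max(
            (max(c[locus]) for c in controls if c.get(locus)),
            default=0,
        )
        scored.append((patient_max - control_max, locus))
    # Sort by score descending; ties are broken by locus name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [locus for _, locus in scored]


if __name__ == "__main__":
    # Hypothetical toy data: locus "A" carries a large expansion in the patient only.
    patient = {"A": [3000, 10], "B": [50, 40], "C": [200]}
    controls = [{"A": [5], "B": [45], "C": [180]}, {"A": [0], "B": [60]}]
    print(rank_expansions(patient, controls))
```

Taking the maximum over reads is a deliberately crude choice; with error-prone long reads, a more robust summary of the read-level changes would likely be needed in practice.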