During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.
Characterization of Reference Materials for Genetic Testing of CYP2D6 Alleles: A GeT-RM Collaborative Project.
Pharmacogenetic testing is increasingly available from clinical and research laboratories. However, only a limited number of quality control and other reference materials currently are available for the complex rearrangements and rare variants that occur in the CYP2D6 gene. To address this need, the Division of Laboratory Systems, CDC-based Genetic Testing Reference Material Coordination Program, in collaboration with members of the pharmacogenetic testing and research communities and the Coriell Cell Repositories (Camden, NJ), has characterized 179 DNA samples derived from Coriell cell lines. Testing included the recharacterization of 137 genomic DNAs that were genotyped in previous Genetic Testing Reference Material Coordination Program studies and 42 additional samples that had not been characterized previously. DNA samples were distributed to volunteer testing laboratories for genotyping using a variety of commercially available and laboratory-developed tests. These publicly available samples will support the quality-assurance and quality-control programs of clinical laboratories performing CYP2D6 testing. Published by Elsevier Inc.
Chemical defense against predators is widespread in natural ecosystems. Occasionally, taxonomically distant organisms share the same defense chemical. Here, we describe an unusual tripartite marine symbiosis, in which an intracellular bacterial symbiont (“Candidatus Endobryopsis kahalalidefaciens”) uses a diverse array of biosynthetic enzymes to convert simple substrates into a library of complex molecules (the kahalalides) for chemical defense of the host, the alga Bryopsis sp., against predation. The kahalalides are subsequently hijacked by a third partner, the herbivorous mollusk Elysia rufescens, and employed similarly for defense. “Ca. E. kahalalidefaciens” has lost many essential traits for free living and acts as a factory for kahalalide production. This interaction between a bacterium, an alga, and an animal highlights the importance of chemical defense in the evolution of complex symbioses. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Identification of Initial Colonizing Bacteria in Dental Plaques from Young Adults Using Full-Length 16S rRNA Gene Sequencing.
Development of dental plaque begins with the adhesion of salivary bacteria to the acquired pellicle covering the tooth surface. In this study, we collected in vivo dental plaque formed on hydroxyapatite disks for 6 h from 74 young adults and identified initial colonizing taxa based on full-length 16S rRNA gene sequences. A long-read, single-molecule sequencer, PacBio Sequel, provided 100,109 high-quality full-length 16S rRNA gene sequence reads from the early plaque microbiota, which were assigned to 90 oral bacterial taxa. The microbiota obtained from every individual mostly comprised the 21 predominant taxa with a maximum relative abundance of over 10% (95.8 ± 6.2%, mean ± SD), which included Streptococcus species as well as nonstreptococcal species. A hierarchical cluster analysis of their relative abundance distribution suggested three major patterns of microbiota compositions: a Streptococcus mitis/Streptococcus sp. HMT-423-dominant profile, a Neisseria sicca/Neisseria flava/Neisseria mucosa-dominant profile, and a complex profile with high diversity. No notable variations in the community structures were associated with the dental caries status, although the total bacterial amounts were larger in the subjects with a high number of caries-experienced teeth (≥8) than in those with no or a low number of caries-experienced teeth. Our results revealed the bacterial taxa primarily involved in early plaque formation on hydroxyapatite disks in young adults. IMPORTANCE Selective attachment of salivary bacteria to the tooth surface is an initial and repetitive phase in dental plaque development. We employed full-length 16S rRNA gene sequence analysis with a high taxonomic resolution using a third-generation sequencer, PacBio Sequel, to determine the bacterial composition during early plaque formation in 74 young adults accurately and in detail.
The results revealed 21 bacterial taxa primarily involved in early plaque formation on hydroxyapatite disks in young adults, which include several streptococcal species as well as nonstreptococcal species, such as Neisseria sicca/N. flava/N. mucosa and Rothia dentocariosa. Given that no notable variations in the microbiota composition were associated with the dental caries status, the maturation process, rather than the specific bacterial species that are the initial colonizers, is likely to play an important role in the development of dysbiotic microbiota associated with dental caries. Copyright © 2019 Ihara et al.
The genome of Exophiala lecanii-corni, a melanized dimorphic fungus capable of degrading several volatile organic compounds, was sequenced using PacBio single-molecule real-time (SMRT) sequencing to assist with understanding the molecular basis of its uncommon morphological and metabolic characteristics. The assembled draft genome is presented here.
In the past several years, single-molecule sequencing platforms, such as those by Pacific Biosciences and Oxford Nanopore Technologies, have become available to researchers and are currently being tested for clinical applications. They offer exceptionally long reads that permit direct sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms. This includes disease-causing long repetitive elements, extreme GC content regions, and complex gene loci. Similarly, these platforms enable structural variation characterization at previously unparalleled resolution and direct detection of epigenetic marks in native DNA. Here, we review how these technologies are opening up new clinical avenues that are being applied to pathogenic microorganisms and viruses, constitutional disorders, pharmacogenomics, cancer, and more. Copyright © 2018 Elsevier Ltd. All rights reserved.
A 12-kb structural variation in progressive myoclonic epilepsy was newly identified by long-read whole-genome sequencing.
We report a family with progressive myoclonic epilepsy who underwent whole-exome sequencing but was negative for pathogenic variants. Similar clinical courses of a devastating neurodegenerative phenotype of two affected siblings were highly suggestive of a genetic etiology, which indicates that the survey of genetic variation by whole-exome sequencing was not comprehensive. To investigate the presence of a variant that remained unrecognized by standard genetic testing, PacBio long-read sequencing was performed. Structural variant (SV) detection using low-coverage (6×) whole-genome sequencing called 17,165 SVs (7,216 deletions and 9,949 insertions). Our SV selection narrowed down potential candidates to only five SVs (two deletions and three insertions) on the genes tagged with autosomal recessive phenotypes. Among them, a 12.4-kb deletion involving the CLN6 gene was the top candidate because its homozygous abnormalities cause neuronal ceroid lipofuscinosis. This deletion included the initiation codon and was found in a GC-rich region containing multiple repetitive elements. These results indicate the presence of a causal variant in a difficult-to-sequence region and suggest that such variants that remain enigmatic after the application of current whole-exome sequencing technology could be uncovered by unbiased application of long-read whole-genome sequencing.
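The candidate-selection step described above can be illustrated with a minimal sketch. It assumes SV calls are already parsed into simple records and that a set of genes annotated with autosomal recessive phenotypes is available; the record type, field names, function name, and gene set below are hypothetical and are not the authors' pipeline. Apart from the 12.4-kb CLN6 deletion reported in the abstract, the records are invented toy data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class StructuralVariant:
    """Hypothetical minimal SV record (not a real caller's output format)."""
    sv_type: str            # "DEL" or "INS"
    length: int             # SV length in base pairs
    gene: Optional[str]     # overlapping gene symbol, if any


def select_candidates(
    svs: Iterable[StructuralVariant],
    recessive_genes: Set[str],
) -> List[StructuralVariant]:
    """Keep SVs that overlap a gene tagged with an autosomal recessive phenotype."""
    return [sv for sv in svs if sv.gene is not None and sv.gene in recessive_genes]


# Toy data: only the CLN6 deletion size comes from the abstract.
toy_calls = [
    StructuralVariant("DEL", 12_400, "CLN6"),
    StructuralVariant("INS", 310, None),        # intergenic, invented
    StructuralVariant("DEL", 95, "GENE_X"),     # invented placeholder gene
]
toy_recessive_genes = {"CLN6"}                  # assumed annotation set

candidates = select_candidates(toy_calls, toy_recessive_genes)

# Consistency check on the counts reported in the abstract.
reported_deletions, reported_insertions = 7_216, 9_949
total_reported = reported_deletions + reported_insertions
```

In practice the filter would likely also consider zygosity, population frequency, and segregation in the family; those criteria are not specified in the abstract and are left out here.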
DNA Methylation at the Schizophrenia and Intelligence GWAS-Implicated MIR137HG Locus May Be Associated with Disease and Cognitive Functions
The largest genome-wide association studies have identified schizophrenia and intelligence associated variants in the MIR137HG locus containing genes encoding microRNA-137 and microRNA-2682. In the present study, we investigated DNA methylation in the MIR137HG intragenic CpG island (CGI) in the peripheral blood of 44 patients with schizophrenia and 50 healthy controls. The CGI included the entire MIR137 gene and the region adjacent to the 5′-end of MIR2682. The aim of the study was to examine the relationship of the CGI methylation with schizophrenia and cognitive functioning. The methylation level of 91 CpGs located in the selected region was established for each participant by means of single-molecule real-time bisulfite sequencing. All subjects completed the battery of neuropsychological tests. We found that the CGI was hypomethylated in both groups, except for one site, CpG (chr1: 98 511 049), with significant interindividual variability in methylation. A higher level of methylation of this CpG was seen in male patients and was associated with a decrease in the cognitive index in the combined sample of patients and controls. Our data suggest that further investigation of mechanisms that regulate the expression of the MIR137 and MIR2682 genes might help to understand the molecular basis of cognitive deficits in schizophrenia.
Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases.
Long-read sequencing technology is now capable of reading single-molecule DNA with an average read length of more than 10 kb, fully enabling the coverage of large structural variations (SVs). This advantage may pave the way for the detection of unprecedented SVs as well as repeat expansions. Pathogenic SVs of only known genes used to be selectively analyzed based on prior knowledge of target DNA sequence. The unbiased application of long-read whole-genome sequencing (WGS) for the detection of pathogenic SVs has just begun. Here, we apply PacBio SMRT sequencing in a Japanese family with benign adult familial myoclonus epilepsy (BAFME). Our SV selection of low-coverage WGS data (7×) narrowed down the candidates to only six SVs in a 7.16-Mb region of the BAFME1 locus and correctly determined an approximately 4.6-kb SAMD12 intronic repeat insertion, which is causal of BAFME1. These results indicate that long-read WGS is potentially useful for evaluating all of the known SVs in a genome and identifying new disease-causing SVs in combination with other genetic methods to resolve the genetic causes of currently unexplained diseases.
Full-length transcriptome sequences obtained by a combination of sequencing platforms applied to heat shock proteins and polyunsaturated fatty acids biosynthesis in Pyropia haitanensis
Pyropia haitanensis is a high-yield commercial seaweed in China. Pyropia haitanensis farms often suffer from problems such as severe germplasm degeneration, while the mechanisms underlying resistance to abiotic stresses remain unknown because genomic information is lacking. Although many previous studies focused on using next-generation sequencing (NGS) technologies, the short-read sequences generated by NGS generally prevent the assembly of full-length transcripts and thus limit the screening of functional genes. In the present study, which was based on hybrid sequencing (NGS and single-molecule real-time sequencing) of the P. haitanensis thallus transcriptome, we obtained high-quality full-length transcripts with a mean length of 2998 bp and an N50 value of 3366 bp. A total of 14,773 unigenes (93.52%) were annotated in at least one database, while approximately 60% of all unigenes were assembled by short Illumina reads. Moreover, we suggest that the genes involved in the biosynthesis of polyunsaturated fatty acids and heat shock proteins play an important role in the process of development and resistance to abiotic stresses in P. haitanensis. The present study, together with previously published ones, may facilitate seaweed transcriptome research.
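For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows one common way to compute it: the length L such that sequences of length L or longer together account for at least half of the total bases. The function name and the toy lengths are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length at which the running total first
    reaches half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy transcript lengths in bp (invented for illustration only).
toy_lengths = [5000, 4000, 3000, 2000, 1000]
toy_n50 = n50(toy_lengths)  # 5000 + 4000 = 9000 >= 7500, so N50 is 4000
```

Because N50 is weighted toward long sequences, it is typically larger than the mean length, which is consistent with the reported mean of 2998 bp and N50 of 3366 bp.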
A physical and genetic map of Cannabis sativa identifies extensive rearrangements at the THC/CBD acid synthase loci.
Cannabis sativa is widely cultivated for medicinal, food, industrial, and recreational use, but much remains unknown regarding its genetics, including the molecular determinants of cannabinoid content. Here, we describe a combined physical and genetic map derived from a cross between the drug-type strain Purple Kush and the hemp variety “Finola.” The map reveals that cannabinoid biosynthesis genes are generally unlinked but that aromatic prenyltransferase (AP), which produces the substrate for THCA and CBDA synthases (THCAS and CBDAS), is tightly linked to a known marker for total cannabinoid content. We further identify the gene encoding CBCA synthase (CBCAS) and characterize its catalytic activity, providing insight into how cannabinoid diversity arises in cannabis. THCAS and CBDAS (which determine the drug vs. hemp chemotype) are contained within large (>250 kb) retrotransposon-rich regions that are highly nonhomologous between drug- and hemp-type alleles and are furthermore embedded within ~40 Mb of minimally recombining repetitive DNA. The chromosome structures are similar to those in grains such as wheat, with recombination focused in gene-rich, repeat-depleted regions near chromosome ends. The physical and genetic map should facilitate further dissection of genetic and molecular mechanisms in this commercially and medically important plant. © 2019 Laverty et al.; Published by Cold Spring Harbor Laboratory Press.
Analysis of the Complete Genome Sequence of a Novel, Pseudorabies Virus Strain Isolated in Southeast Europe.
Pseudorabies virus (PRV) is the causative agent of Aujeszky’s disease, giving rise to significant economic losses worldwide. Many countries have implemented national programs for the eradication of this virus. In this study, long-read sequencing was used to determine the nucleotide sequence of the genome of a novel PRV strain (PRV-MdBio) isolated in Serbia. In this study, a novel PRV strain was isolated and characterized. PRV-MdBio was found to exhibit similar growth properties to those of another wild-type PRV, the strain Kaplan. Single-molecule real-time (SMRT) sequencing has revealed that the new strain differs significantly in base composition even from strain Kaplan, to which it otherwise exhibits the highest similarity. We compared the genetic composition of PRV-MdBio to strain Kaplan and the China reference strain Ea and found that radical base replacements were the most common point mutations, preceding conservative and silent mutations. We also found that the adaptation of PRV to cell culture does not lead to any tendentious genetic alteration in the viral genome. PRV-MdBio is a wild-type virus, which differs in base composition from other PRV strains to a relatively large extent.
A Pathovar of Xanthomonas oryzae Infecting Wild Grasses Provides Insight Into the Evolution of Pathogenicity in Rice Agroecosystems
Xanthomonas oryzae (Xo) are critical rice pathogens. Virulent lineages from Africa and Asia and less virulent strains from the US have been well characterized. X. campestris pv. leersiae (Xcl), first described in 1957, causes bacterial streak on the perennial grass, Leersia hexandra, and is a close relative of Xo. L. hexandra, a member of the Poaceae, is highly similar to rice phylogenetically, is globally ubiquitous around rice paddies, and is a reservoir of pathogenic Xo. We used long read, single molecule, real time (SMRT) genome sequences of five strains of Xcl from Burkina Faso, China, Mali and Uganda to determine the genetic relatedness of this organism with Xo. Novel Transcription Activator-Like Effectors (TALEs) were discovered in all five strains of Xcl. Predicted TALE target sequences were identified in the L. perrieri genome and compared to rice susceptibility gene homologs. Pathogenicity screening on L. hexandra and diverse rice cultivars confirmed that Xcl are able to colonize rice and produce weak but not progressive symptoms. Overall, based on average nucleotide identity, type III effector repertoires and disease phenotype, we propose to rename Xcl to X. oryzae pv. leersiae (Xol) and use this parallel system to improve understanding of the evolution of bacterial pathogenicity in rice agroecosystems.
The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches pose a limitation for identification of structural variants, sequencing repetitive regions, phasing alleles and distinguishing highly homologous genomic regions. These limitations may significantly contribute to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, like whole exome or even genome sequencing. Now, the emerging long-read sequencing (LRS) technologies may offer improvements in the characterization of genetic variation and regions that are difficult to assess with the currently prevailing NGS approaches. LRS has so far mainly been used to investigate genetic disorders with previously known or strongly suspected disease loci. While these targeted approaches already show the potential of LRS, it remains to be seen whether LRS technologies can soon enable true whole genome sequencing routinely. Ultimately, this could allow the de novo assembly of individual whole genomes used as a generic test for genetic disorders. In this article, we summarize the current LRS-based research on human genetic disorders and discuss the potential of these technologies to facilitate the next major advancements in medical genetics.