During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.
Natural product drug discovery in the genomic era: realities, conjectures, misconceptions, and opportunities.
Natural product discovery from microorganisms provided important sources for antibiotics, anti-cancer agents, immune-modulators, anthelminthic agents, and insecticides during a span of 50 years starting in the 1940s, then became less productive because of rediscovery issues, low throughput, and lack of relevant new technologies to unveil less abundant or not easily detected drug-like natural products. In the early 2000s, it was observed from genome sequencing that Streptomyces species encode about ten times as many secondary metabolites as predicted from known secondary metabolomes. This gave rise to a new discovery approach: microbial genome mining. As the cost of genome sequencing dropped, the numbers of sequenced bacteria, fungi and archaea expanded dramatically, and bioinformatic methods were developed to rapidly scan whole genomes for the numbers, types, and novelty of secondary metabolite biosynthetic gene clusters. This methodology enabled the identification of microbial taxa gifted for the biosynthesis of drug-like secondary metabolites. As genome sequencing technology progressed, the realities relevant to drug discovery have emerged, the conjectures and misconceptions have been clarified, and opportunities to reinvigorate microbial drug discovery have crystallized. This perspective addresses these critical issues for drug discovery.
Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome in a Bottle (GIAB) consortium portfolio. The dataset was generated using the Pacific Biosciences Sequel System. The son and each parent were sequenced to an average coverage of 60 and 30, respectively, with N50 subread lengths between 16 and 18 kb. Raw reads and reads aligned to both GRCh37 and GRCh38 are available at the NCBI GIAB ftp site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/). The GRCh38-aligned read data are archived in NCBI SRA (SRX4739017, SRX4739121, and SRX4739122). This dataset is available for anyone to develop and evaluate long-read bioinformatics methods.
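For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: the N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases. The function name `n50` and the toy read lengths are hypothetical illustrations; they are not taken from the GIAB dataset or from any tool used by its authors.

```python
from typing import Iterable


def n50(read_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read lengths.

    Hypothetical helper for illustration only: walks the reads from
    longest to shortest and returns the length at which the running
    total first reaches half of all bases.
    """
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one read length is required")

    total_bases = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with the total to avoid fractional halves.
        if 2 * running >= total_bases:
            return length

    # Unreachable for non-empty input; kept as a defensive fallback.
    return lengths[-1]


if __name__ == "__main__":
    # Toy lengths in bases (made up, not real subread data).
    toy_lengths = [2_000, 3_000, 4_000, 5_000, 6_000]
    print(n50(toy_lengths))
```

Because the N50 weights reads by the bases they contribute, it is less sensitive to a large number of very short reads than the mean read length, which is one reason it is commonly reported for long-read datasets.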