September 22, 2019

Long-read sequencing data analysis for yeasts.

Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with a small genome and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although LRSDAY is tailored for S. cerevisiae, we designed it to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. Applied to an S. cerevisiae strain, LRSDAY takes ~41 h to generate a complete and well-annotated genome from ~100× Pacific Biosciences (PacBio) reads when running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.


September 22, 2019

The genome of Artemisia annua provides insight into the evolution of Asteraceae family and artemisinin biosynthesis.

Artemisia annua, commonly known as sweet wormwood or Qinghao, is a shrub native to China and has long been used for medicinal purposes. A. annua is now cultivated globally as the only natural source of a potent anti-malarial compound, artemisinin. Here, we report a high-quality draft assembly of the 1.74-gigabase genome of A. annua, which is highly heterozygous, rich in repetitive sequences, and contains 63 226 protein-coding genes, one of the largest numbers among the sequenced plant species. We found that, as one of a few sequenced genomes in the Asteraceae, the A. annua genome contains a large number of genes specific to this large angiosperm clade. Notably, the expansion and functional diversification of genes encoding enzymes involved in terpene biosynthesis are consistent with the evolution of the artemisinin biosynthetic pathway. We further revealed by transcriptome profiling that A. annua has evolved sophisticated transcriptional regulatory networks underlying artemisinin biosynthesis. Based on comprehensive genomic and transcriptomic analyses we generated transgenic A. annua lines producing high levels of artemisinin, which are now ready for large-scale production and thereby will help meet the challenge of increasing global demand for artemisinin. Copyright © 2018 The Author. Published by Elsevier Inc. All rights reserved.


September 22, 2019

Mycobacterial biomaterials and resources for researchers.

There are many resources available to mycobacterial researchers, including culture collections around the world that distribute biomaterials to the general scientific community, genomic and clinical databases, and powerful bioinformatics tools. However, many of these resources may be unknown to the research community. This review article aims to summarize and publicize many of these resources, thus strengthening the quality and reproducibility of mycobacterial research by providing the scientific community access to authenticated and quality-controlled biomaterials and a wealth of information, analytical tools and research opportunities.


September 22, 2019

Multiplex assessment of protein variant abundance by massively parallel sequencing.

Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability.
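To make the idea of a per-variant abundance score more concrete, here is a minimal sketch. It assumes that cells are sorted into four fluorescence bins, that each variant's read frequency in each bin is already known, and that scores are rescaled so the wild-type median is 1 and the nonsense-variant median is 0. The bin weights, the normalization, and all names (`BIN_WEIGHTS`, `raw_score`, `abundance_scores`) are illustrative assumptions, not necessarily the exact VAMP-seq scoring procedure.

```python
from statistics import median

# Assumed weights for four sorted bins, ordered from lowest to highest fluorescence.
BIN_WEIGHTS = [0.25, 0.5, 0.75, 1.0]

def raw_score(bin_freqs):
    """Weighted average of a variant's frequencies across the sorted bins."""
    total = sum(bin_freqs)
    if total == 0:
        raise ValueError("variant not observed in any bin")
    return sum(w * f for w, f in zip(BIN_WEIGHTS, bin_freqs)) / total

def abundance_scores(variants, wt_names, nonsense_names):
    """Rescale raw scores so the wild-type median is 1 and the nonsense median is 0."""
    raw = {name: raw_score(freqs) for name, freqs in variants.items()}
    wt = median(raw[n] for n in wt_names)
    nonsense = median(raw[n] for n in nonsense_names)
    if wt == nonsense:
        raise ValueError("wild-type and nonsense controls are indistinguishable")
    return {name: (s - nonsense) / (wt - nonsense) for name, s in raw.items()}
```

For example, with hypothetical frequencies `{"WT": [0, 0, 0.2, 0.8], "stop": [0.8, 0.2, 0, 0], "var": [0.1, 0.2, 0.4, 0.3]}`, the wild type scores 1, the nonsense control scores 0, and `var` lands in between (about 0.65), i.e., reduced but not abolished abundance. Anchoring the scale to internal controls makes scores comparable across experiments.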


September 22, 2019

Sea cucumber genome provides insights into saponin biosynthesis and aestivation regulation.

Echinoderms exhibit several fascinating evolutionary innovations that are rarely seen in the animal kingdom, but how these animals attained such features is not well understood. Here we report the sequencing and analysis of the genome and extensive transcriptomes of the sea cucumber Apostichopus japonicus, a species from a special echinoderm group with extraordinary potential for saponin synthesis, aestivation and organ regeneration. The sea cucumber does not possess a reorganized Hox cluster as previously assumed for all echinoderms, and the spatial expression of Hox7 and Hox11/13b potentially guides the embryo-to-larva axial transformation. Contrary to the typical production of lanosterol in animal cholesterol synthesis, the oxidosqualene cyclase of sea cucumber produces parkeol for saponin synthesis and has “plant-like” motifs suggestive of convergent evolution. The transcription factors Klf2 and Egr1 are identified as key regulators of aestivation, probably exerting their effects through a clock gene-controlled process. Intestinal hypometabolism during aestivation is driven by the DNA hypermethylation of various metabolic gene pathways, whereas the transcriptional network of intestine regeneration involves diverse signaling pathways, including Wnt, Hippo and FGF. Decoding the sea cucumber genome provides a new avenue for an in-depth understanding of the extraordinary features of sea cucumbers and other echinoderms.


September 22, 2019

Draft genome sequence of Annulohypoxylon stygium, Aspergillus mulundensis, Berkeleyomyces basicola (syn. Thielaviopsis basicola), Ceratocystis smalleyi, two Cercospora beticola strains, Coleophoma cylindrospora, Fusarium fracticaudum, Phialophora cf. hyalina, and Morchella septimelata.

Draft genomes of the species Annulohypoxylon stygium, Aspergillus mulundensis, Berkeleyomyces basicola (syn. Thielaviopsis basicola), Ceratocystis smalleyi, two Cercospora beticola strains, Coleophoma cylindrospora, Fusarium fracticaudum, Phialophora cf. hyalina and Morchella septimelata are presented. Both mating types (MAT1-1 and MAT1-2) of Cercospora beticola are included. Two strains of Coleophoma cylindrospora that produce the sulfated homotyrosine echinocandin variants FR209602, FR220897 and FR220899 are presented. The sequencing of Aspergillus mulundensis, Coleophoma cylindrospora and Phialophora cf. hyalina has enabled mapping of the gene clusters encoding the chemical diversity from the echinocandin pathways, providing data that reveal the complexity of secondary metabolism in these different species. Overall, these genomes provide a valuable resource for understanding the molecular processes underlying pathogenicity (in some cases), biology and toxin production of these economically important fungi.


September 22, 2019

High-Resolution Full-Length HLA Typing Method Using Third Generation (Pac-Bio SMRT) Sequencing Technology.

The human HLA genes are among the most polymorphic genes in the human genome. Therefore, it is very difficult to find two unrelated individuals with identical HLA molecules. As a result, HLA Class I and Class II genes are routinely sequenced or serotyped for organ transplantation, autoimmune disease-association studies, drug hypersensitivity research, and other applications. However, these methods typically provide only two- or four-digit resolution, which is insufficient to fully characterize HLA haplotypes. To overcome these limitations, we describe here an end-to-end workflow for sequencing HLA class I and class II genes using a third-generation sequencing technology, SMRT sequencing. This method produces fully phased, unambiguous, allele-level information on the PacBio System.
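The "two-digit" and "four-digit" levels correspond to the first one and two colon-separated fields of a standard HLA allele name such as HLA-A*02:01:01:01; the third field captures synonymous coding differences and the fourth captures non-coding differences, which only full-length sequencing can resolve. The sketch below parses such names and reports them at a chosen resolution. The helper names (`parse_hla`, `truncate`, `describe`) are hypothetical, and the regular expression is a simplification of the official nomenclature (for example, it drops expression suffixes such as "N" when truncating).

```python
import re

# Simplified pattern: gene, 1-4 colon-separated numeric fields, optional expression suffix.
_HLA_NAME = re.compile(r"^(HLA-[A-Z0-9]+)\*(\d+(?::\d+){0,3})([A-Z]?)$")

FIELD_MEANING = (
    "allele group",              # field 1: the "two-digit" level
    "specific HLA protein",      # field 2: the "four-digit" level
    "synonymous coding change",  # field 3
    "non-coding difference",     # field 4 (introns, UTRs)
)

def parse_hla(name):
    """Split an allele name into (gene, list of fields, expression suffix)."""
    m = _HLA_NAME.match(name)
    if not m:
        raise ValueError(f"not a recognized HLA allele name: {name}")
    return m.group(1), m.group(2).split(":"), m.group(3)

def truncate(name, n_fields):
    """Report an allele at lower resolution, e.g. n_fields=2 for 'four-digit' typing."""
    gene, fields, _ = parse_hla(name)
    return f"{gene}*{':'.join(fields[:n_fields])}"

def describe(name):
    """Pair each resolved field with what it distinguishes."""
    _, fields, _ = parse_hla(name)
    return list(zip(FIELD_MEANING, fields))
```

For example, `truncate("HLA-A*02:01:01:01", 2)` returns `"HLA-A*02:01"`, showing how two alleles that differ only in the third or fourth field collapse to the same result under four-digit typing.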


September 22, 2019

Long-read whole genome sequencing and comparative analysis of six strains of the human pathogen Orientia tsutsugamushi.

Orientia tsutsugamushi is a clinically important but neglected obligate intracellular bacterial pathogen of the Rickettsiaceae family that causes the potentially life-threatening human disease scrub typhus. In contrast to the genome reduction seen in many obligate intracellular bacteria, early genetic studies of Orientia have revealed one of the most repetitive bacterial genomes sequenced to date. The dramatic expansion of mobile elements has hampered efforts to generate complete genome sequences using short read sequencing methodologies, and consequently there have been few studies of the comparative genomics of this neglected species. We report new high-quality genomes of O. tsutsugamushi, generated using PacBio single molecule long read sequencing, for six strains: Karp, Kato, Gilliam, TA686, UT76 and UT176. In comparative genomics analyses of these strains together with existing reference genomes from Ikeda and Boryong strains, we identify a relatively small core genome of 657 genes, grouped into core gene islands and separated by repeat regions, and use the core genes to infer the first whole-genome phylogeny of Orientia. Complete assemblies of multiple Orientia genomes verify initial suggestions that these are remarkable organisms. They have larger genomes compared with most other Rickettsiaceae, with widespread amplification of repeat elements and massive chromosomal rearrangements between strains. At the gene level, Orientia has a relatively small set of universally conserved genes, similar to other obligate intracellular bacteria, and the relative expansion in genome size can be accounted for by gene duplication and repeat amplification. Our study demonstrates the utility of long read sequencing to investigate complex bacterial genomes and characterise genomic variation.
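In its simplest form, a core genome is the set of gene families present in every strain compared, and the remaining families form the accessory genome. The sketch below assumes genes have already been clustered into orthologous families (a step not shown) and uses made-up family IDs; it illustrates the set logic only and is not the pipeline used in this study.

```python
from functools import reduce

def core_and_accessory(strain_families):
    """Split gene families into core (present in all strains) and accessory (in some).

    strain_families maps a strain name to an iterable of gene-family IDs.
    """
    sets = [set(fams) for fams in strain_families.values()]
    if not sets:
        return set(), set()
    core = reduce(set.intersection, sets)  # families shared by every strain
    pan = reduce(set.union, sets)          # every family seen in any strain
    return core, pan - core
```

For example, with hypothetical families `{"Karp": ["f1", "f2", "f3"], "Kato": ["f1", "f2", "f4"], "Ikeda": ["f1", "f2", "f3", "f5"]}`, the core is `{"f1", "f2"}` and the accessory set is `{"f3", "f4", "f5"}`. Because each added strain can only shrink the intersection, comparing more strains tends to yield a smaller core.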


September 22, 2019

Characterization of a novel multidrug resistance plasmid pSGB23 isolated from Salmonella enterica subspecies enterica serovar Saintpaul.

Salmonella enterica subspecies enterica serovar Saintpaul (S. Saintpaul) is an important gut pathogen that causes salmonellosis worldwide. Although intestinal salmonellosis is usually self-limiting, it can be life-threatening in children, the elderly and immunocompromised patients. Appropriate antibiotic treatment is therefore required for these patients. However, the efficacy of many antibiotics against S. enterica infections has been greatly compromised by the spread of multidrug resistance (MDR) plasmids, which poses a serious threat to public health and needs to be closely monitored. In this study, we sequenced and fully characterized an S. enterica MDR plasmid, pSGB23, isolated from chicken. Complete genome sequence analysis revealed that S. Saintpaul strain SGB23 harbored a 254 kb megaplasmid, pSGB23, which carries 11 antibiotic resistance genes responsible for resistance to 9 classes of antibiotics and to quaternary ammonium compounds, which are commonly used to disinfect food processing facilities. Furthermore, we found that pSGB23 carries multiple conjugative systems, which allow it to spread into other Enterobacteriaceae spp. by self-conjugation. It also harbors multiple types of replicons and plasmid maintenance and addiction systems, which explain its broad host range and stable inheritance. We report here a novel MDR plasmid, pSGB23, harbored by S. enterica. To our knowledge, it carries the greatest number of antibiotic resistance genes with the broadest resistance spectrum among S. enterica MDR plasmids identified so far. The isolation of pSGB23 from food sources is worrisome, and further surveillance of its spread will be carried out based on the findings reported in this study.


September 22, 2019

HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads.

Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Long reads, which are nowadays cheaper to produce and more widely available than ever before, have been used to reduce the fragmentation of the assembled haplotypes owing to their ability to span several variants along the genome. These long reads are also characterized by a high error rate, an issue that may be mitigated, however, with larger sets of reads when this error rate is uniform across genome positions. Unfortunately, current state-of-the-art dynamic programming approaches designed for long reads can handle only limited coverage. Here, we propose a new method for assembling haplotypes that combines and extends the features of previous approaches to deal with long reads and higher coverage. In particular, our algorithm is able to dynamically adapt the estimated number of errors at each variant site, while minimizing the total number of error corrections necessary for finding a feasible solution. This allows our method to significantly reduce the required computational resources, making it possible to process datasets with higher coverage. The algorithm has been implemented in a freely available tool, HapCHAT: Haplotype Assembly Coverage Handling by Adapting Thresholds. An experimental analysis on sequencing reads with up to 60× coverage reveals improvements in accuracy and recall achieved by considering higher coverage with lower runtimes. Our method leverages the long-range information of sequencing reads to obtain assembled haplotypes fragmented into fewer unphased haplotype blocks. At the same time, our method is also able to deal with higher coverage to better correct the errors in the original reads and to obtain more accurate haplotypes as a result. HapCHAT is available at http://hapchat.algolab.eu under the GNU General Public License (GPL).


September 22, 2019

Raising the stakes: Loss of efflux pump regulation decreases meropenem susceptibility in Burkholderia pseudomallei

Burkholderia pseudomallei, the causative agent of the high-mortality disease melioidosis, is a gram-negative bacterium that is naturally resistant to many antibiotics. There is no vaccine for melioidosis, and effective eradication is reliant on biphasic and prolonged antibiotic administration. The carbapenem drug meropenem is the current gold standard option for treating severe melioidosis. Intrinsic B. pseudomallei resistance toward meropenem has not yet been documented; however, resistance could conceivably develop over the course of infection, leading to prolonged sepsis and treatment failure. We examined our 30-year clinical collection of melioidosis cases to identify B. pseudomallei isolates with reduced meropenem susceptibility. Isolates were subjected to minimum inhibitory concentration (MIC) testing toward meropenem. Paired isolates from patients who had evolved decreased susceptibility were subjected to whole-genome sequencing. Select agent-compliant genetic manipulation was carried out to confirm the molecular mechanisms conferring resistance. We identified 11 melioidosis cases where B. pseudomallei isolates developed decreased susceptibility toward meropenem during treatment, including 2 cases not treated with this antibiotic. Meropenem MICs increased from 0.5-0.75 µg/mL to 3-8 µg/mL. Comparative genomics identified multiple mutations affecting multidrug resistance-nodulation-division (RND) efflux pump regulators, with concomitant overexpression of their corresponding pumps. All cases were refractory to treatment despite aggressive, targeted therapy, and 2 were associated with a fatal outcome. This study confirms the role of RND efflux pumps in decreased meropenem susceptibility in B. pseudomallei. These findings have important ramifications for the diagnosis, treatment, and management of life-threatening melioidosis cases.
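As a worked check on the reported MIC shift, the smallest and largest possible fold increases follow from pairing the extremes of the two ranges. This pairing is our own arithmetic on the stated ranges, not a per-isolate result from the study:

```latex
\frac{3\ \mu\mathrm{g/mL}}{0.75\ \mu\mathrm{g/mL}} = 4,
\qquad
\frac{8\ \mu\mathrm{g/mL}}{0.5\ \mu\mathrm{g/mL}} = 16,
\qquad
\log_2 4 = 2,\quad \log_2 16 = 4
```

So the MICs rose roughly 4- to 16-fold, equivalent to about 2 to 4 twofold steps.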


September 22, 2019

Tumor-specific mitochondrial DNA variants are rarely detected in cell-free DNA.

The use of blood-circulating cell-free DNA (cfDNA) as a “liquid biopsy” in oncology is being explored for its potential as a cancer biomarker. Mitochondria contain their own circular genome (mitochondrial DNA, mtDNA), present in up to thousands of copies per cell. The mutation rate of mtDNA is several orders of magnitude higher than that of the nuclear DNA. Tumor-specific variants have been identified in tumors along the entire mtDNA, and their number varies among and within tumors. The high mtDNA copy number per cell and the high mtDNA mutation rate make it worthwhile to explore the potential of tumor-specific cf-mtDNA variants as a cancer marker in the blood of cancer patients. We used single-molecule real-time (SMRT) sequencing to profile the entire mtDNA of 19 tissue specimens (primary tumor and/or metastatic sites, and tumor-adjacent normal tissue) and 9 cfDNA samples, originating from 8 cancer patients (5 breast, 3 colon). For each patient, tumor-specific mtDNA variants were detected and traced in cfDNA by SMRT sequencing and/or digital PCR to explore their feasibility as cancer biomarkers. As a reference, we measured other blood-circulating biomarkers for these patients, including driver mutations in nuclear-encoded cfDNA and cancer-antigen levels or circulating tumor cells. Four of the 24 (17%) tumor-specific mtDNA variants were detected in cfDNA, although at much lower allele frequencies than mutations in nuclear-encoded driver genes in the same samples. Also, extensive heterogeneity was observed among the heteroplasmic mtDNA variants present in an individual. We conclude that there is limited value in tracing tumor-specific mtDNA variants in blood-circulating cfDNA with the current methods available. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.


September 22, 2019

A molecular window into the biology and epidemiology of Pneumocystis spp.

Pneumocystis, a unique atypical fungus with an elusive lifestyle, has had an important medical history. It came to prominence as an opportunistic pathogen that not only can cause life-threatening pneumonia in patients with HIV infection and other immunodeficiencies but also can colonize the lungs of healthy individuals from a very early age. The genus Pneumocystis includes a group of closely related but heterogeneous organisms that have a worldwide distribution, have been detected in multiple mammalian species, are highly host species specific, inhabit the lungs almost exclusively, and have never convincingly been cultured in vitro, making Pneumocystis a fascinating but difficult-to-study organism. Improved molecular biologic methodologies have opened a new window into the biology and epidemiology of Pneumocystis. Advances include an improved taxonomic classification, identification of an extremely reduced genome and concomitant inability to metabolize and grow independent of the host lungs, insights into its transmission mode, recognition of its widespread colonization in both immunocompetent and immunodeficient hosts, and utilization of strain variation to study drug resistance, epidemiology, and outbreaks of infection among transplant patients. This review summarizes these advances and also identifies some major questions and challenges that need to be addressed to better understand Pneumocystis biology and its relevance to clinical care. Copyright © 2018 American Society for Microbiology.


September 22, 2019

Variation in human chromosome 21 ribosomal RNA genes characterized by TAR cloning and long-read sequencing.

Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ~0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44 838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveals heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.


September 22, 2019

A graph-based approach to diploid genome assembly.

Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community. We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that as little as 50× Illumina coverage and 10× PacBio coverage is required to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants. The implementation is available at https://github.com/whatshap/whatshap. Supplementary data are available at Bioinformatics online.

