April 21, 2020

Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.

Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture, making their DNA sequence our only clue to their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field, and sequencers can now generate billions of short reads in only a few days. Many metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, metagenome assemblies contain errors regardless of the assembly algorithm and sequencing method used. Recent developments in assembly validation tools have played a pivotal role in improving metagenomic assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation. © The Author 2017. Published by Oxford University Press.


April 21, 2020

Mutation of a bHLH transcription factor allowed almond domestication.

Wild almond species accumulate the bitter and toxic cyanogenic diglucoside amygdalin. Almond domestication was enabled by the selection of genotypes harboring sweet kernels. We report the completion of the almond reference genome. Map-based cloning using an F1 population segregating for kernel taste led to the identification of a 46-kilobase gene cluster encoding five basic helix-loop-helix transcription factors, bHLH1 to bHLH5. Functional characterization demonstrated that bHLH2 controls transcription of the P450 monooxygenase-encoding genes PdCYP79D16 and PdCYP71AN24, which are involved in the amygdalin biosynthetic pathway. A nonsynonymous point mutation (Leu to Phe) in the dimerization domain of bHLH2 prevents transcription of the two cytochrome P450 genes, resulting in the sweet kernel trait. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


April 21, 2020

Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement.

Maize is one of the most important crops globally, and it shows remarkable genetic diversity. Knowledge of this diversity could help in crop improvement; however, gold-standard genomes have been elucidated only for modern temperate varieties. Here, we present a high-quality reference genome (contig N50 of 15.78 megabases) of the maize small-kernel (SK) inbred line, which is derived from a tropical landrace. Using haplotype maps derived from B73, Mo17 and SK, we identified 80,614 polymorphic structural variants across 521 diverse lines. Approximately 22% of these variants could not be detected by traditional single-nucleotide-polymorphism-based approaches, and some of them could affect gene expression and trait performance. To illustrate the utility of the diverse SK line, we used it to perform map-based cloning of a major-effect quantitative trait locus controlling kernel weight, a key trait selected during maize improvement. The underlying candidate gene, ZmBARELY ANY MERISTEM1d, provides a target for increasing crop yields.
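Contig N50, the contiguity figure quoted above, is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. As a rough illustration only, here is a minimal Python sketch of that calculation, assuming contig lengths (in bp) are already available as integers; `contig_n50` is a hypothetical helper, not a function from any particular assembly tool:

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Compute contig N50 from an iterable of contig lengths (in bp).

    Hypothetical helper for illustration; not taken from any assembler.
    """
    # Sort once, longest first; materialising the iterable also lets us sum it.
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    if ordered[-1] < 0:
        raise ValueError("contig lengths must be non-negative")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the cumulative length to >= 50% of the
        # assembly is the N50 contig; comparing 2 * running avoids floats.
        if 2 * running >= total:
            return length
    # Not reached: with non-negative lengths, running == total after the loop.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Total = 54 bp; 10 + 9 + 8 = 27 reaches half, so N50 = 8.
    print(contig_n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Keeping the threshold test in integer arithmetic avoids rounding at exactly 50%. Note that N50 computed over scaffolds rather than contigs gives a different number, so the two should not be compared directly.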


April 21, 2020

Insights into the evolution and drug susceptibility of Babesia duncani from the sequence of its mitochondrial and apicoplast genomes.

Babesia microti and Babesia duncani are the main causative agents of human babesiosis in the United States. While significant knowledge about B. microti has been gained over the past few years, nothing is known about B. duncani biology, pathogenesis, mode of transmission or sensitivity to currently recommended therapies. Studies in immunocompetent wild-type mice and hamsters have shown that, unlike B. microti, infection with B. duncani results in severe pathology and ultimately death. The parasite factors involved in B. duncani virulence remain unknown. Here we report the first known completed sequence and annotation of the apicoplast and mitochondrial genomes of B. duncani. We found that the apicoplast genome of this parasite consists of a 34 kb monocistronic circular molecule encoding functions that are important for apicoplast gene transcription as well as translation and maturation of the organelle’s proteins. The mitochondrial genome of B. duncani consists of a 5.9 kb monocistronic linear molecule with two inverted repeats of 48 bp at both ends. Using the conserved cytochrome b (Cytb) and cytochrome c oxidase subunit I (coxI) proteins encoded by the mitochondrial genome, phylogenetic analysis revealed that B. duncani defines a new lineage among apicomplexan parasites distinct from B. microti, Babesia bovis, Theileria spp. and Plasmodium spp. Annotation of the apicoplast and mitochondrial genomes of B. duncani identified targets for development of effective therapies. Our studies set the stage for evaluation of the efficacy of these drugs alone or in combination against B. duncani in culture as well as in animal models. Copyright © 2018 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.


April 21, 2020

Hybrid sequencing reveals insight into heat sensing and signaling of bread wheat.

Wheat (Triticum aestivum L.), a globally important crop, is challenged by increasing temperatures (heat stress, HS). However, its polyploid nature, the incompleteness of its genome sequences and annotation, the lack of comprehensive HS-responsive transcriptomes and the unexplored heat sensing and signaling of wheat hinder our full understanding of its adaptations to HS. The recently released genome sequences of wheat, as well as emerging single-molecule sequencing technologies, provide an opportunity to thoroughly investigate the molecular mechanisms of the wheat response to HS. We generated a high-resolution spatio-temporal transcriptome map of wheat flag leaves and filling grain under HS at 0 min, 5 min, 10 min, 30 min, 1 h and 4 h by combining full-length single-molecule sequencing and Illumina short-read sequencing. This hybrid sequencing approach newly discovered 4,947 loci and 70,285 transcripts, generating a comprehensive and dynamic list of HS-responsive full-length transcripts and complementing the recently released wheat reference genome. Large-scale analysis revealed a global landscape of heat adaptations, uncovering unexpectedly rapid heat sensing and signaling, significant changes in more than half of HS-responsive genes within 30 min, heat shock factor-dependent and -independent heat signaling, and metabolic alterations in early HS responses. Integrated analysis also demonstrated differential responses and partitioned functions between organs and subgenomes, and suggested a differential pattern of transcriptional and alternative splicing regulation in the HS response. This study provides comprehensive data for dissecting the molecular mechanisms of early HS responses in wheat and highlights the genomic plasticity and evolutionary divergence of polyploid wheat. © 2019 The Authors. The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.


April 21, 2020

Recompleting the Caenorhabditis elegans genome.

Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. elegans strain available today. To provide a more accurate C. elegans genome, we performed long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but contains an additional 1.8 Mb, including tandem repeat expansions and genome duplications. Of 116 structural discrepancies between N2 and VC2010, 97 (84%) matched the VC2010 structure in two outgroup strains, implying deficiencies in N2. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted ≥53 new genes in VC2010. The recompleted genome of C. elegans should be a valuable resource for genetics, genomics, and systems biology. © 2019 Yoshimura et al.; Published by Cold Spring Harbor Laboratory Press.


April 21, 2020

Plant ISOform sequencing database (PISO): a comprehensive repertory of full-length transcripts in plants.

In higher eukaryotes, alternative splicing (AS) and alternative polyadenylation (APA) events can produce multiple transcript isoforms from the majority of genes, which significantly increases the protein-coding potential of a genome (Pan et al., 2008; Anvar et al., 2018). Different transcript isoforms might encode proteins with different functions or affect mRNA stability and translational capacity; in this sense, AS and APA events can dramatically increase the complexity and flexibility of the entire transcriptome and proteome (Yang et al., 2016; Feng et al., 2015; Li et al., 2017a; Wang et al., 2017a). Several public databases of AS events and transcripts in animals are available, such as ASTD and MAASE (Zheng et al., 2005), whereas to date there has been no database of full-length transcripts and AS events in plants. Next-generation sequencing (NGS) technology is limited in its ability to identify AS and APA events because of its short reads and low accuracy. In recent years, isoform sequencing (Iso-Seq) on the PacBio single-molecule real-time (SMRT) sequencing platform has made it possible to generate full-length sequences and obtain accurate information about AS and transcriptional start sites (Li et al., 2017a). In this study, we collected the plant Iso-Seq data sequenced on the PacBio platform from the NCBI database up to the end of 2017 and employed unified pipelines to process all the full-length transcripts in different species. Based on these data, we constructed the Plant ISOform sequencing database (PISO, http://cbi.hzau.edu.cn/piso/).


April 21, 2020

Analysis of the Complete Genome Sequence of a Novel, Pseudorabies Virus Strain Isolated in Southeast Europe.

Pseudorabies virus (PRV) is the causative agent of Aujeszky’s disease, which causes significant economic losses worldwide. Many countries have implemented national programs for the eradication of this virus. In this study, a novel PRV strain (PRV-MdBio) isolated in Serbia was characterized, and long-read sequencing was used to determine the nucleotide sequence of its genome. PRV-MdBio was found to exhibit growth properties similar to those of another wild-type PRV, strain Kaplan. Single-molecule real-time (SMRT) sequencing revealed that the new strain differs significantly in base composition even from strain Kaplan, to which it otherwise shows the highest similarity. We compared the genetic composition of PRV-MdBio with that of strain Kaplan and the Chinese reference strain Ea and found that radical base replacements were the most common point mutations, followed by conservative and silent mutations. We also found that the adaptation of PRV to cell culture does not lead to any systematic genetic alteration in the viral genome. PRV-MdBio is a wild-type virus that differs in base composition from other PRV strains to a relatively large extent.


April 21, 2020

Identification of putative genes for polyphenol biosynthesis in olive fruits and leaves using full-length transcriptome sequencing.

Olive (Olea europaea) is a rich source of valuable bioactive polyphenols, which have attracted widespread interest. In this study, we combined targeted metabolome, PacBio Iso-Seq transcriptome, and Illumina RNA-seq transcriptome data to investigate the association between polyphenols and gene expression in developing olive fruits and leaves. A total of 12 main polyphenols were measured, and 122 transcripts of 17 gene families, 101 transcripts of 9 gene families, and 106 transcripts of 6 gene families encoding enzymes involved in flavonoid, oleuropein, and hydroxytyrosol biosynthesis, respectively, were identified. Additionally, 232 alternative splicing events of 18 genes related to polyphenol synthesis were analyzed. This is the first time that third-generation full-length transcriptome sequencing has been used to study the gene expression patterns of olive fruits and leaves. The transcriptome results, combined with the targeted metabolome, can help us better understand the polyphenol biosynthesis pathways in olive. Copyright © 2019 Elsevier Ltd. All rights reserved.


April 21, 2020

A global survey of full-length transcriptome of Ginkgo biloba reveals transcript variants involved in flavonoid biosynthesis

Ginkgo biloba, which contains flavonoids as bioactive components, is widely used in traditional Chinese medicine. Efforts to increase the flavonoid production of medicinal plants through genetic engineering generally focus on the key genes involved in flavonoid biosynthesis. However, the molecular mechanisms underlying this biosynthesis are not yet well understood. To understand these mechanisms, a combination of second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing was applied to G. biloba. Eight tissues were sampled for SMRT sequencing to generate a high-quality, full-length transcriptome database. After removal of redundant reads, 12,954 alternative polyadenylation events, 12,290 alternative splicing (AS) events, 929 fusion transcripts, 2,286 novel transcripts, and 1,270 lncRNAs were predicted from 23.36 Gb of clean reads. Further analysis identified 7 AS, 5 lncRNA, and 6 fusion gene events involved in flavonoid biosynthesis. Construction of co-expression networks revealed 12 gene modules containing structural genes and transcription factors involved in flavonoid metabolism. Weighted gene co-expression network analysis (WGCNA) identified hub genes, including transcription factors (TFs) and structural genes, that operate during flavonoid biosynthesis. Seven key hub genes were also identified by analyzing the correlation between gene expression levels and flavonoid content. The results highlight the importance of SMRT sequencing of the full-length transcriptome in improving genome annotation and, by providing a comprehensive set of reference transcripts, in elucidating the gene regulation of flavonoid biosynthesis in G. biloba.


April 21, 2020

Transcriptome analysis reveals multiple signal network contributing to the Verticillium wilt resistance in eggplant

Verticillium wilt is a devastating disease of eggplant. To understand the molecular mechanism of disease resistance in eggplant, the transcriptomes of Verticillium wilt-infected eggplants were profiled. A total of 480, 518, 887 and 1,046 Verticillium wilt-related differentially expressed genes (DEGs) were identified at 6 h (V6), 12 h (V12), 24 h (V24) and 48 h (V48), respectively. COG functional classification revealed that most of the DEGs were assigned to “Amino acid transport and metabolism”, “Cytoskeleton” and “Cell motility”. In addition, comparison of the control plants (V0) with infected eggplants (V6-V48) identified a total of 111 common DEGs. Apart from “General function prediction only”, most of these DEGs were enriched in “Signal transduction”. DEGs associated with different hormone signals, including GID1B, ROPGAP1, OPT3 and CDPK, were identified throughout the whole infection process. Cross-talk among defense signaling pathways plays a major role in Verticillium wilt resistance in eggplant.


April 21, 2020

The complete genome sequence of Ethanoligenens harbinense reveals the metabolic pathway of acetate-ethanol fermentation: A novel understanding of the principles of anaerobic biotechnology.

Ethanol-type fermentation is one of the three main fermentation types in the acidogenesis of anaerobic treatment systems. Non-spore-forming Ethanoligenens is a typical genus capable of ethanol-type fermentation in mixed culture (i.e., acetate-ethanol fermentation). This genus can produce ethanol, acetate, CO2, and H2 from carbohydrates and has application potential in anaerobic bioprocesses. Here, the complete genome sequences and methylomes of Ethanoligenens harbinense strains with different autoaggregative and coaggregative abilities were obtained using the PacBio single-molecule real-time sequencing platform. The genome size of E. harbinense strains was about 2.97-3.10 Mb, with 55.5% G+C content. A total of 3020-3153 genes were annotated, most of which were methylated at specific sites or motifs. The methylation types included 6mA, 4mC, and unknown types. Comparative genomic analysis demonstrated that E. harbinense is phylogenetically distant from other well-known hydrogen-producing bacteria (e.g., Clostridium and Thermoanaerobacter), with low levels of genetic similarity. Hydrogen production in E. harbinense was catalyzed by [FeFe]-hydrogenases, which are synthesized with the help of three [FeFe]-hydrogenase maturases. A metabolic mechanism for H2-ethanol co-production fermentation, catalyzed by pyruvate ferredoxin oxidoreductase, was proposed. This study provides genetic and evolutionary information on a model genus for further investigation of the metabolic pathways and regulatory networks of ethanol-type fermentation and of anaerobic bioprocesses for waste or wastewater treatment. Copyright © 2019. Published by Elsevier Ltd.
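G+C content, quoted above as 55.5% for E. harbinense, is the percentage of guanine and cytosine among the called bases of a sequence. A minimal Python sketch, assuming the genome is held as a plain string; `gc_content` is a hypothetical helper, and excluding ambiguous symbols such as N from the denominator is an assumed convention, not one stated in the study:

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Hypothetical helper for illustration. Assumption: only A/C/G/T count
    toward the denominator; ambiguity codes such as N are ignored.
    """
    seq = sequence.upper()  # accept soft-masked (lowercase) bases too
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    if called == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / called


if __name__ == "__main__":
    # 4 G/C out of 8 called bases; the two Ns are excluded.
    print(gc_content("ATGCGCNNAT"))  # prints 50.0
```

Counting Ns in the denominator instead would lower the reported percentage for draft assemblies with many gaps, which is why the convention is worth stating explicitly.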


April 21, 2020

5’UTR-mediated regulation of Ataxin-1 expression.

Expression of mutant Ataxin-1 with an abnormally expanded polyglutamine domain is necessary for the onset and progression of spinocerebellar ataxia type 1 (SCA1). Understanding how Ataxin-1 expression is regulated in the human brain could inspire novel molecular therapies for this fatal, dominantly inherited neurodegenerative disease. Previous studies have shown that the ATXN1 3’UTR plays a key role in regulating the Ataxin-1 cellular pool via diverse post-transcriptional mechanisms. Here we show that elements within the ATXN1 5’UTR also participate in the regulation of Ataxin-1 expression. PCR and PacBio sequencing analysis of cDNA obtained from control and SCA1 human brain samples revealed the presence of three major, alternatively spliced ATXN1 5’UTR variants. In cell-based assays, fusion of these variants upstream of an EGFP reporter construct revealed significant and differential impacts on total EGFP protein output, uncovering a rheostat-like function of the ATXN1 5’UTR. We identified ribosomal scanning of upstream AUG codons and increased transcript instability as potential mechanisms of regulation. Importantly, transcript-based analyses revealed significant differences in the expression pattern of ATXN1 5’UTR variants between control and SCA1 cerebellum. Together, the data presented here shed light on a previously unknown role for the ATXN1 5’UTR in the regulation of Ataxin-1 and provide new opportunities for the development of SCA1 therapeutics. Copyright © 2019. Published by Elsevier Inc.


April 21, 2020

Development of CRISPR-Cas systems for genome editing and beyond

The development of clustered regularly interspaced short palindromic repeat (CRISPR)-Cas systems for genome editing has transformed the way life science research is conducted and holds enormous potential for the treatment of disease as well as for many aspects of biotechnology. Here, I provide a personal perspective on the development of CRISPR-Cas9 for genome editing within the broader context of the field and discuss our work to discover novel Cas effectors and develop them into additional molecular tools. The initial demonstration of Cas9-mediated genome editing launched the development of many other technologies, enabled new lines of biological inquiry, and motivated a deeper examination of natural CRISPR-Cas systems, including the discovery of new types of CRISPR-Cas systems. These new discoveries in turn spurred further technological developments. I review these exciting discoveries and technologies and provide an overview of the broad array of applications of these technologies in basic research and in the improvement of human health. It is clear that we are only just beginning to unravel the potential within microbial diversity, and it is quite likely that we will continue to discover other exciting phenomena, some of which it may be possible to repurpose as molecular technologies. The transformation of mysterious natural phenomena to powerful tools, however, takes a collective effort to discover, characterize, and engineer them, and it has been a privilege to join the numerous researchers who have contributed to this transformation of CRISPR-Cas systems.

