Menu
September 22, 2019

The genome of an underwater architect, the caddisfly Stenopsyche tienmushanensis Hwang (Insecta: Trichoptera).

Caddisflies (Insecta: Trichoptera) are a highly adapted freshwater group of insects split from a common ancestor with Lepidoptera. They are the most diverse (>16,000 species) of the strictly aquatic insect orders and are widely employed as bio-indicators in water quality assessment and monitoring. Among the numerous adaptations to aquatic habitats, caddisfly larvae use silk and materials from the environment (e.g., stones, sticks, leaf matter) to build composite structures such as fixed retreats and portable cases. Understanding how caddisflies have adapted to aquatic habitats will help explain the evolution and subsequent diversification of the group.We sequenced a retreat-builder caddisfly Stenopsyche tienmushanensis Hwang and assembled a high-quality genome from both Illumina and Pacific Biosciences (PacBio) sequencing. In total, 601.2 M Illumina reads (90.2 Gb) and 16.9 M PacBio subreads (89.0 Gb) were generated. The 451.5 Mb assembled genome has a contig N50 of 1.29 M, has a longest contig of 4.76 Mb, and covers 97.65% of the 1,658 insect single-copy genes as assessed by Benchmarking Universal Single-Copy Orthologs. The genome comprises 36.76% repetitive elements. A total of 14,672 predicted protein-coding genes were identified. The genome revealed gene expansions in specific groups of the cytochrome P450 family and olfactory binding proteins, suggesting potential genomic features associated with pollutant tolerance and mate finding. In addition, the complete gene complex of the highly repetitive H-fibroin, the major protein component of caddisfly larval silk, was assembled.We report the draft genome of Stenopsyche tienmushanensis, the highest-quality caddisfly genome so far. The genome information will be an important resource for the study of caddisflies and may shed light on the evolution of aquatic insects.


September 22, 2019

Construction of Pará rubber tree genome and multi-transcriptome database accelerates rubber researches.

Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies brought draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene.A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publically available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily.The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers for rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .


September 22, 2019

ABC transporter mis-splicing associated with resistance to Bt toxin Cry2Ab in laboratory- and field-selected pink bollworm.

Evolution of pest resistance threatens the benefits of genetically engineered crops that produce Bacillus thuringiensis (Bt) insecticidal proteins. Strategies intended to delay pest resistance are most effective when implemented proactively. Accordingly, researchers have selected for and analyzed resistance to Bt toxins in many laboratory strains of pests before resistance evolves in the field, but the utility of this approach depends on the largely untested assumption that laboratory- and field-selected resistance to Bt toxins are similar. Here we compared the genetic basis of resistance to Bt toxin Cry2Ab, which is widely deployed in transgenic crops, between laboratory- and field-selected populations of the pink bollworm (Pectinophora gossypiella), a global pest of cotton. We discovered that resistance to Cry2Ab is associated with mutations disrupting the same ATP-binding cassette transporter gene (PgABCA2) in a laboratory-selected strain from Arizona, USA, and in field-selected populations from India. The most common mutation, loss of exon 6 caused by alternative splicing, occurred in resistant larvae from both locations. Together with previous data, the results imply that mutations in the same gene confer Bt resistance in laboratory- and field-selected strains and suggest that focusing on ABCA2 genes may help to accelerate progress in monitoring and managing resistance to Cry2Ab.


September 22, 2019

Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts.

Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by Pacbio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam resulted in 1213 sequences without hits, were potential novel genes in coffee. Longer UTRs were captured, especially in the 5?UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kilo base) were poorly annotated. LRS technology shows the limitation of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee.© The Authors 2017. Published by Oxford University Press.


September 22, 2019

An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations.

Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimized data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents >78% of the genome with a scaffold N50 of 88.8 kb that has a high fidelity to the input data. Our new annotation combines strand-specific Illumina RNA-seq and Pacific Biosciences (PacBio) full-length cDNAs to identify 104,091 high-confidence protein-coding genes and 10,156 noncoding RNA genes. We confirmed three known and identified one novel genome rearrangements. Our approach enables the rapid and scalable assembly of wheat genomes, the identification of structural variants, and the definition of complete gene models, all powerful resources for trait analysis and breeding of this key global crop. © 2017 Clavijo et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Recurrent structural variation, clustered sites of selection, and disease risk for the complement factor H (CFH) gene family.

Structural variation and single-nucleotide variation of the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ~360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication leading to the emergence of four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ~25-35 Mya and CFHR1 and CFHR3 ~7-13 Mya). Remarkably, all evolutionary breakpoints share a common ~4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has expanded independently throughout primate evolution. This segment is recurrently reused and juxtaposed with a donor duplication containing exons 8 and 9 from ancestral CFH, creating four CFHR fusion genes that include lineage-specific members of the gene family. Combined analysis of >5,000 AMD cases and controls identifies a significant burden of a rare missense mutation that clusters at the N terminus of CFH [P = 5.81 × 10-8, odds ratio (OR) = 9.8 (3.67-Infinity)]. A bipolar clustering pattern of rare nonsynonymous mutations in patients with AMD (P < 10-3) and AHUS (P = 0.0079) maps to functional domains that show evidence of positive selection during primate evolution. Our structural variation analysis in >2,400 individuals reveals five recurrent rearrangement breakpoints that show variable frequency among AMD cases and controls. These data suggest a dynamic and recurrent pattern of mutation critical to the emergence of new CFHR genes but also in the predisposition to complex human genetic disease phenotypes.


September 22, 2019

Proteomic detection of immunoglobulin light chain variable region peptides from amyloidosis patient biopsies.

Immunoglobulin light chain (LC) amyloidosis (AL) is caused by deposition of clonal LCs produced by an underlying plasma cell neoplasm. The clonotypic LC sequences are unique to each patient, and they cannot be reliably detected by either immunoassays or standard proteomic workflows that target the constant regions of LCs. We addressed this issue by developing a novel sequence template-based workflow to detect LC variable (LCV) region peptides directly from AL amyloid deposits. The workflow was implemented in a CAP/CLIA compliant clinical laboratory dedicated to proteomic subtyping of amyloid deposits extracted from either formalin-fixed paraffin-embedded tissues or subcutaneous fat aspirates. We evaluated the performance of the workflow on a validation cohort of 30 AL patients, whose amyloidogenic clone was identified using a novel proteogenomics method, and 30 controls. The recall and negative predictive values of the workflow, when identifying the gene family of the AL clone, were 93 and 98%, respectively. Application of the workflow on a clinical cohort of 500 AL amyloidosis samples highlighted a bias in the LCV gene families used by the AL clones. We also detected similarity between AL clones deposited in multiple organs of systemic AL patients. In summary, AL proteomic data sets are rich in LCV region peptides of potential clinical significance that are recoverable with advanced bioinformatics.


September 22, 2019

wtf genes are prolific dual poison-antidote meiotic drivers.

Meiotic drivers are selfish genes that bias their transmission into gametes, defying Mendelian inheritance. Despite the significant impact of these genomic parasites on evolution and infertility, few meiotic drive loci have been identified or mechanistically characterized. Here, we demonstrate a complex landscape of meiotic drive genes on chromosome 3 of the fission yeasts Schizosaccharomyces kambucha and S. pombe. We identify S. kambucha wtf4 as one of these genes that acts to kill gametes (known as spores in yeast) that do not inherit the gene from heterozygotes. wtf4 utilizes dual, overlapping transcripts to encode both a gamete-killing poison and an antidote to the poison. To enact drive, all gametes are poisoned, whereas only those that inherit wtf4 are rescued by the antidote. Our work suggests that the wtf multigene family proliferated due to meiotic drive and highlights the power of selfish genes to shape genomes, even while imposing tremendous costs to fertility.


September 22, 2019

Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis.

Danshen, Salvia miltiorrhiza Bunge, is one of the most widely used herbs in traditional Chinese medicine, wherein its rhizome/roots are particularly valued. The corresponding bioactive components include the tanshinone diterpenoids, the biosynthesis of which is a subject of considerable interest. Previous investigations of the S. miltiorrhiza transcriptome have relied on short-read next-generation sequencing (NGS) technology, and the vast majority of the resulting isotigs do not represent full-length cDNA sequences. Moreover, these efforts have been targeted at either whole plants or hairy root cultures. Here, we demonstrate that the tanshinone pigments are produced and accumulate in the root periderm, and apply a combination of NGS and single-molecule real-time (SMRT) sequencing to various root tissues, particularly including the periderm, to provide a more complete view of the S. miltiorrhiza transcriptome, with further insight into tanshinone biosynthesis as well. In addition, the use of SMRT long-read sequencing offered the ability to examine alternative splicing, which was found to occur in approximately 40% of the detected gene loci, including several involved in isoprenoid/terpenoid metabolism.© 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.


September 22, 2019

Chromosome-level reference genome and alternative splicing atlas of moso bamboo (Phyllostachys edulis).

Bamboo is one of the most important nontimber forestry products worldwide. However, a chromosome-level reference genome is lacking, and an evolutionary view of alternative splicing (AS) in bamboo remains unclear despite emerging omics data and improved technologies.Here, we provide a chromosome-level de novo genome assembly of moso bamboo (Phyllostachys edulis) using additional abundance sequencing data and a Hi-C scaffolding strategy. The significantly improved genome is a scaffold N50 of 79.90 Mb, approximately 243 times longer than the previous version. A total of 51,074 high-quality protein-coding loci with intact structures were identified using single-molecule real-time sequencing and manual verification. Moreover, we provide a comprehensive AS profile based on the identification of 266,711 unique AS events in 25,225 AS genes by large-scale transcriptomic sequencing of 26 representative bamboo tissues using both the Illumina and Pacific Biosciences sequencing platforms. Through comparisons with orthologous genes in related plant species, we observed that the AS genes are concentrated among more conserved genes that tend to accumulate higher transcript levels and share less tissue specificity. Furthermore, gene family expansion, abundant AS, and positive selection were identified in crucial genes involved in the lignin biosynthetic pathway of moso bamboo.These fundamental studies provide useful information for future in-depth analyses of comparative genome and AS features. Additionally, our results highlight a global perspective of AS during evolution and diversification in bamboo.


September 22, 2019

Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads.

Gene isoforms are commonly found in both prokaryotes and eukaryotes. Since each isoform may perform a specific function in response to changing environmental conditions, studying the dynamics of gene isoforms is important in understanding biological processes and disease conditions. However, genome-wide identification of gene isoforms is technically challenging due to the high degree of sequence identity among isoforms. Traditional targeted sequencing approach, involving Sanger sequencing of plasmid-cloned PCR products, has low throughput and is very tedious and time-consuming. Next-generation sequencing technologies such as Illumina and 454 achieve high throughput but their short read lengths are a critical barrier to accurate assembly of highly similar gene isoforms, and may result in ambiguities and false joining during sequence assembly. More recently, the third generation sequencer represented by the PacBio platform offers sufficient throughput and long reads covering the full length of typical genes, thus providing a potential to reliably profile gene isoforms. However, the PacBio long reads are error-prone and cannot be effectively analyzed by traditional assembly programs.We present a clustering-based analysis pipeline integrated with PacBio sequencing data for profiling highly similar gene isoforms. This approach was first evaluated in comparison to de novo assembly of 454 reads using a benchmark admixture containing 10 known, cloned msg genes encoding the major surface glycoprotein of Pneumocystis jirovecii. All 10 msg isoforms were successfully reconstructed with the expected length (~1.5 kb) and correct sequence by the new approach, while 454 reads could not be correctly assembled using various assembly programs. When using an additional benchmark admixture containing 22 known P. jirovecii msg isoforms, this approach accurately reconstructed all but 4 these isoforms in their full-length (~3 kb); these 4 isoforms were present in low concentrations in the admixture. Finally, when applied to the original clinical sample from which the 22 known msg isoforms were cloned, this approach successfully identified not only all known isoforms accurately (~3 kb each) but also 48 novel isoforms.PacBio sequencing integrated with the clustering-based analysis pipeline achieves high-throughput and high-resolution discrimination of highly similar sequences, and can serve as a new approach for genome-wide characterization of gene isoforms and other highly repetitive sequences.


September 22, 2019

Genome-wide identification and analysis of the ALTERNATIVE OXIDASE gene family in diploid and hexaploid wheat.

A comprehensive understanding of wheat responses to environmental stress will contribute to the long-term goal of feeding the planet. ALERNATIVE OXIDASE (AOX) genes encode proteins involved in a bypass of the electron transport chain and are also known to be involved in stress tolerance in multiple species. Here, we report the identification and characterization of the AOX gene family in diploid and hexaploid wheat. Four genes each were found in the diploid ancestors Triticum urartu, and Aegilops tauschii, and three in Aegilops speltoides. In hexaploid wheat (Triticum aestivum), 20 genes were identified, some with multiple splice variants, corresponding to a total of 24 proteins for those with observed transcription and translation. These proteins were classified as AOX1a, AOX1c, AOX1e or AOX1d via phylogenetic analysis. Proteins lacking most or all signature AOX motifs were assigned to putative regulatory roles. Analysis of protein-targeting sequences suggests mixed localization to the mitochondria and other organelles. In comparison to the most studied AOX from Trypanosoma brucei, there were amino acid substitutions at critical functional domains indicating possible role divergence in wheat or grasses in general. In hexaploid wheat, AOX genes were expressed at specific developmental stages as well as in response to both biotic and abiotic stresses such as fungal pathogens, heat and drought. These AOX expression patterns suggest a highly regulated and diverse transcription and expression system. The insights gained provide a framework for the continued and expanded study of AOX genes in wheat for stress tolerance through breeding new varieties, as well as resistance to AOX-targeted herbicides, all of which can ultimately be used synergistically to improve crop yield.


September 22, 2019

PacBio sequencing of gene families – a case study with wheat gluten genes.

Amino acids in wheat (Triticum aestivum) seeds mainly accumulate in storage proteins called gliadins and glutenins. Gliadins contain a/ß-, ?- and ?-types whereas glutenins contain HMW- and LMW-types. Known gliadin and glutenin sequences were largely determined through cloning and sequencing by capillary electrophoresis. This time-consuming process prevents us to intensively study the variation of each orthologous gene copy among cultivars. The throughput and sequencing length of Pacific Bioscience RS (PacBio) single molecule sequencing platform make it feasible to construct contiguous and non-chimeric RNA sequences. We assembled 424 wheat storage protein transcripts from ten wheat cultivars by using just one single-molecule-real-time cell. The protein genes from wheat cultivar Chinese Spring are comparable to known sequences from NCBI. We demonstrated real-time sequencing of gene families with high-throughput and low-cost. This method can be applied to studies of gene amplification and copy number variation among species and cultivars. © 2013 Elsevier B.V. All rights reserved.


September 22, 2019

Isoform sequencing provides a more comprehensive view of the Panax ginseng transcriptome.

Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offers a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feed-back loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.


September 22, 2019

Exploiting single-molecule transcript sequencing for eukaryotic gene prediction.

We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is developed as automated process. Optimized training and prediction settings and mRNA-seq noise reduction of assisting Illumina reads results in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource to obtain comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.