September 22, 2019

Defining a personal, allele-specific, and single-molecule long-read transcriptome.

Personal transcriptomes in which all of an individual’s genetic variants (e.g., single nucleotide variants) and transcript isoforms (transcription start sites, splice sites, and polyA sites) are defined and quantified for full-length transcripts are expected to be important for understanding individual biology and disease, but have not been described previously. To obtain such transcriptomes, we sequenced the lymphoblastoid transcriptomes of three family members (GM12878 and the parents GM12891 and GM12892) by using a Pacific Biosciences long-read approach complemented with Illumina 101-bp sequencing and made the following observations. First, we found that reads representing all splice sites of a transcript are evident for most sufficiently expressed genes ≤3 kb and often for genes longer than that. Second, we added and quantified previously unidentified splicing isoforms to an existing annotation, thus creating the first personalized annotation to our knowledge. Third, we determined SNVs in a de novo manner and connected them to RNA haplotypes, including HLA haplotypes, thereby assigning single full-length RNA molecules to their transcribed allele, and demonstrated Mendelian inheritance of RNA molecules. Fourth, we show how RNA molecules can be linked to personal variants on a one-by-one basis, which allows us to assess differential allelic expression (DAE) and differential allelic isoforms (DAI) from the phased full-length isoform reads. The DAI method is largely independent of the distance between exon and SNV, in contrast to fragmentation-based methods. Overall, in addition to improving eukaryotic transcriptome annotation, these results describe, to our knowledge, the first large-scale and full-length personal transcriptome.


September 22, 2019

Profiling of oral microbiota in early childhood caries using Single-Molecule Real-Time Sequencing

Background: Alterations of oral microbiota are the main cause of the progression of caries. The goal of this study was to characterize the oral microbiota in childhood caries based on single-molecule real-time sequencing. Methods: A total of 21 preschoolers, aged 3-5 years with severe early childhood caries, and 20 age-matched, caries-free children as controls were recruited. Saliva samples were collected, followed by DNA extraction, PacBio sequencing and phylogenetic analyses of the oral microbial communities. Results: 876 species derived from 13 known bacterial phyla and 110 genera were detected from 41 children using PacBio sequencing. At the species level, 38 species, including Veillonella spp., Streptococcus spp., Prevotella spp. and Lactobacillus spp., showed higher abundance in the caries group compared to the caries-free group (p<0.05). The core microbiota at the genus and species levels was more stable in the caries-free micro-ecological niche. At follow-up oral examinations 6 months after sample collection, development of new dental caries was observed in 5 children (the transitional group) among the 21 caries-free children. Six species that were more abundant in the caries-free group exhibited a relatively low abundance in both the caries group and the transitional group (p<0.05). We conclude that Abiotrophia spp., Neisseria spp. and Veillonella spp. are essential for maintaining a healthy oral microbial ecosystem. Prevotella spp., Lactobacillus spp., Dialister spp. and Filifactor spp. may be related to the pathogenesis and progression of dental caries.


September 22, 2019

Comprehensive exploration of the rumen microbial ecosystem with advancements in metagenomics

Ruminant farming and its environmental impact have long remained an economic concern. Metagenomics unravels the vast structural and functional diversity of the rumen microbial community, which plays a major role in animal nutrition. Here, we summarize rumen metagenomic studies that have enhanced the knowledge of rumen microbe dynamics, subsequently leading to the development of better feed strategies to improve livestock production and reduce methane emissions.


September 22, 2019

Exploring the genome and transcriptome of the cave nectar bat Eonycteris spelaea with PacBio long-read sequencing.

In the past two decades, bats have emerged as an important model system to study host-pathogen interactions. More recently, it has been shown that bats may also serve as a new and excellent model to study aging, inflammation, and cancer, among other important biological processes. The cave nectar bat or lesser dawn bat (Eonycteris spelaea) is known to be a reservoir for several viruses and intracellular bacteria. It is widely distributed throughout the tropics and subtropics from India to Southeast Asia and pollinates several plant species, including the culturally and economically important durian in the region. Here, we report the whole-genome and transcriptome sequencing, followed by de novo assembly, of the E. spelaea genome solely using the Pacific Biosciences (PacBio) long-read sequencing platform. The newly assembled E. spelaea genome is 1.97 Gb in length and consists of 4,470 sequences with a contig N50 of 8.0 Mb. Identified repeat elements covered 34.65% of the genome, and 20,640 unique protein-coding genes with 39,526 transcripts were annotated. We demonstrated that the PacBio long-read sequencing platform alone is sufficient to generate a comprehensive de novo assembled genome and transcriptome of an important bat species. These results will provide useful insights and act as a resource to expand our understanding of bat evolution, ecology, physiology, immunology, viral infection, and transmission dynamics.


September 22, 2019

Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data.

The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens. Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity. Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at: http://sourceforge.net/projects/pathoscope/.


September 22, 2019

A microbial clock provides an accurate estimate of the postmortem interval in a mouse model system.

Establishing the time since death is critical in every death investigation, yet existing techniques are susceptible to a range of errors and biases. For example, forensic entomology is widely used to assess the postmortem interval (PMI), but errors can range from days to months. Microbes may provide a novel method for estimating PMI that avoids many of these limitations. Here we show that postmortem microbial community changes are dramatic, measurable, and repeatable in a mouse model system, allowing PMI to be estimated within approximately 3 days over 48 days. Our results provide a detailed understanding of bacterial and microbial eukaryotic ecology within a decomposing corpse system and suggest that microbial community data can be developed into a forensic tool for estimating PMI. DOI:http://dx.doi.org/10.7554/eLife.01104.001.


September 22, 2019

Evolution of selective-sequencing approaches for virus discovery and virome analysis.

Recent advances in sequencing technologies have transformed the field of virus discovery and virome analysis. Once mostly confined to traditional Sanger-sequencing-based discovery of individual viruses, the field has now been entirely taken over by high-throughput sequencing (HTS) based virus metagenomics, which can be used to characterize the nature and composition of entire viromes. To better harness the potential of HTS for the study of viromes, sample preparation methodologies use different approaches to exclude amplification of non-viral components that can overshadow low-titer viruses. These virus-sequence enrichment approaches mostly focus on sample preparation methods, such as enzymatic digestion of non-viral nucleic acids and size exclusion of non-viral constituents by column filtration, ultrafiltration or density gradient centrifugation. Recently, however, a new approach to virus-sequence enrichment called virome-capture sequencing, focused on the amplification or HTS library preparation stage, was developed to improve virome characterization. This new approach has the potential to further transform the field of virus discovery and virome analysis, but its technical complexity and sequence-dependence warrant further improvements. In this review we discuss the different methods for selective-sequencing-based virome analysis, their applications and evolution, and also propose refinements needed to harness the full potential of HTS for virome analysis. Copyright © 2017 Elsevier B.V. All rights reserved.


September 22, 2019

The methylome of the gut microbiome: disparate Dam methylation patterns in intestinal Bacteroides dorei

Despite the large interest in the human microbiome in recent years, there are no reports of bacterial DNA methylation in the microbiome. Here metagenomic sequencing using the Pacific Biosciences platform allowed for rapid identification of the GATC methylation status of a bacterial species in human stool samples. For this work, two stool samples were chosen that were dominated by a single species, Bacteroides dorei. Based on 16S rRNA analysis, this species represented over 45% of the bacteria present in these two samples. The B. dorei genome sequence from these samples was determined and the GATC methylation sites mapped. The B. dorei genome from one subject lacked any GATC methylation and lacked the DNA adenine methyltransferase genes. In contrast, B. dorei from another subject contained 20,551 methylated GATC sites. Of the 4970 open reading frames identified in the GATC-methylated B. dorei genome, 3184 genes were methylated, and a further 1735 GATC methylations were found in intergenic regions. These results suggest that DNA methylation patterns are important to consider in multi-omic analyses of microbiome samples seeking to discover the diversity of bacterial functions and may differ between disease states.


September 22, 2019

Global dissection of alternative splicing uncovers transcriptional diversity in tissues and associates with the flavonoid pathway in tea plant (Camellia sinensis).

Alternative splicing (AS) regulates mRNA at the post-transcriptional level to change gene function in organisms. However, little is known about AS and its roles in tea plant (Camellia sinensis), which is widely cultivated to make the popular beverage tea. In our study, the AS landscape and dynamics were characterized in eight tissues (bud, young leaf, summer mature leaf, winter old leaf, stem, root, flower, fruit) of tea plant by Illumina RNA-Seq and confirmed by Iso-Seq. The most abundant AS type (~20%) was intron retention, which was involved in RNA processes. Some alternative splicing events were found to be tissue-specific, for example in stem and root. Thirteen co-expressed modules of AS transcripts were identified, which revealed a similar pattern between the bud and young leaves as well as a distinct pattern between seasons. AS events of structural genes including anthocyanidin reductase and MYB transcription factors were involved in biosynthesis of flavonoids, especially in vegetative tissues. The AS isoforms rather than the full-length ones were the major transcripts involved in the flavonoid synthesis pathway, and were positively correlated with the content of catechins, which confer the taste of tea. We propose that AS is an important functional mechanism in regulating flavonoid metabolites. Our study provides insight into the AS events underlying the tea plant’s distinctive developmental process and highlights the important contribution of alternative splicing regulation to the biosynthesis of flavonoids.


September 22, 2019

ABC transporter mis-splicing associated with resistance to Bt toxin Cry2Ab in laboratory- and field-selected pink bollworm.

Evolution of pest resistance threatens the benefits of genetically engineered crops that produce Bacillus thuringiensis (Bt) insecticidal proteins. Strategies intended to delay pest resistance are most effective when implemented proactively. Accordingly, researchers have selected for and analyzed resistance to Bt toxins in many laboratory strains of pests before resistance evolves in the field, but the utility of this approach depends on the largely untested assumption that laboratory- and field-selected resistance to Bt toxins are similar. Here we compared the genetic basis of resistance to Bt toxin Cry2Ab, which is widely deployed in transgenic crops, between laboratory- and field-selected populations of the pink bollworm (Pectinophora gossypiella), a global pest of cotton. We discovered that resistance to Cry2Ab is associated with mutations disrupting the same ATP-binding cassette transporter gene (PgABCA2) in a laboratory-selected strain from Arizona, USA, and in field-selected populations from India. The most common mutation, loss of exon 6 caused by alternative splicing, occurred in resistant larvae from both locations. Together with previous data, the results imply that mutations in the same gene confer Bt resistance in laboratory- and field-selected strains and suggest that focusing on ABCA2 genes may help to accelerate progress in monitoring and managing resistance to Cry2Ab.


September 22, 2019

Uncovering full-length transcript isoforms of sugarcane cultivar Khon Kaen 3 using single-molecule long-read sequencing.

Sugarcane is an important global food crop and energy resource. To facilitate the sugarcane improvement program, genome and gene information are important for studying traits at the molecular level. Most currently available transcriptome data for sugarcane were generated using second-generation sequencing platforms, which provide short reads. The de novo assembled transcripts from these data are limited in length, and hence may be incomplete and inaccurate, especially for long RNAs. We generated a transcriptome dataset of leaf tissue from a commercial Thai sugarcane cultivar Khon Kaen 3 (KK3) using PacBio RS II single-molecule long-read sequencing by the Iso-Seq method. Short-read RNA-Seq data were generated from the same RNA sample using the Ion Proton platform for reducing base calling errors. A total of 119,339 error-corrected transcripts were generated with an N50 length of 3,611 bp, which is on average longer than any previously reported sugarcane transcriptome dataset. 110,253 sequences (92.4%) contain an open reading frame (ORF) of at least 300 bp with an ORF N50 of 1,416 bp. The mean lengths of 5′ and 3′ untranslated regions in 73,795 sequences with complete ORFs are 1,249 and 1,187 bp, respectively. 4,774 transcripts are putatively novel full-length transcripts which do not match with a previous Iso-Seq study of sugarcane. We annotated the functions of 68,962 putative full-length transcripts with at least 90% coverage when compared with homologous protein coding sequences in other plants. The new catalog of transcripts will be useful for genome annotation, identification of splicing variants, SNP identification, and other research pertaining to the sugarcane improvement program. The putatively novel transcripts suggest unique features of KK3, although more data from different tissues and stages of development are needed to establish a reference transcriptome of this cultivar.
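
The N50 values quoted in the abstract above (3,611 bp for transcripts, 1,416 bp for ORFs) follow the usual definition: the length L such that sequences of length L or longer together account for at least half of the total bases. A minimal sketch of that calculation is shown below. The helper name `n50` and the toy lengths are assumptions for illustration only; they are not the study's data or code.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (invented lengths, in bp): total = 9600, half = 4800.
# Cumulative sums, longest first: 4100, 7100 -> the N50 is 3000.
toy_lengths = [500, 1200, 3000, 4100, 800]
print(n50(toy_lengths))
```

Because N50 is weighted by bases rather than by sequence count, a few long transcripts can raise it well above the mean or median length, which is worth keeping in mind when comparing datasets.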


September 22, 2019

Genome-wide transcriptome profiling of the medicinal plant Zanthoxylum planispinum using a single-molecule direct RNA sequencing approach.

High-throughput RNA sequencing has revolutionized transcriptome-based studies of candidate genes, key pathways and gene regulation in non-model organisms. We analyzed full-length cDNA sequences in Zanthoxylum planispinum (Z. planispinum), a medicinal herb in major parts of East Asia. The full-length mRNA derived from leaf, early fruit and maturing fruit tissues was sequenced using the PacBio RSII platform to characterize the isoform-level transcriptome. We obtained 51,402 unigenes, with an average of 1781 bp per gene in 82.473 Mb of gene lengths. Among the 51,402 unigenes, 3963 showed multiple isoforms. By selecting one representative gene from each group of isoforms, we finalized a set of 46,306 unique genes for this herb. We identified 76 cytochrome P450 (CYP450) and related isoforms that are widely diverse in molecular function and biological process. These transcriptome data of Z. planispinum will provide a good resource to study metabolic engineering for the production of valuable medicinal drugs and phytochemicals. Copyright © 2018. Published by Elsevier Inc.


September 22, 2019

Survey of Ixodes pacificus ticks in California reveals a diversity of microorganisms and a novel and widespread Anaplasmataceae species.

Ixodes pacificus ticks can harbor a wide range of human and animal pathogens. To survey the prevalence of tick-borne known and putative pathogens, we tested 982 individual adult and nymphal I. pacificus ticks collected throughout California between 2007 and 2009 using a broad-range PCR and electrospray ionization mass spectrometry (PCR/ESI-MS) assay designed to detect a wide range of tick-borne microorganisms. Overall, 1.4% of the ticks were found to be infected with Borrelia burgdorferi, 2.0% were infected with Borrelia miyamotoi and 0.3% were infected with Anaplasma phagocytophilum. In addition, 3.0% were infected with Babesia odocoilei. About 1.2% of the ticks were co-infected with more than one pathogen or putative pathogen. In addition, we identified a novel Anaplasmataceae species that we characterized by sequencing of its 16S rRNA, groEL, gltA, and rpoB genes. Sequence analysis indicated that this organism is phylogenetically distinct from known Anaplasma species, with its closest genetic near neighbors coming from Asia. The prevalence of this novel Anaplasmataceae species was as high as 21% at one site, and it was detected in 4.9% of ticks tested statewide. Based upon this genetic characterization we propose that this organism be called ‘Candidatus Cryptoplasma californiense’. Knowledge of this novel microbe will raise awareness in the community of the breadth of the I. pacificus microbiome and of the possibility that this bacterium is more widely spread, and will provide an opportunity to explore whether this bacterium also contributes to human or animal disease burden.


September 22, 2019

Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts.

Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam left 1213 sequences without hits, which are potential novel genes in coffee. Longer UTRs were captured, especially 5′ UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated. LRS technology shows the limitations of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee. © The Authors 2017. Published by Oxford University Press.


September 22, 2019

Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering

BACKGROUND: High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recent emergence of long-read platforms, such as single-molecule real-time (SMRT) sequencing from Pacific Biosciences, offers the potential for improved taxonomic and phylogenetic profiling. Here, we evaluate the performance of long- and short-read 16S rRNA gene sequencing using simulated and experimental data, followed by OTU inference using computational pipelines based on heuristic and complete-linkage hierarchical clustering. RESULTS: In simulated data, long-read sequencing was shown to improve OTU quality and decrease variance. We then profiled 40 human gut microbiome samples using a combination of Illumina MiSeq and Blautia-specific SMRT sequencing, further supporting the notion that long reads can identify additional OTUs. We implemented a complete-linkage hierarchical clustering strategy using a flexible computational pipeline, tailored specifically for PacBio circular consensus sequencing (CCS) data that outperforms heuristic methods in most settings: https://github.com/oscar-franzen/oclust/. CONCLUSION: Our data demonstrate that long reads can improve OTU inference; however, the choice of clustering algorithm and associated clustering thresholds has significant impact on performance.
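
To make the clustering idea in the abstract above concrete, here is a minimal sketch of complete-linkage hierarchical clustering applied to toy reads. It is not the oclust pipeline: the function names, the mismatch-fraction distance on equal-length strings, the invented 10-bp reads, and the 0.1 threshold are all assumptions chosen to keep the example small. Real 16S analyses typically use alignment-based distances and thresholds such as 3% dissimilarity.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Toy distance: fraction of differing positions (equal-length reads only)."""
    if len(a) != len(b):
        raise ValueError("toy distance expects equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(seqs, threshold):
    """Greedy agglomerative clustering with complete linkage.

    Two clusters are merged only if the LARGEST pairwise distance
    between their members is <= threshold, so every pair of reads
    inside a resulting OTU is within the threshold of each other.
    """
    clusters = [[s] for s in seqs]  # start with one cluster per read
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(mismatch_fraction(a, b)
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:  # closest pair is too far apart: stop merging
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Invented toy reads: two groups that differ from each other at every position.
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
otus = complete_linkage_otus(reads, threshold=0.1)
print(len(otus), otus)
```

The naive pairwise search here is far too slow for real amplicon datasets; it is meant only to show why complete linkage bounds within-OTU divergence, and why the chosen threshold directly changes the number of OTUs reported.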

