September 22, 2019

Accurate determination of bacterial abundances in human metagenomes using full-length 16S sequencing reads

DNA sequencing of PCR-amplified marker genes, especially but not limited to the 16S rRNA gene, is perhaps the most common approach for profiling microbial communities. Because of the technological constraints of commonly available DNA sequencing, these approaches usually take the form of short reads sequenced from a narrow, targeted variable region, with a corresponding loss of taxonomic resolution relative to the full-length marker gene. We use Pacific Biosciences single-molecule, real-time circular consensus sequencing to sequence amplicons spanning the entire length of the 16S rRNA gene. However, this sequencing technology suffers from a high sequencing error rate that must be addressed to take full advantage of the longer sequence. Here, we present a method that models the sequencing error process using a generalized pair hidden Markov chain model and estimates bacterial abundances in microbial samples. We demonstrate, with simulated and real data, that our model and its associated estimation procedure give accurate estimates at the species (or subspecies) level and are more flexible than existing methods such as SImple Non-Bayesian TAXonomy (SINTAX).
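The abstract does not spell out the estimation procedure. As a rough illustration of how abundances can be estimated once an error model supplies per-read likelihoods, the sketch below runs a standard expectation-maximization (EM) loop for a mixture model. It assumes a precomputed matrix in which entry [r][k] is the likelihood of read r given reference sequence k; in the paper's setting such values would come from the pair hidden Markov model, but here they are hypothetical inputs, and the function name estimate_abundances is illustrative rather than the authors' code.

```python
def estimate_abundances(likelihoods, n_iter=500, tol=1e-10):
    """EM estimate of reference abundances from per-read likelihoods.

    likelihoods[r][k] is assumed to be P(read r | reference k), for example
    as scored by a sequencing-error model; the values are supplied by the caller.
    Returns a list of abundances (mixture weights) that sums to 1.
    """
    n_refs = len(likelihoods[0])
    pi = [1.0 / n_refs] * n_refs  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_refs
        for row in likelihoods:
            weights = [p * l for p, l in zip(pi, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read not explained by any reference: ignore it
            for k, w in enumerate(weights):
                counts[k] += w / total  # E-step: posterior share of this read
        n_assigned = sum(counts)
        if n_assigned == 0.0:
            return pi  # no read could be assigned; keep current estimate
        new_pi = [c / n_assigned for c in counts]  # M-step: renormalize
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Usage: three reads clearly from reference 0, one from reference 1,
# and one read that is equally likely under both (hypothetical values).
reads = [[0.9, 0.01], [0.8, 0.02], [0.85, 0.01], [0.01, 0.9], [0.5, 0.5]]
print(estimate_abundances(reads))
```

The design choice in this sketch is that an ambiguous read is shared fractionally in proportion to the current abundance estimates instead of being forced onto a single best-matching reference.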


September 22, 2019

Transcriptional fates of human-specific segmental duplications in brain.

Despite the importance of duplicate genes for evolutionary adaptation, gene annotation is often incomplete, incorrect, or missing in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits. © 2018 Dougherty et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

It is widely acknowledged that transcriptional diversity contributes substantially to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, obtaining full-length transcripts remains a major challenge because of the difficulties of short-read-based assembly. In the present study, we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). In total, we obtain 36,186 high-confidence transcripts from 14,474 genic loci; more than 23% of these genic loci and 66% of the isoforms are not yet annotated in the current reference genome. Furthermore, about 17% of the transcripts are computationally predicted to be non-coding RNAs. As many as 24,797 alternative splicing (AS) events and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome. The results provide a comprehensive set of reference transcripts and hence contribute to improved annotation of the rabbit genome.


September 22, 2019

Investigating bacterial population structure and dynamics in traditional koumiss from Inner Mongolia using single molecule real-time sequencing.

Koumiss is considered a complete dairy product, high in nutrients and with medicinal properties. The bacterial communities involved in the production of koumiss play a crucial role in the fermentation cycle. To reveal the bacterial biodiversity in koumiss and the dynamics of succession in bacterial populations during fermentation, 22 samples were collected from 5 sampling sites, and the full-length 16S ribosomal RNA genes were sequenced using single molecule real-time sequencing technology. One hundred forty-eight species from 82 bacterial genera and 8 phyla were identified. These results suggested that structural differences in the bacterial community could be attributed to geographical location. The most significant difference in bacterial composition occurred in samples from group D compared with the other groups. The sampling location of group D was distant from the city and maintained a primitive local nomadic lifestyle. The succession dynamics of the bacterial communities showed that Lactobacillus helveticus increased in abundance from 0 to 9 h, peaked at 9 h, and then decreased. In contrast, Enterococcus faecalis, Enterococcus durans, and Enterococcus casseliflavus increased gradually throughout the fermentation process and reached a maximum after 24 h. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.


September 22, 2019

Active microorganisms in forest soils differ from the total community yet are shaped by the same environmental factors: the influence of pH and soil moisture.

Predicting the impact of environmental change on soil microbial functions requires an understanding of how environmental factors shape microbial composition. Here, we investigated the influence of environmental factors on bacterial and fungal communities across an expanse of northern hardwood forest in Michigan, USA, which spans a 500-km regional climate gradient. We quantified soil microbial community composition using high-throughput DNA sequencing on coextracted rDNA (i.e. total community) and rRNA (i.e. active community). Within both bacteria and fungi, total and active communities were compositionally distinct from one another across the regional gradient (bacteria P = 0.01; fungi P < 0.01). Taxonomically, the active community was a subset of the total community. Compositional differences between total and active communities reflected changes in the relative abundance of dominant taxa. The composition of both the total and active microbial communities varied by site across the gradient (P < 0.01) and was shaped by differences in soil moisture, pH, SOM carboxyl content, as well as C and N concentration. Our study highlights the importance of distinguishing between metabolically active microorganisms and the total community, and emphasizes that the same environmental factors shape the total and active communities of bacteria and fungi in this ecosystem. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
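The abstract reports P values for compositional differences between total and active communities but does not name the test used. A common approach to this kind of question is a permutation test on a community dissimilarity. The sketch below uses Bray-Curtis dissimilarity and, as a simplified stand-in for PERMANOVA-style statistics, the difference between mean between-group and mean within-group dissimilarity. The choice of Bray-Curtis, the statistic, the function names, and the example profiles are all assumptions for illustration, not the study's documented method or data.

```python
import itertools
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def separation(profiles, labels):
    """Mean between-group minus mean within-group dissimilarity.

    Assumes two groups with at least two samples each.
    """
    between, within = [], []
    for i, j in itertools.combinations(range(len(profiles)), 2):
        d = bray_curtis(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p(profiles, labels, n_perm=999, seed=0):
    """One-sided permutation P value: how often shuffled labels separate as well."""
    rng = random.Random(seed)
    observed = separation(profiles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if separation(profiles, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # count the observed labeling itself


# Usage with hypothetical per-taxon abundances for five "total" and five "active" samples.
profiles = [[10, 1, 0], [9, 0, 1], [11, 1, 1], [10, 2, 0], [8, 1, 1],
            [0, 1, 10], [1, 0, 9], [1, 1, 11], [0, 2, 10], [1, 1, 8]]
labels = ["total"] * 5 + ["active"] * 5
print(permutation_p(profiles, labels))
```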


September 22, 2019

Interaction between the microbiome and TP53 in human lung cancer.

Lung cancer is the leading cancer diagnosis worldwide and the number one cause of cancer deaths. Exposure to cigarette smoke, the primary risk factor in lung cancer, reduces epithelial barrier integrity and increases susceptibility to infections. Herein, we hypothesize that somatic mutations together with cigarette smoke generate a dysbiotic microbiota that is associated with lung carcinogenesis. Using lung tissue from 33 controls and 143 cancer cases, we conduct 16S ribosomal RNA (rRNA) bacterial gene sequencing, with RNA-sequencing data from lung cancer cases in The Cancer Genome Atlas serving as the validation cohort. Overall, we demonstrate a lower alpha diversity in normal lung as compared to non-tumor adjacent or tumor tissue. In squamous cell carcinoma specifically, a separate group of taxa are identified, in which Acidovorax is enriched in smokers. Acidovorax temporans is identified within tumor sections by fluorescent in situ hybridization and confirmed by two separate 16S rRNA strategies. Further, these taxa, including Acidovorax, exhibit higher abundance among the subset of squamous cell carcinoma cases with TP53 mutations, an association not seen in adenocarcinomas. The results of this comprehensive study show both microbiome-gene and microbiome-exposure interactions in squamous cell carcinoma lung cancer tissue. Specifically, tumors harboring TP53 mutations, which can impair epithelial function, have a unique bacterial consortium that is higher in relative abundance in smoking-associated tumors of this type. Given the significant need for clinical diagnostic tools in lung cancer, this study may provide novel biomarkers for early detection.
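The abstract does not state which alpha-diversity index was compared. As one common example, the Shannon index H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i, increases with both the number of taxa and the evenness of their abundances. The minimal sketch below computes it from per-taxon read counts; the choice of index and the example counts are assumptions, not values from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), ignoring taxa with zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical 16S read counts per taxon for two samples with the same richness.
even_sample = [25, 25, 25, 25]
skewed_sample = [85, 5, 5, 5]
print(shannon_index(even_sample), shannon_index(skewed_sample))
```

Both hypothetical samples contain four taxa, so the lower value for the skewed sample reflects reduced evenness alone.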


September 22, 2019

Major histocompatibility complex haplotyping and long-amplicon allele discovery in cynomolgus macaques from Chinese breeding facilities.

Very little is currently known about the major histocompatibility complex (MHC) region of cynomolgus macaques (Macaca fascicularis; Mafa) from Chinese breeding centers. We performed comprehensive MHC class I haplotype analysis of 100 cynomolgus macaques from two different centers, with animals of different reported geographic origins (Vietnamese, Cambodian, and Cambodian/Indonesian mixed-origin). Many of the samples were of known relation to each other (sire, dam, and progeny sets), making it possible to characterize lineage-level haplotypes in these animals. We identified 52 Mafa-A and 74 Mafa-B haplotypes in this cohort, many of which were restricted to specific sample origins. We also characterized full-length MHC class I transcripts using Pacific Biosciences (PacBio) RS II single-molecule real-time (SMRT) sequencing. This technology allows for complete read-through of unfragmented MHC class I transcripts (~1100 bp in length), so no assembly is required to unambiguously resolve novel full-length sequences. Overall, we identified 311 full-length transcripts in a subset of 72 cynomolgus macaques from these Chinese breeding facilities; 130 of these sequences were novel, and an additional 115 extended existing short database sequences to span the complete open reading frame. This significantly expands the number of Mafa-A, Mafa-B, and Mafa-I full-length alleles in the official cynomolgus macaque MHC class I database. The PacBio technique described here represents a general method for full-length allele discovery and genotyping that can be extended to other complex immune loci such as MHC class II, killer immunoglobulin-like receptors, and Fc gamma receptors.


September 22, 2019

Microsatellites from Fosterella christophii (Bromeliaceae) by de novo transcriptome sequencing on the Pacific Biosciences RS platform.

Microsatellite markers were developed in Fosterella christophii (Bromeliaceae) to investigate the genetic diversity and population structure within the F. micrantha group, comprising F. christophii, F. micrantha, and F. villosula. Full-length cDNAs were isolated from F. christophii and sequenced on a Pacific Biosciences RS platform. A total of 1590 high-quality consensus isoforms were assembled into 971 unigenes containing 421 perfect microsatellites. Thirty primer sets were designed, of which 13 revealed a high level of polymorphism in three populations of F. christophii, with four to nine alleles per locus. Each of these 13 loci cross-amplified in the closely related species F. micrantha and F. villosula, with one to six and one to 11 alleles per locus, respectively. The new markers are promising tools to study the population genetics of F. christophii and to discover species boundaries within the F. micrantha group.
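The abstract reports 421 perfect microsatellites among the unigenes without naming the search tool or its thresholds. A perfect microsatellite is an uninterrupted tandem repeat of a short motif, typically 1 to 6 bp, and the sketch below scans a sequence for such runs. The minimum repeat counts per motif length are assumed values in the style of common SSR-search defaults, not the study's settings, and find_perfect_ssrs and is_primitive are hypothetical helpers.

```python
# Assumed minimum number of motif copies per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit ('ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each uninterrupted tandem repeat found.

    Scans left to right, tries the shortest motif length first, and resumes
    scanning after the end of each reported repeat.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k or not is_primitive(motif):
                continue  # too close to the end, or a compound of a shorter motif
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1  # count consecutive exact copies
            if n >= min_repeats[k]:
                found = (i, motif, n)
                break
        if found:
            hits.append(found)
            i += found[2] * len(found[1])
        else:
            i += 1
    return hits


# Usage on a made-up sequence containing a dinucleotide and a mononucleotide repeat.
print(find_perfect_ssrs("TC" + "AG" * 8 + "TCGA" + "A" * 12))
```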


September 22, 2019

Transcriptome profiling of two ornamental and medicinal papaver herbs.

Papaver rhoeas (corn poppy) and Papaver nudicaule (Iceland poppy) are ornamental and medicinal plants used for the isolation of alkaloid drugs. In this study, we generated 700 Mb of transcriptome sequences with the PacBio platform. These were assembled into 120,926 contigs, and 1185 (82.2%) of the benchmarking universal single-copy orthologs (BUSCO) core genes were completely present in our assembled transcriptome. Furthermore, using 128 Gb of Illumina sequences, transcript expression was assessed at three stages of Papaver plant development (30, 60, and 90 days), from which we identified 137 differentially expressed transcripts. In addition, three co-occurrence heat maps were generated from 51 different plant genomes along with the Papaver transcriptome, covering secondary metabolite biosynthesis, the isoquinoline alkaloid biosynthesis (BIA) pathway, and cytochrome. Sixty-nine transcripts in the BIA pathway, along with 22 different alkaloids (quantified with LC-QTOF-MS/MS), were mapped onto the BIA KEGG map (map00950). Finally, we identified 39 full-length cytochrome transcripts and compared them with other genomes. Collectively, these transcriptome data, along with the expression and quantitative metabolite profiles, provide an initial record of secondary metabolites and their expression in relation to Papaver plant development. Moreover, these profiles could support further functional characterization of secondary metabolite biosynthesis and of problems associated with Papaver plant development.


September 22, 2019

Isoform sequencing provides insight into natural genetic diversity in maize.

W64A, a non-stiff-stalk maize inbred, has been used to develop current corn in plant breeding and serves as one of the most widely used parent lines for commercial hybrid seed production (Huffman, 1984). The inbred is characterized by early flowering, average plant and ear height at maturity, very strong roots, and good stalks (Runge, 2004). In addition, W64A is an invaluable germplasm for studying gene function, especially in corn nutrition and endosperm texture, given its nearly complete vitreousness and hardness (Figure 1a). However, little is known about the genetic and genomic background of W64A. With the advent of PacBio long-read sequencing, large amounts of full-length cDNA up to 20 kb can be obtained simultaneously (An et al., 2018). This article is protected by copyright. All rights reserved.


September 22, 2019

Evaluation of bacterial contamination in raw milk, ultra-high temperature milk and infant formula using single molecule, real-time sequencing technology.

The Pacific Biosciences (Menlo Park, CA) single molecule, real-time (SMRT) sequencing technology has been reported to have some advantages in analyzing the bacterial profiles of environmental samples. In this study, the presence of bacterial contaminants in raw milk, UHT milk, and infant formula was determined by SMRT sequencing of the full-length 16S rRNA gene. The bacterial profiles obtained at different taxonomic levels revealed clear differences in bacterial community structure across the 16 analyzed dairy samples. No indicative pathogenic bacteria were found in any of the tested samples. However, some of the detected bacterial species (e.g., Bacillus cereus, Enterococcus casseliflavus, and Enterococcus gallinarum) might be related to product quality defects and bacterial antibiotic gene transfer. Although only a limited number of dairy samples were analyzed here, our data demonstrate for the first time the feasibility of using the SMRT sequencing platform to detect bacterial contamination. Our paper also provides useful reference information for the future development of new precautionary strategies for controlling dairy safety in large-scale industrialized production lines. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.


September 22, 2019

Improving eukaryotic genome annotation using single molecule mRNA sequencing.

The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. We demonstrate that genome annotation based on PacBio mRNA sequences improves on annotation using conventional sequencing-by-synthesis alone: it identifies 1609 (9.2%) new genes, extends the length of 3965 (26.7%) genes, and increases the total genomic exon length by 1.9 Mb (12.4%). Representation of non-coding sequence (primarily UTRs, owing to dT reverse-transcription priming) was particularly improved, increasing fifteen-fold in total length through increases in both the length and the number of UTR exons. In addition, the UTR data provided by these CCS allowed the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features. Overall, PacBio data has supported a significant improvement in gene annotation in this genome and is an appealing alternative or complement to other transcript sequencing technologies for genome annotation.


September 22, 2019

Transcriptome-wide investigation of circular RNAs in rice.

Various stable circular RNAs (circRNAs) have recently been identified as abundant noncoding RNAs in Archaea, Caenorhabditis elegans, mice, and humans through high-throughput deep sequencing coupled with analysis of massive transcriptional data. CircRNAs play important roles in miRNA function and transcriptional control by acting as competing endogenous RNAs or as positive regulators of their parent coding genes. However, little is known about circRNAs in plants. Here, we report 2354 rice circRNAs that were identified through deep sequencing and computational analysis of ssRNA-seq data. Among them, 1356 are exonic circRNAs. Some circRNAs exhibit tissue-specific expression. Rice circRNAs have a considerable number of isoforms, including alternative backsplicing and alternative splicing circularization patterns. Parental genes with multiple exons are preferentially circularized. Only 484 circRNAs have backsplices derived from known splice sites. In addition, only 92 circRNAs were found to have miniature inverted-repeat transposable elements (MITEs) enriched in their flanking sequences or to contain complementary flanking intronic sequences of at least 18 bp, indicating that rice has other production mechanisms in addition to direct backsplicing. Rice circRNAs show no significant enrichment for miRNA target sites. A transgenic study showed that overexpression of a circRNA construct could reduce the expression level of its parental gene in transgenic plants compared with empty-vector control plants. This suggests that a circRNA and its linear form might act as a negative regulator of its parental gene. Overall, these analyses reveal the prevalence of circRNAs in rice and provide new biological insights into rice circRNAs. © 2015 Lu et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
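The abstract refers to backsplices without describing how they are detected. The core signal is a read whose two aligned pieces appear in head-to-tail order: the read's 5' piece maps downstream (in the transcript direction) of its 3' piece, which cannot happen for a linear transcript. The minimal sketch below checks that condition for one pair of aligned pieces. The Segment record, the assumption that read offsets are given in transcript orientation, and the function name backsplice_span are hypothetical; a practical pipeline would typically also check splice-site signals, mapping quality, and the number of supporting reads.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    """One aligned piece of a split read (hypothetical minimal record)."""
    read_start: int  # offset of the piece within the read, in transcript orientation
    chrom: str
    strand: str      # '+' or '-' (strand of the transcript)
    ref_start: int   # 0-based genomic start, inclusive
    ref_end: int     # 0-based genomic end, exclusive


def backsplice_span(a: Segment, b: Segment) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) of the putative circle if the two pieces are
    joined head-to-tail (a backsplice), or None for a linear or inter-locus join."""
    first, second = sorted((a, b), key=lambda s: s.read_start)  # 5' piece, 3' piece
    if first.chrom != second.chrom or first.strand != second.strand:
        return None  # pieces from different loci: not a backsplice
    if first.strand == "+":
        # On '+', a linear splice puts the 5' piece at lower coordinates.
        if second.ref_start < first.ref_start:
            return (first.chrom, second.ref_start, first.ref_end)
    else:
        # On '-', transcription runs toward lower coordinates, so the test flips.
        if first.ref_start < second.ref_start:
            return (first.chrom, first.ref_start, second.ref_end)
    return None


# Usage: a '+' strand read whose 5' half maps downstream of its 3' half.
head = Segment(read_start=0, chrom="Chr1", strand="+", ref_start=5000, ref_end=5100)
tail = Segment(read_start=100, chrom="Chr1", strand="+", ref_start=1000, ref_end=1050)
print(backsplice_span(head, tail))
```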

