September 22, 2019

Clonal distribution of BCR-ABL1 mutations and splice isoforms by single-molecule long-read RNA sequencing.

The evolution of mutations in the BCR-ABL1 fusion gene transcript renders CML patients resistant to tyrosine kinase inhibitor (TKI)-based therapy. Thus, screening for BCR-ABL1 mutations is recommended, particularly in patients experiencing a poor response to treatment. Herein we describe a novel approach for the detection and surveillance of BCR-ABL1 mutations in CML patients. To detect mutations in the BCR-ABL1 transcript, we developed an assay based on Pacific Biosciences (PacBio) sequencing technology, which allows single-molecule long-read sequencing of BCR-ABL1 fusion transcript molecules. Samples from six patients with poor response to therapy were analyzed both at diagnosis and at follow-up. cDNA was generated from total RNA, and a 1.6 kb fragment encompassing the BCR-ABL1 transcript was amplified using long-range PCR. To estimate the sensitivity of the assay, a serial dilution experiment was performed. Over 10,000 full-length BCR-ABL1 sequences were obtained for all samples studied. Through the serial dilution analysis, mutations in CML patient samples could be detected down to a level of at least 1%. Notably, the assay was sufficiently sensitive even in patients with low BCR-ABL1 transcript levels. PacBio sequencing identified all mutations detected by standard methods. Importantly, we identified several mutations that escaped detection by routine clinical analysis. Resistance mutations were found in all but one of the patients. Because of the long reads afforded by PacBio sequencing, compound mutations present in the same molecule were readily distinguished from independent alterations arising in different molecules. Moreover, several isoforms of the BCR-ABL1 transcript were identified in two of the CML patients.
Finally, our assay offered a quick turnaround time, allowing samples to be reported within 2 days. In summary, the PacBio sequencing assay can be applied to detect BCR-ABL1 resistance mutations in both diagnostic and follow-up CML patient samples using a simple protocol applicable to routine diagnosis. Besides its sensitivity, the method gives a complete view of the clonal distribution of mutations, which is important when making therapy decisions.
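The distinction between compound and independent mutations follows from each long read covering one whole molecule. A minimal sketch, assuming each read has already been reduced to the set of mutations it carries (variant calling and PacBio error filtering are not shown), groups reads by their exact mutation set. The function names, the 1% default cutoff (chosen to mirror the reported detection level), and the example mutation labels and counts are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable


def clonal_distribution(
    reads: Iterable[Iterable[str]], min_fraction: float = 0.01
) -> Dict[FrozenSet[str], float]:
    """Fraction of molecules carrying each exact combination of mutations.

    Every read is one full-length molecule, so mutations that appear in the
    same combination sit on the same molecule (compound mutations).
    Combinations below `min_fraction` (hypothetical cutoff) are dropped.
    """
    counts = Counter(frozenset(read) for read in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        combo: n / total
        for combo, n in counts.items()
        if n / total >= min_fraction
    }


def is_compound(dist: Dict[FrozenSet[str], float], mut_a: str, mut_b: str) -> bool:
    """True if any retained molecule class carries both mutations together."""
    return any(mut_a in combo and mut_b in combo for combo in dist)


# Hypothetical sample: 100 molecules, labels are illustrative only.
reads = [set()] * 90 + [{"T315I"}] * 6 + [{"E255K", "T315I"}] * 3 + [{"Y253H"}]
dist = clonal_distribution(reads)
```

In this made-up sample, E255K and T315I co-occur on 3% of molecules (a compound clone), while Y253H only ever appears alone, so it would be reported as an independent clone.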


September 22, 2019

The first whole transcriptomic exploration of pre-oviposited early chicken embryos using single and bulked embryonic RNA-sequencing.

The chicken is a valuable model organism, especially in evolutionary and embryology research, because its embryonic development occurs in the egg. However, despite its scientific importance, no transcriptome data have been generated for deciphering the early developmental stages of the chicken because of practical and technical constraints in accessing pre-oviposited embryos. Here, we determine the entire transcriptome of pre-oviposited avian embryos, including oocyte, zygote, and intrauterine embryos from Eyal-Giladi and Kochav stage I (EGK.I) to EGK.X, collected for the first time using a noninvasive approach. We also compare RNA-sequencing data obtained using bulked embryo sequencing and single embryo/cell sequencing techniques. The raw sequencing data were preprocessed with two genome builds, Galgal4 and Galgal5, and the expression of 17,108 and 26,102 genes was quantified in the respective builds. There were some differences between the two techniques, as well as between the two genome builds, and these were affected by the emergence of long intergenic noncoding RNA annotations. The first transcriptome datasets of pre-oviposited early chicken embryos based on bulked and single embryo sequencing techniques will serve as a valuable resource for investigating early avian embryogenesis, for comparative studies among vertebrates, and for novel gene annotation in the chicken genome.


September 22, 2019

Circular RNA architecture and differentiation during leaf bud to young leaf development in tea (Camellia sinensis).

Circular RNA (circRNA) discovery, expression patterns, and experimental validation in developing tea leaves indicate a correlation with circRNA parental genes and potential roles in a ceRNA interaction network. Circular RNAs (circRNAs) have recently emerged as a novel class of abundant, stable endogenous RNAs produced by circularization, with regulatory potential. However, identification of circRNAs in plants, especially in non-model plants with large genomes, is challenging. In this study, we systematically identified circRNAs from tissues at different stages of tea plant (Camellia sinensis) leaf development using rRNA-depleted circular RNA-seq. By combining two state-of-the-art detection tools, we characterized 3174 circRNAs, of which 342 were identified by both approaches and thus considered high-confidence circRNAs. Of 24 randomly chosen predicted circRNAs, 20 were experimentally confirmed by PCR and Sanger sequencing. As in other plants, tissue-specific expression was observed for many C. sinensis circRNAs. In addition, we found that circRNA abundances were positively correlated with the mRNA transcript abundances of their parental genes. qRT-PCR validated the differential expression patterns of circRNAs between leaf bud and young leaf, and also indicated that circRNAs are expressed at low abundance compared with the linear mRNAs of their parental genes. We predicted circRNA-microRNA interaction networks, and 54 of the differentially expressed circRNAs were found to have potential tea plant miRNA binding sites. The parental genes of circRNAs were significantly enriched in chloroplast-related GO terms and in KEGG pathways related to photosynthesis and metabolite biosynthesis, suggesting candidate roles for circRNAs in the photosynthetic machinery and metabolite biosynthesis during leaf development.
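The high-confidence set described above is the overlap between two detectors' calls, while the full catalogue is their union. A minimal sketch, assuming each circRNA is keyed by its back-splice junction (chromosome, start, end, strand) and that both tools share one coordinate convention, is shown below. The `BackSplice` type, the optional `tolerance` for slightly shifted junctions, and the example coordinates are hypothetical; the abstract does not say how the two call sets were matched.

```python
from typing import Iterable, NamedTuple, Set


class BackSplice(NamedTuple):
    """A circRNA keyed by its back-splice junction (hypothetical schema)."""
    chrom: str
    start: int
    end: int
    strand: str


def shared_calls(
    calls_a: Iterable[BackSplice],
    calls_b: Iterable[BackSplice],
    tolerance: int = 0,
) -> Set[BackSplice]:
    """Calls from tool A also reported by tool B.

    A match needs the same chromosome and strand, with both junction ends
    within `tolerance` bp. Quadratic scan: fine for a sketch, not for
    genome-scale call sets.
    """
    calls_b = list(calls_b)
    shared = set()
    for a in calls_a:
        for b in calls_b:
            if (
                a.chrom == b.chrom
                and a.strand == b.strand
                and abs(a.start - b.start) <= tolerance
                and abs(a.end - b.end) <= tolerance
            ):
                shared.add(a)
                break
    return shared


# Made-up calls from two unnamed detectors.
tool_a = {
    BackSplice("chr1", 100, 500, "+"),
    BackSplice("chr1", 900, 1500, "-"),
    BackSplice("chr2", 50, 400, "+"),
}
tool_b = {
    BackSplice("chr1", 100, 500, "+"),
    BackSplice("chr2", 52, 399, "+"),
    BackSplice("chr3", 10, 300, "+"),
}
all_distinct = tool_a | tool_b  # exact-coordinate union ("characterized")
```

With exact matching, only the chr1 junction is shared. Allowing a few base pairs of slack also recovers the chr2 junction, which shows how the matching rule alone can change the size of a "high-confidence" set.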


September 22, 2019

Interpreting microbial biosynthesis in the genomic age: Biological and practical considerations.

Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, there remain important biological and practical problems that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome focusing on assembly of the salinilactam (slm) BGC.


September 22, 2019

Microsatellites from Fosterella christophii (Bromeliaceae) by de novo transcriptome sequencing on the Pacific Biosciences RS platform.

Microsatellite markers were developed in Fosterella christophii (Bromeliaceae) to investigate the genetic diversity and population structure within the F. micrantha group, comprising F. christophii, F. micrantha, and F. villosula. Full-length cDNAs were isolated from F. christophii and sequenced on a Pacific Biosciences RS platform. A total of 1590 high-quality consensus isoforms were assembled into 971 unigenes containing 421 perfect microsatellites. Thirty primer sets were designed, of which 13 revealed a high level of polymorphism in three populations of F. christophii, with four to nine alleles per locus. Each of these 13 loci cross-amplified in the closely related species F. micrantha and F. villosula, with one to six and one to 11 alleles per locus, respectively. The new markers are promising tools to study the population genetics of F. christophii and to discover species boundaries within the F. micrantha group.


September 22, 2019

Cow-to-mouse fecal transplantations suggest intestinal microbiome as one cause of mastitis.

Mastitis, which affects nearly all lactating mammals including humans, is generally thought to be caused by local infection of the mammary glands. Antibiotics are commonly prescribed for treatment, but they raise concerns about both treatment efficacy and neonatal safety. Here, using as a model bovine mastitis, the most costly disease in the dairy industry, we showed that intestinal microbiota alone can lead to mastitis. Fecal microbiota transplantation (FMT) from mastitic, but not healthy, cows to germ-free (GF) mice resulted in mastitis symptoms in the mammary gland and inflammation in the serum, spleen, and colon. Probiotic intake in parallel with FMT from diseased cows relieved mastitis symptoms in mice by shifting the murine intestinal microbiota to a state that is functionally distinct from either healthy or diseased microbiota yet structurally similar to the latter. Despite conserved mastitis symptoms, diseased cows and mice shared few mastitis-associated bacterial organismal or functional markers, suggesting striking divergence in mastitis-associated intestinal microbiota among lactating mammals. Moreover, an “amplification effect” of the disease-health distinction in both microbiota structure and function was apparent during cow-to-mouse FMT. Hence, dysbiosis of intestinal microbiota may be one cause of mastitis, and probiotics that restore intestinal microbiota function are an effective and safe strategy to treat mastitis.


September 22, 2019

Long-read sequencing of chicken transcripts and identification of new transcript isoforms.

The chicken has long served as an important model organism in many fields, and continues to aid our understanding of animal development. Functional genomics studies aimed at probing the mechanisms that regulate development require high-quality genomes and transcript annotations. The quality of these resources has improved dramatically over the last several years, but many isoforms and genes have yet to be identified. We hope to contribute to improving these resources with the data presented here: a set of long cDNA sequencing reads, and a curated set of new genes and transcript isoforms not represented in the most up-to-date genome annotation available to the researchers who rely on the chicken genome.


September 22, 2019

Caught in the middle with multiple displacement amplification: the myth of pooling for avoiding multiple displacement amplification bias in a metagenome.

Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. Underlying these investigations is the assumption that metagenome sequence data accurately estimate the census of microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used when it is difficult to obtain enough DNA for sequencing; however, MDA can introduce amplification biases that may affect subsequent estimates of population census from metagenome data. Some have posited that pooling replicate MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been empirically tested. Using mock viral communities, we examined the influence of pooling on population-scale analyses. In both pooled and single-reaction MDA treatments, sequence coverage of viral populations was highly variable, and coverage patterns across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did not alleviate them. In contrast, control unamplified sequence libraries showed relatively even coverage across phage genomes. MDA should be avoided in metagenomic investigations that require quantitative estimates of microbial taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques should be used for quantitative microbial ecology studies based on metagenomic sequencing.


September 22, 2019

PCR and omics based techniques to study the diversity, ecology and biology of anaerobic fungi: Insights, challenges and opportunities.

Anaerobic fungi (phylum Neocallimastigomycota) are common inhabitants of the digestive tract of mammalian herbivores and, in the rumen, can account for up to 20% of the microbial biomass. Anaerobic fungi play a primary role in the degradation of lignocellulosic plant material. They also have a syntrophic interaction with methanogenic archaea, which increases their fiber degradation activity. To date, nine anaerobic fungal genera have been described, and further novel taxonomic groupings are known to exist based on culture-independent molecular surveys. However, the true extent of their diversity may be even more underestimated than currently thought, as anaerobic fungi continue to be discovered in as yet unexplored gut and non-gut environments. Additionally, many studies are now known to have used primers that provide incomplete coverage of the Neocallimastigomycota. For ecological studies, the internal transcribed spacer 1 region (ITS1) has been the taxonomic marker of choice, but because of various limitations the large subunit rRNA (LSU) is now increasingly used. How the continued expansion of our knowledge of anaerobic fungal diversity will affect our understanding of their biology and ecological role remains unclear, particularly as it is becoming apparent that anaerobic fungi display niche differentiation. Consequently, there is a need to move beyond the broad generalization of anaerobic fungi as fiber degraders and to explore the fundamental differences that underpin their ability to exist in distinct ecological niches. Application of genomics, transcriptomics, proteomics and metabolomics to their study in pure/mixed cultures and environmental samples will be invaluable in this process. To date, the genomes and transcriptomes of several characterized anaerobic fungal isolates have been successfully generated. In contrast, the application of proteomics and metabolomics to anaerobic fungal analysis is still in its infancy.
A central problem for all analyses, however, is the limited functional annotation of anaerobic fungal sequence data. There is therefore an urgent need to expand information held within publicly available reference databases. Once this challenge is overcome, along with improved sample collection and extraction, the application of these techniques will be key in furthering our understanding of the ecological role and impact of anaerobic fungi in the wide range of environments they inhabit.


September 22, 2019

Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells.

Full-length RNA sequencing (RNA-Seq) has been applied to bulk tissue, cell lines and sorted cells to characterize transcriptomes, but applying this technology to single cells has proven difficult, with fewer than ten single-cell transcriptomes analyzed thus far. Although single splicing events have been described for ≤200 single cells with statistical confidence, full-length mRNA analyses for hundreds of cells have not been reported. Single-cell short-read 3′ sequencing enables the identification of cellular subtypes, but full-length mRNA isoforms for these cell types cannot be profiled. We developed a method that starts with bulk tissue and identifies single-cell types and their full-length RNA isoforms without fluorescence-activated cell sorting. Using single-cell isoform RNA-Seq (ScISOr-Seq), we identified RNA isoforms in neurons, astrocytes, microglia, and cell subtypes such as Purkinje and granule cells, as well as cell-type-specific combination patterns of distant splice sites. We used ScISOr-Seq to improve genome annotation in mouse Gencode version 10 by determining the cell-type-specific expression of 18,173 known and 16,872 novel isoforms.


September 22, 2019

Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing.

Neurexins are evolutionarily conserved presynaptic cell-adhesion molecules that are essential for normal synapse formation and synaptic transmission. Indirect evidence has indicated that extensive alternative splicing of neurexin mRNAs may produce hundreds if not thousands of neurexin isoforms, but no direct evidence for such diversity has been available. Here we use unbiased long-read sequencing of full-length neurexin (Nrxn)1α, Nrxn1β, Nrxn2β, Nrxn3α, and Nrxn3β mRNAs to systematically assess how many sites of alternative splicing are used in neurexins with a significant frequency, and whether alternative splicing events at these sites are independent of each other. In sequencing more than 25,000 full-length mRNAs, we identified a novel, abundantly used alternatively spliced exon of Nrxn1α and Nrxn3α (referred to as alternatively spliced sequence 6) that encodes a 9-residue insertion in the flexible hinge region between the fifth LNS (laminin-α, neurexin, sex hormone-binding globulin) domain and the third EGF-like sequence. In addition, we observed several larger-scale alternative splicing events that deleted multiple domains and were much less frequent than the six canonical sites of alternative splicing in neurexins. All six canonical alternative splicing events appear to be independent of each other, suggesting that neurexins may exhibit even greater isoform diversity than previously envisioned and comprise thousands of variants. Our data are consistent with the notion that α-neurexins represent extracellular protein-interaction scaffolds in which different LNS and EGF domains mediate distinct interactions that affect diverse functions and are regulated by independent alternative splicing events.
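Testing whether two splice sites are used independently requires knowing both choices on the same molecule, which full-length reads provide directly. One standard way to test this, sketched here under the assumption that reads have been tallied into a 2×2 table (exon included or skipped at site A versus site B), is Pearson's chi-square test of independence. The abstract does not say which test the authors applied; the function name and counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_independence_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test of independence, no continuity correction.

    table[i][j] = reads with state i at site A and state j at site B,
    where index 0 = exon included and 1 = exon skipped.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("both sites must show both splice choices")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up read counts for two splice sites.
independent_sites = [[20, 30], [40, 60]]  # inclusion at A does not predict B
coupled_sites = [[50, 0], [0, 50]]        # choices at A and B always match
```

The first table yields a statistic of 0 (p = 1), consistent with independent splicing. The second yields 100 with a vanishingly small p-value, the signature of coupled splice choices. With many canonical sites, one such test per pair of sites would need a multiple-testing correction.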


September 22, 2019

Enigmatic Diphyllatea eukaryotes: culturing and targeted PacBio RS amplicon sequencing reveals a higher order taxonomic diversity and global distribution.

The class Diphyllatea belongs to a group of enigmatic unicellular eukaryotes that play a key role in reconstructing the morphological innovation and diversification of early eukaryotic evolution. Despite its evolutionary significance, very little is known about the phylogeny and species diversity of Diphyllatea. Only three species have been described morphologically, and they are divided taxonomically by flagellum number (two or four) and cell size. Currently, only one 18S rRNA Diphyllatea sequence is available, with environmental sequencing surveys reporting only a single partial sequence from a Diphyllatea-like organism. Accordingly, molecular data on the geographical distribution of Diphyllatea are limited, despite morphological data suggesting that the class has a global distribution. We here present a first attempt to understand the species distribution, diversity, and higher order structure of Diphyllatea. We cultured 11 new strains, characterised them morphologically, and amplified their rRNA for a combined 18S-28S rRNA phylogeny. We sampled environmental DNA from multiple sites and designed new Diphyllatea-specific PCR primers for long-read PacBio RSII technology. Near full-length 18S rRNA sequences from environmental DNA, together with supplementary Diphyllatea sequence data mined from public databases, resolved the phylogeny into three deeply branching and distinct clades (Diphy I–III). Of these, the Diphy III clade is entirely novel and, like Diphy II, is composed of species morphologically consistent with the previously described Collodictyon triciliatum. The phylogenetic split between the Diphy I and Diphy II + III clades corresponds to a morphological division of Diphyllatea into bi- and quadriflagellate cell forms. This change in flagellar number must have occurred early in the diversification of Diphyllatea and may represent one of the earliest known morphological transitions among eukaryotes.
Further, the substantial increase in molecular data presented here confirms that Diphyllatea has a global distribution, seemingly restricted to freshwater habitats. Altogether, the results reveal the advantage of combining a group-specific PCR approach with long-read high-throughput amplicon sequencing to survey enigmatic eukaryote lineages. Lastly, our study shows the capacity of PacBio RS to increase phylogenetic resolution when targeting a protist class.


September 22, 2019

Shannon: an information-optimal de novo RNA-Seq assembler

De novo assembly of short RNA-Seq reads into transcripts is challenging due to sequence similarities in transcriptomes arising from gene duplications and alternative splicing of transcripts. We present Shannon, an RNA-Seq assembler with an optimality guarantee derived from principles of information theory: Shannon reconstructs nearly all information-theoretically reconstructable transcripts. Shannon is based on a theory we develop for de novo RNA-Seq assembly that reveals differing abundances among transcripts to be the key, rather than the barrier, to effective assembly. The assembly problem is formulated as a sparsest-flow problem on a transcript graph, and the heart of Shannon is a novel iterative flow-decomposition algorithm. This algorithm provably solves the information-theoretically reconstructable instances in linear time, even though the general sparsest-flow problem is NP-hard. Shannon also incorporates several further algorithmic advances: a new error-correction algorithm based on successive cancelation, a multi-bridging algorithm that carefully utilizes read information in the k-mer de Bruijn graph, and an approximate graph partitioning algorithm to split the transcriptome de Bruijn graph into smaller components. In tests on large RNA-Seq datasets, Shannon obtains significant increases in sensitivity along with improvements in specificity in comparison to state-of-the-art assemblers.
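To make "differing abundances are the key" concrete, the sketch below decomposes the edge flows of a small transcript graph into source-to-sink paths with their abundances. It is not Shannon's algorithm: it is a much simpler greedy baseline that repeatedly removes the widest (maximum-bottleneck) path. It assumes the graph is acyclic, edges carry read-derived abundance estimates, and nodes named `s` and `t` mark transcript start and end; the graph itself is hypothetical.

```python
import math
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

Flows = Dict[str, Dict[str, int]]


def greedy_flow_decomposition(
    flows: Flows, source: str = "s", sink: str = "t"
) -> List[Tuple[List[str], int]]:
    """Greedy widest-path decomposition of a flow on a DAG (not Shannon's method)."""
    residual = {u: dict(vs) for u, vs in flows.items()}  # leave input untouched
    preds: Dict[str, set] = {}
    for u, vs in residual.items():
        preds.setdefault(u, set())
        for v in vs:
            preds.setdefault(v, set()).add(u)
    order = list(TopologicalSorter(preds).static_order())  # raises on cycles

    paths = []
    while True:
        # Dynamic programming over the topological order: widest path to each node.
        width = {node: 0 for node in order}
        width[source] = math.inf
        parent: Dict[str, str] = {}
        for u in order:
            if width[u] == 0:
                continue
            for v, f in residual.get(u, {}).items():
                w = min(width[u], f)
                if w > width[v]:
                    width[v], parent[v] = w, u
        if width[sink] == 0:
            break  # no flow left from source to sink
        bottleneck = width[sink]
        path = [sink]
        while path[-1] != source:
            path.append(parent[path[-1]])
        path.reverse()
        for u, v in zip(path, path[1:]):
            residual[u][v] -= bottleneck  # zeroes at least one edge per round
        paths.append((path, bottleneck))
    return paths


# Hypothetical graph: transcripts A-B-D-E (abundance 10) and A-C-D-F (abundance 3).
flows = {
    "s": {"A": 13}, "A": {"B": 10, "C": 3}, "B": {"D": 10}, "C": {"D": 3},
    "D": {"E": 10, "F": 3}, "E": {"t": 10}, "F": {"t": 3},
}
```

Because the two transcripts differ in abundance, the greedy pass recovers A-B-D-E at 10 and then A-C-D-F at 3. If both had abundance 5, the chimeric path A-B-D-F would explain the flows equally well, which is the ambiguity that distinct abundances resolve.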


September 22, 2019

Improving eukaryotic genome annotation using single molecule mRNA sequencing.

The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. We demonstrate that PacBio mRNA sequences improve genome annotation compared with annotation using conventional sequencing-by-synthesis alone: they identified 1609 (9.2%) new genes, extended the length of 3965 (26.7%) genes, and increased the total genomic exon length by 1.9 Mb (12.4%). Non-coding sequence representation (primarily from UTRs, owing to dT reverse transcription priming) was particularly improved, increasing fifteen-fold in total length through increases in both the length and number of UTR exons. In addition, the UTR data provided by these CCS allowed the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features. Overall, PacBio data supported a significant improvement in gene annotation in this genome, and PacBio sequencing is an appealing alternative or complement to other transcript sequencing technologies for genome annotation.


September 22, 2019

Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study.

High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intraplatform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection across all platforms. For intact RNA, gene expression profiles from rRNA depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
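The concordance figures above are Spearman rank correlations, which compare the ordering of genes by expression rather than the raw values, so they tolerate platform-specific scaling. A minimal sketch of the standard computation (Pearson correlation on ranks, with tied values given their average rank) follows. The expression values in the usage note are made up, and the study's exact processing pipeline is not described here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For example, two hypothetical platforms reporting `[1, 2, 3, 4, 5]` and `[1, 4, 9, 16, 100]` for the same five genes disagree on every value yet rank the genes identically, so their Spearman correlation is 1.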

