While advances in RNA sequencing methods have accelerated our understanding of the human transcriptome, isoform discovery remains a challenge because short read lengths require complicated assembly algorithms to infer the contiguity of full-length transcripts. With PacBio’s long reads, one can now sequence full-length transcript isoforms up to 10 kb. The PacBio Iso-Seq protocol produces reads that originate from independent observations of single molecules, meaning no assembly is needed. Here, we sequenced the transcriptome of the human MCF-7 breast cancer cell line using the Clontech SMARTer® cDNA preparation kit and the PacBio RS II. Using PacBio Iso-Seq bioinformatics software, we…
2015 SMRT Informatics Developers Conference Presentation Slides: Gene Myers, Ph.D., Founding Director, Systems Biology Center, Max Planck Institute delivered the keynote presentation. He talked about building efficient assemblers, the importance of random error distribution in sequencing data, and resolving tricky repeats with very long reads. He also encouraged developers to release assembly modules openly, and noted that data should be straightforward to parse since sharing data interfaces is easier than sharing software interfaces.
Although the accuracy of the human reference genome is critical for basic and clinical research, structural variants (SVs) have been difficult to assess because data capable of resolving them have been limited. To address potential bias, we sequenced a diversity panel of nine human genomes to high depth using long-read, single-molecule, real-time sequencing data. Systematically identifying and merging SVs ≥50 bp in length for these nine and one public genome yielded 83,909 sequence-resolved insertions, deletions, and inversions. Among these, 2,839 (2.0 Mbp) are shared among all discovery genomes with an additional 13,349 (6.9 Mbp) present in the majority of humans,…
This systems biology animation depicts the type of connectivity that exists at multiple scales in a living system. Starting at the molecular level, interactions between DNA (red cubes), RNA (blue cubes), proteins (green cubes), and metabolites (yellow cubes) define the core biological processes required for higher order function. Core biological processes are defined by networks of interactions, and these networks in turn can be interacting with each other as well, either within a given cell, between cells in a given tissue, or between organs in a complex organism. By organizing the vast array of molecular phenotypes into networks that define…
This documentary film features the wave of cutting-edge technologies that now provide the opportunity to create predictive models of living systems, and gain wisdom about the fundamental nature of life itself. The potential impact for humanity is immense: from fighting complex diseases such as cancer, enabling proactive surveillance of virulent pathogens, and increasing food crop production.
Part I of The New Biology documentary. This documentary film features the wave of cutting-edge technologies that now provide the opportunity to create predictive models of living systems, and gain wisdom about the fundamental nature of life itself. The potential impact for humanity is immense: from fighting complex diseases such as cancer, enabling proactive surveillance of virulent pathogens, and increasing food crop production.
In this PacBio User Group Meeting presentation, PacBio scientist Kristin Mars speaks about recent updates, such as the single-day library prep that’s now possible with the Iso-Seq Express workflow. She also notes that one SMRT Cell 8M is sufficient for most Iso-Seq experiments for whole transcriptome sequencing at an affordable price.
Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513 Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated…
Chlorella vulgaris is a fast-growing fresh-water microalga cultivated at the industrial scale for applications ranging from food to biofuel production. To advance our understanding of its biology and to establish genetics tools for biotechnological manipulation, we sequenced the nuclear and organelle genomes of Chlorella vulgaris 211/11P by combining next generation sequencing and optical mapping of isolated DNA molecules. This hybrid approach allowed us to assemble the nuclear genome in 14 pseudo-molecules with an N50 of 2.8 Mb and 98.9% of scaffolded genome. The integration of RNA-seq data obtained at two different irradiances of growth (high light-HL versus low light-LL) enabled…
Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions…
Fusarium wilt of banana is caused by the soil-borne fungal pathogen Fusarium oxysporum f. sp. cubense (Foc). We generated two chromosome-level assemblies of Foc race 1 and tropical race 4 strains using single-molecule real-time sequencing. The Foc1 and FocTR4 assemblies had 35 and 29 contigs with contig N50 lengths of 2.08 Mb and 4.28 Mb, respectively. These two new reference genomes represent a greater than 100-fold improvement over the contig N50 statistics of the previous short read-based Foc assemblies. The two high-quality assemblies reported here will be a valuable resource for the comparative analysis of Foc races at the pathogenic…
Pepper is an important vegetable with great economic value and unique biological features. In the past few years, significant development has been made towards understanding the huge complex pepper genome; however, pepper functional genomics has not been well studied. To better understand the pepper gene structure and pepper gene regulation, we conducted full-length mRNA sequencing by PacBio sequencing and obtained 57862 high-quality full-length mRNA sequences derived from 18362 previously annotated and 5769 newly detected genes. New gene models were built that combined the full-length mRNA sequences and corrected approximately 500 fragmented gene models from previous annotations. Based on the full-length…
Supernumerary B chromosomes (Bs) are extra karyotype units in addition to A chromosomes, and are found in some fungi and thousands of animal and plant species. Bs are uniquely characterized due to their non-Mendelian inheritance, and represent one of the best examples of genomic conflict. Over the last decades, their genetic composition, function and evolution have remained an unresolved query, although a few successful attempts have been made to address these phenomena. A classical concept based on cytogenetics and genetics is that Bs are selfish and abundant with DNA repeats and transposons, and in most cases, they do not carry…
Trichoplusia ni-derived cell lines are commonly used to enable recombinant protein expression via baculovirus infection to generate materials approved for clinical use and in clinical trials. In order to develop systems biology and genome engineering tools to improve protein expression in this host, we performed de novo genome assembly of the Trichoplusia ni-derived cell line Tni-FNL. By integration of PacBio single-molecule sequencing, Bionano optical mapping, and 10X Genomics linked-reads data, we have produced a draft genome assembly of Tni-FNL. Our assembly contains 280 scaffolds, with an N50 scaffold size of 2.3 Mb and a total length of 359 Mb. Annotation of the Tni-FNL…
Epstein-Barr virus (EBV) is a ubiquitous human pathogen associated with Burkitt’s lymphoma and nasopharyngeal carcinoma. Although the EBV genome harbors more than a hundred genes, a full transcription map with EBV polyadenylation profiles remains unknown. To elucidate the 3′ ends of all EBV transcripts genome-wide, we performed the first comprehensive analysis of viral polyadenylation sites (pA sites) using our previously reported polyadenylation sequencing (PA-seq) technology. We identified that EBV utilizes a total of 62 pA sites in JSC-1, 60 in Raji, and 53 in Akata cells for the expression of EBV genes from both plus and minus DNA strands; 42 of…