Bioinformatics Archives - Page 8 of 267

June 1, 2021

A complete solution for full-length transcript sequencing using the PacBio Sequel II System

Long-read mRNA sequencing methods such as PacBio’s Iso-Seq method offer high-throughput transcriptome profiling in prokaryotic and eukaryotic cells. By avoiding the transcript assembly problem and instead sequencing full-length cDNA, Iso-Seq has emerged as the most reliable technology for annotating isoforms and, in turn, improving proteome predictions in a wide variety of organisms. Improvements in library preparation, sequencing throughput, and bioinformatics have enabled the Iso-Seq method to be a complete solution for transcript characterization. The Iso-Seq Express kit is a one-day library prep requiring 60-300 ng of total RNA. The PacBio Sequel II system produces 4-5 million full-length reads, sufficient to profile a whole human transcriptome. Finally, the SQANTI2 software is a powerful tool for categorizing complex isoforms against reference annotations, while also incorporating orthogonal information such as CAGE peak data, public RNA-seq junction data, and ORF predictions.

June 1, 2021

Improving long-read assembly of microbial genomes and plasmids

Complete, high-quality microbial genomes are valuable across a broad array of fields, from environmental studies to human microbiome health and food pathogen surveillance. Long-read sequencing enables accurate resolution of complex microbial genomes and is becoming the new standard. Here we report our novel Microbial Assembly pipeline to facilitate rapid, large-scale analysis of microbial genomes. We sequenced a 48-plex library with one SMRT Cell 8M on the Sequel II System, demultiplexed the reads, then analyzed the data with Microbial Assembly.

June 1, 2021

Metagenomic analysis of type II diabetes gut microbiota using PacBio HiFi reads reveals taxonomic and functional differences

In the past decade, the human microbiome has been increasingly shown to play a major role in health. For example, imbalances in gut microbiota appear to be associated with Type II diabetes mellitus (T2DM) and cardiovascular disease. Coronary artery disease (CAD) is a major determinant of the long-term prognosis among T2DM patients, with a 2- to 4-fold increased mortality risk when present. However, the exact microbial strains or functions implicated in disease need further investigation. From a large study with 523 participants (185 healthy controls, 186 T2DM patients without CAD, and 106 T2DM patients with CAD), 3 samples from each patient group were selected for long-read sequencing. Each sample was prepared and sequenced on one Sequel II System SMRT Cell to assess whether long, accurate PacBio HiFi reads could yield insights beyond those obtained with short reads. Each of the 9 samples was subjected to metagenomic assembly and binning, taxonomic classification, and functional profiling. Results from metagenomic assembly and binning show that it is possible to generate a significant number of complete MAGs (Metagenome Assembled Genomes) from each sample, with over half of the high-quality MAGs being represented by a single circular contig. We show that differences found in taxonomic and functional profiles of healthy versus diabetic patients in the small 9-sample study align with the results of the larger study, as well as with results reported in the literature. For example, the abundances of beneficial short-chain fatty acid (SCFA) producers such as Phascolarctobacterium faecium and Faecalibacterium prausnitzii were decreased in T2DM gut microbiota in both studies, while the abundances of quinol and quinone biosynthesis pathways were increased compared to healthy controls. In conclusion, metagenomic analysis of long, accurate HiFi reads revealed important taxonomic and functional differences in T2DM versus healthy gut microbiota. Furthermore, metagenome assembly of long HiFi reads led to the recovery of many complete MAGs and a significant number of complete circular bacterial chromosome sequences.

June 1, 2021

Comprehensive variant detection in a human genome with highly accurate long reads

Introduction: Long-read sequencing has been applied successfully to assemble genomes and detect structural variants. However, due to high raw-read error rates (10-15%), it has remained difficult to call small variants from long reads. Recent improvements in library preparation and sequencing chemistry have increased length, accuracy, and throughput of PacBio circular consensus sequencing (CCS) reads, resulting in 15-20kb reads with average read quality above 99%. Materials and Methods: We sequenced a library from human reference sample HG002 to 18-fold coverage on the PacBio Sequel II with two SMRT Cells 8M. The CCS algorithm was used to generate highly accurate (average 99.9%) 12.9kb reads, which were mapped to the hg19 reference with pbmm2. We detected small variants using Google DeepVariant with a model trained for CCS and phased the variants using WhatsHap. Structural variants were detected with pbsv. Variant calls were evaluated against Genome in a Bottle (GIAB) benchmarks. Results: With these reads, DeepVariant achieves SNP and Indel F1 scores of 99.70% and 96.59% against the GIAB truth set, and pbsv achieves 97.72% recall on structural variants longer than 50bp. Using WhatsHap, small variants were phased into haplotype blocks with 145kb N50. The improved mappability of long reads allows us to align to and detect variants in medically relevant genes such as CYP2D6 and PMS2 that have proven “difficult-to-map” with short reads. Conclusions: These highly accurate long reads combine the mappability and ability to detect structural variants of long reads with the accuracy and ability to detect small variants of short reads.
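The abstract above reports two summary statistics, F1 scores for variant calls and an N50 for phased haplotype blocks. As a rough illustration of what those numbers mean, here is a minimal Python sketch. It is not the code used by DeepVariant, WhatsHap, or the GIAB benchmarking tools; the function names `n50` and `f1_score` and the toy values are assumptions made up for this example.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that items of length >= L
    together cover at least half of the summed length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical phase-block lengths in base pairs, not data from the study.
toy_blocks = [200_000, 150_000, 100_000, 50_000, 25_000]
print(n50(toy_blocks))

# Hypothetical precision and recall values, not figures from the abstract.
print(round(f1_score(0.99, 0.97), 4))
```

With the toy block lengths, the two longest blocks already cover more than half of the 525,000 bp total, so the N50 is the length of the second block. Note that some tools compute N50 against the genome length rather than the summed block length; this sketch uses the summed length.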

June 1, 2021

A workflow for the comprehensive detection and prioritization of variants in human genomes with PacBio HiFi reads

PacBio HiFi reads (minimum 99% accuracy, 15-25 kb read length) have emerged as a powerful data type for comprehensive variant detection in human genomes. The HiFi read length extends confident mapping and variant calling to repetitive regions of the genome that are not accessible with short reads. Read length also improves detection of structural variants (SVs), with recall exceeding that of short reads by over 30%. High read quality allows for accurate single nucleotide variant and small indel detection, with precision and recall matching that of short reads. While many tools have been developed to take advantage of these qualities of HiFi reads, there is no end-to-end workflow for the filtering and prioritization of variants uniquely detected with long reads for rare and undiagnosed disease research. We have developed a flexible, modular workflow and web portal for variant analysis from HiFi reads and applied it to a set of rare disease cases unsolved by short-read whole genome sequencing. We expect that broad application of long-read variant detection workflows will solve many more rare disease cases. We have made these tools available at https://github.com/williamrowell/pbRUGD-workflow, and we hope they serve as a starting point for developing a robust analysis framework for long-read variant detection for rare diseases.

February 5, 2021

Podcast: Going beyond the $1,000 genome with Mark Gerstein

Mark Gerstein is the co-director of the Yale Computational Biology and Bioinformatics program where he focuses on better annotation of the human genome and better ways to mine big genomics…

February 5, 2021

Podcast: Long-read sequencing dramatically improves blood matching – Steven Marsh

One of the popular questions on the Mendelspod program is how those doing sequencing decide between the quality of PacBio’s long reads and the cheaper short read technology, such as…

February 5, 2021

Podcast: Frontiers of sequencing – Putting long reads and graph assemblies to work

The Mike Schatz lab at Cold Spring Harbor is well known for de novo genome assemblies and their work on structural variation in cancer genomes. In this Mendelspod podcast, lab…

February 5, 2021

DNAnexus Webinar: Simplifying de novo assembly with PacBio tools available on DNAnexus: FALCON

Brett Hannigan, Computational Biology Project Leader at DNAnexus, demonstrates a fast, accurate, and cost-efficient solution for diploid-aware de novo genome assembly utilizing FALCON on the DNAnexus platform.

February 5, 2021

DNAnexus Webinar: Accurate calling of structural variation in PacBio data

Andrew Carroll, Director of Science at DNAnexus, presents how to greatly improve the accuracy of SV-calling by using long-read PacBio sequencing and fast and easy-to-run cloud-optimized apps like PBHoney, Parliament,…

February 5, 2021

Webinar: Analysis and visualization tools for long reads, assemblies and complex variation

This presentation describes a new genome browser for read alignments around complex variation: genomeribbon.com. Ribbon was built for viewing genomic read alignments around structural variants. It is very useful for…

February 5, 2021

PAG PacBio Workshop: Using the Iso-Seq method to fill in your a_no__tion

Richard Kuo from the Roslin Institute gave this PAG 2017 talk about using the PacBio Iso-Seq data to generate genome annotations that outperform current gold-standard annotations. Included: findings from a…

February 5, 2021

ASHG PacBio Workshop: Identification and characterization of informative genetic structural variants for neurodegenerative diseases

Michael Lutz, from the Duke University Medical Center, discussed a recently published software tool that can now be used in a pipeline with SMRT Sequencing data to find structural variant…

February 5, 2021

ASHG PacBio Workshop: A future of high-quality genomes, transcriptomes, and epigenomes

Jonas Korlach spoke about recent SMRT Sequencing updates, such as the latest Sequel System chemistry release (1.2.1) and updates to the Integrative Genomics Viewer, which is now optimized for PacBio data…

February 5, 2021

AGBT Conference: Personalized phased diploid genomes of the EN-TEx samples

At AGBT 2017, Mike Schatz from Johns Hopkins University and Cold Spring Harbor Laboratory presented data from sequencing, assembling, and analyzing personalized, phased diploid genomes with either Illumina, 10x Genomics,…
