This blog features voices from PacBio — and our partners and colleagues — discussing the latest research, publications, and updates about SMRT Sequencing. Check back regularly or sign up to have our blog posts delivered directly to your inbox.
As technology developers, one of our greatest joys is seeing how customers take our sequencing tools and deploy them for innovative and compelling new projects. Metagenomics has been one of those areas: our customers have recently been demonstrating the significant performance improvements enabled by our HiFi metagenome sequencing data and analysis pipelines.
But since much of that work is protected by HIPAA regulations or has not yet been published, we are now releasing a metagenomic data set to help scientists see how HiFi data can make a difference for these types of studies. This information is now available for review and analysis and can be used with existing tools or to help develop new ones.
The data set was generated from four fully consented, pooled human fecal microbiome samples made available through The BioCollective. Two samples came from vegan donors and two from omnivore donors, allowing us to see how diet influences gut microbiota. The pooling process, which creates a reference material by combining samples from multiple donors (in this case four adults), leads to a more complex sample and a richer data set than can be obtained through mock community approaches. It also gives a more consistent composition than samples from an individual donation.
Long-Read Sequencing Produces Rich Profiling Information
HiFi sequencing gave us nearly 2 million reads per sample, with mean read length close to 10 kb for each. Median quality for the sequencing data was Q39 for two samples and Q40 for two samples. We found that species composition was consistent within diets and different between diets. Of the 76 bacterial species detected, 14 were exclusive to the omnivore samples and 21 were only found in the vegan samples.
There are a lot of exciting things to unpack in this data set. First, it demonstrates that our data analysis pipelines produce rich functional profiling information. Unlike analyses of short-read data, about 90% of HiFi reads have at least one functional annotation, with reads typically having two to five annotations. For each sample run on a single SMRT Cell 8M, we generated more than 8 million total annotations.
In addition, the data set highlights the advantage of high accuracy when assembling long-read data from metagenomes. These samples often contain closely related strains. A common cutoff for defining a distinct species is just 3%; if the difference between strains is less than the error rate, then the error correction process can erase the real differences needed to resolve and distinguish those strains.
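To put numbers on that comparison, a Phred quality score Q corresponds to an error probability of 10^(-Q/10). The sketch below converts the median qualities reported above (Q39 and Q40) into error rates and sets them against the 3% cutoff. It treats median read quality as a rough stand-in for per-base error, which is a simplifying assumption, and the function and constant names are illustrative rather than part of any PacBio pipeline.

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to an error probability: 10^(-Q/10)."""
    return 10 ** (-q / 10)


# The 3% divergence cutoff quoted in the text above.
SPECIES_CUTOFF = 0.03


def summarize(q: float) -> str:
    """Describe how far the error rate at quality q sits below the cutoff."""
    err = phred_to_error(q)
    accuracy_pct = (1 - err) * 100
    margin = SPECIES_CUTOFF / err  # how many times larger the cutoff is
    return (
        f"Q{q}: error {err:.2e}, accuracy {accuracy_pct:.3f}%, "
        f"cutoff is {margin:.0f}x the error rate"
    )


if __name__ == "__main__":
    for q in (39, 40):  # median qualities reported for the four samples
        print(summarize(q))
```

At these quality levels the error rate sits orders of magnitude below the divergence between closely related strains, so real differences are far less likely to be mistaken for noise.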
This heightened ability to resolve strains is what drives the large number of high-quality metagenome-assembled genomes (MAGs) that can be recovered from a relatively small amount of HiFi data. For each sample, our assembly evaluation pipeline identified at least 56 — and as many as 69 — MAGs. The unique combination of high accuracy and long reads means that high-quality MAGs can be generated with less than 20-fold coverage, and many of those MAGs are represented in a single contig.
Listen to Daniel Portik talk about this new dataset in the first episode of our Metagenomics Webinar Series on demand here. We hope you get the chance to download the data and experience it for yourself.
Want to talk to us about this data set or have project ideas where you think HiFi data can make a difference? Hit us up on Twitter or reach out directly to our metagenomic specialists Meredith Ashby or Daniel Portik.
If you are interested in additional Metagenomics Webinars, register for upcoming episodes to learn about:
● How to resolve viral evolution and quasispecies diversity,
● Identifying key players in host-microbiome interactions with high resolution 16S sequencing, and
● Revealing mechanisms of bacterial virulence and adaptation
Here at PacBio, we have had the privilege of awarding many SMRT Grants to intrepid scientists who believe that HiFi sequencing data can help them achieve their goals. Recently, we invited people to apply for our Clinical Research SMRT Grant for projects with a link to potential clinical utility. We believe these projects could benefit tremendously from the value of HiFi reads, which offer both high accuracy and long reads to reveal genomic insights often missed by short-read sequencing.
Narrowing these applications down to just one winner is always challenging, but this time we found it to be impossible. So, for the first time ever, we are making two awards instead of one. We are thrilled to announce the winners: one scientist at the start of her career and one well established in hers. We couldn’t be prouder to support the work of these two outstanding women and the questions they seek to answer.
Please join us in congratulating Danielle Brandes and Jenny Taylor on becoming our latest SMRT Grant winners! Here’s a look at what they plan to do with their awards.
Danielle Brandes, PhD Student
Institution: Pediatric Oncology, Medical Faculty, Heinrich Heine University Düsseldorf
Project Goal: Discover structural variants related to pediatric acute lymphoblastic leukemia that have been missed by other technologies.
Danielle’s proposal piqued our interest for many reasons. Acute lymphoblastic leukemia (ALL) is the most common childhood cancer. Scientists understand that cancer predisposition genes (CPGs) are part of the puzzle when it comes to genetic predisposition in leukemic patients. However, CPGs are just a piece of the story. A large portion of the genome is affected by structural variants. Unfortunately, when it comes to leukemia, little is known about the part structural variations play.
“Despite technical and analytical progress in the field of NGS, the landscape of structural variations remains largely unresolved. In this context, we are excited to see how PacBio HiFi long-read sequencing will complement our whole-genome optical mapping data set to elucidate potentially pathogenic SVs in our studies of acute lymphoblastic leukemia. This approach will give new insights on mechanisms of leukemic predisposition as well as to the spectrum of somatic structural variation in leukemia.”
— Danielle Brandes
Danielle’s team has performed whole-genome optical mapping (WGOM) to identify SVs in studies of pediatric patients with high hyperdiploid or ETV6-RUNX1-translocated ALL. But there is more to be done. Through HiFi sequencing, Danielle hopes to detect additional SVs that might have been missed by, or could complement, the WGOM data she has been gathering.
With the help of the SMRT Grant, we are excited to see how Danielle will be able to use HiFi sequencing to generate an individual comprehensive germline/leukemia genome in the pursuit of pathogenic SVs in CPGs and somatically acquired events in ALL studies.
Jenny Taylor, Associate Professor
Institution: University of Oxford
Project Goal: Use HiFi sequencing to resolve structural variants and phase variants for a few participants in the UK’s 100,000 Genomes Project as a demonstration of how this approach could potentially help address unsolved disease cases.
Our second winner, Jenny Taylor, is a seasoned scientist with years of experience in her field. Now, she is hoping to use PacBio’s technology to add value to specific samples collected as part of the 100,000 Genomes Project to further understand rare diseases.
“I am delighted to be awarded this grant from PacBio that may help our lab support Genomics England to increase understanding of undiagnosed diseases for some of those who have been referred to the 100,000 Genomes Project.”
— Jenny Taylor
The UK’s 100,000 Genomes Project completed whole genome sequencing for 73,880 genomes from rare disease patients. Jenny is hopeful that further research can be done to investigate the pathogenesis of some of the unsolved rare disease research cases to which they have access. With HiFi sequencing, she hopes to undertake comprehensive variant detection in three genomes to provide proof-of-principle for the PacBio platform.
Congratulations to these two outstanding scientists! We couldn’t be more excited to see what comes of these projects and are honored to sponsor each of these scientists in their pursuit of discovery.
And thank you to our co-sponsor, Icahn Institute for Data Science and Genomic Technology, for teaming up with PacBio to make this SMRT Grant possible. Explore upcoming SMRT Grant Programs to apply to have your project funded.
A new paper from scientists at the Max Planck Institute offers a great look at how HiFi sequencing delivers significantly improved results for metagenome studies compared to short-read data. In this project, HiFi reads led to higher-quality assemblies with less coverage and gave more insight into these complex microbial communities.
In the PeerJ publication, lead author Taylor Priest (@taylorpriest2), senior author Rudolf Amann, and collaborators report the analysis of 11 seawater samples collected from the Fram Strait, which connects the Arctic and Atlantic oceans and offers a unique view of how climate change is affecting marine ecosystems.
Long-Read Sequencing in Marine Ecosystems Impacted By Climate Change
They performed metagenome sequencing of all samples with short-read technology and analyzed three of them with HiFi reads using the ultra-low library protocol on the Sequel II System. For the PacBio sequencing, all three samples were pooled and sequenced together on a single SMRT Cell, leading to 4-6 Gb of HiFi data per sample.
The PacBio data set yielded 128 metagenome assembled genomes (MAGs) from about 15 Gb of total data collected. In contrast, the short-read data set was about 10 times that size, but produced just 218 MAGs, or fewer than twice as many.
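A quick back-of-the-envelope calculation shows what that comparison means per unit of sequencing. The sketch below uses only the figures quoted above; the short-read data volume is not stated exactly, so it is assumed here to be precisely 10 times the PacBio volume, and the variable names are illustrative.

```python
def mags_per_gb(mags: int, gigabases: float) -> float:
    """MAGs recovered per gigabase of sequence data."""
    return mags / gigabases


# Figures quoted in the text above.
PACBIO_MAGS, PACBIO_GB = 128, 15.0
ILLUMINA_MAGS = 218
# Assumption: "about 10 times that size" is taken as exactly 10x.
ILLUMINA_GB = PACBIO_GB * 10

pacbio_yield = mags_per_gb(PACBIO_MAGS, PACBIO_GB)
illumina_yield = mags_per_gb(ILLUMINA_MAGS, ILLUMINA_GB)
yield_ratio = pacbio_yield / illumina_yield  # per-Gb advantage of HiFi data
mag_ratio = ILLUMINA_MAGS / PACBIO_MAGS      # total MAGs, short-read vs HiFi

print(f"HiFi:       {pacbio_yield:.2f} MAGs per Gb")
print(f"Short-read: {illumina_yield:.2f} MAGs per Gb")
print(f"Per-Gb yield ratio (HiFi / short-read): {yield_ratio:.1f}")
print(f"Total MAG ratio (short-read / HiFi):    {mag_ratio:.2f}")
```

Under that assumption, the short-read data set needed roughly ten times the sequence to recover fewer than twice as many MAGs.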
“Of the species-representative MAGs recovered, those generated from the PacBio metagenomes had, on average, larger genome sizes, higher N50 values, and were less fragmented compared to those retrieved from Illumina metagenomes,” Priest et al. report.
The quality of the assemblies also allowed the researchers to simplify their metagenome assembly pipeline. Importantly, the authors note, “taxonomic reassembly was not performed for the PacBio dataset due to the high quality of generated MAGs from single metagenome assemblies.”
HiFi assemblies also showed strength for community composition analysis. In metagenomics, researchers often pull out the 16S rRNA gene sequences to identify the microbial members of a community. Unfortunately, short-read data is not well suited to this task.
“In this study, 84% of MAGs retrieved from the PacBio metagenomes contained at least one complete 16S rRNA gene sequence, highlighting another key advantage of using long Hifi reads.”
This advantage also extended to unassembled data. “A major restriction with short-read metagenomic sequencing is the limited capacity to accurately reassemble full length 16S rRNA genes,” the team notes. “With the advent of highly accurate long read sequences generated from PacBio sequel II (>99% accuracy), full length 16S rRNA genes can be retrieved from single reads without a need for assembly, thus circumventing previous limitations.”
HiFi Reads for Marine Metagenomes Reveal Phylogenetic Diversity
An analysis of these results gave the scientists an interesting look at microbial populations in an area where warming sea water is already having an impact. The recovered diversity encompassed 9 phyla, 11 classes, 27 orders, ∼51 families and ∼54 genera. The most species-rich taxa were the Flavobacteriales (41 species), Pseudomonadales (18 species) and Rhodobacterales (17 species).
This paper describes the team’s first use of HiFi sequencing for marine metagenomes, but it likely won’t be the last.
“We can conclude that HiFi read metagenomes derived from the PacBio Sequel II platform can greatly improve the number and quality of MAGs recovered, which will allow for further advancement in our understanding of the ecology of marine microbial communities,” they report.
Want to learn more about how HiFi Reads allow researchers to see metagenomes in high resolution? Visit our Microbial Applications page.
Have you heard about our Metagenomics Webinar Series?
Register now for the first episode in our series, highlighting what richer data and better assemblies reveal about metagenome structure and function. Or, stay tuned for the additional webinars in the series showcasing:
● How to resolve viral evolution and quasispecies diversity,
● Identifying key players in host-microbiome interactions with high resolution 16S sequencing, and
● Revealing mechanisms of bacterial virulence and adaptation
Hibernating bears have heart rates of 10-15 beats per minute, yet they do not develop congestive heart failure. Despite accumulating enormous amounts of fat and acquiring insulin resistance, they do not suffer metabolic diseases. And they maintain muscle strength in the near absence of weight-bearing activity.
If we could crack these feats of physiology, perhaps we could apply the knowledge towards therapeutic targets for the prevention and treatment of numerous human diseases.
The Project that Shed Light on the Metabolic Mystery of Brown Bears
Washington State University researchers have come several steps closer to characterizing the hibernation phenotype by analyzing differential gene expression and tissue-specific isoform changes between active, hyperphagic, and mid-hibernation physiological states in the brown bear (Ursus arctos).
Led by the lab of Joanna L. Kelley (@joannalkelley), the team first identified more than 10,000 genes differentially regulated in adipose (fat), liver and muscle tissues between active and hibernating states.
The project, which was supported by a PacBio SMRT Grant, involved sequencing and analysis of tissues from three bears using the Iso-Seq method for full length RNA transcripts. The Sequencing & Genotyping Center at the University of Delaware sponsored the grant and processed the RNA with the help of the center’s director, Bruce Kingham (@bkingham).
“Single Molecule, Real-Time Sequencing Iso-Seq is ideal for identifying the full-length isoforms that are differentially expressed between seasons,” the authors wrote.
By combining the Iso-Seq data across samples and replicates, they obtained a total of 6.1 million full-length HiFi reads. After running the long reads through analysis, mapping to the reference genome, and filtering for library artifacts, they obtained 76,071 unique, full-length isoforms ranging from 150 bp to 16.5 kilobases (kb).
They merged these isoforms with the existing reference transcriptome (which contained 30,263 genes encompassing 58,335 transcripts) and found a total of 31,829 genes encompassing 107,649 transcripts, thus greatly increasing the number of known transcripts.
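The size of that improvement is easy to work out from the numbers above. The following sketch computes how many genes and transcripts the merge added and how the average number of transcripts per gene changed. The `Transcriptome` class is a hypothetical helper for this illustration, not part of the authors' analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcriptome:
    """Gene and transcript counts for an annotation (illustrative helper)."""
    genes: int
    transcripts: int

    @property
    def transcripts_per_gene(self) -> float:
        return self.transcripts / self.genes


# Counts quoted in the text above.
reference = Transcriptome(genes=30_263, transcripts=58_335)
merged = Transcriptome(genes=31_829, transcripts=107_649)

added_genes = merged.genes - reference.genes
added_transcripts = merged.transcripts - reference.transcripts
transcript_growth_pct = 100 * added_transcripts / reference.transcripts

print(f"Genes added:       {added_genes:,}")
print(f"Transcripts added: {added_transcripts:,} (+{transcript_growth_pct:.1f}%)")
print(f"Transcripts per gene: {reference.transcripts_per_gene:.2f} -> "
      f"{merged.transcripts_per_gene:.2f}")
```

Most of the gain comes from new isoforms of already annotated genes rather than from new genes, which is what makes the merged annotation useful for detecting differential isoform usage.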
“Importantly, this merging of the reference transcriptome with the full-length transcriptome originating from samples of interest improves the reference and could lead to the discovery of differential isoform usage (DIU) that would otherwise be missed,” the authors note.
Tying Hibernation Biology to Human Health
Analysis of the data showed that metabolically active tissues vary dramatically in their isoform usage and underscored the complexity and importance of adipose as a dynamic tissue during hibernation. It demonstrated that both transcription and RNA processing play concerted roles.
“While differentially expressed genes have shed light on hibernation biology, determining genes where functionally distinct isoforms change between seasons is the next essential biology to uncover,” the authors wrote.
“Our study provides an unprecedented view into hibernation biology through the lens of RNA processing by producing a dataset that improved the annotation of the brown bear genome and reinforced the important role adipose plays in hibernation.”
Researchers have suspected that aspects of hibernation physiology might be applicable to solving certain types of human disease. Until now, however, little has been known about the role of differential isoform expression between a bear’s hibernation and active states.
SMRT Sequencing has allowed researchers at WSU to generate the most comprehensive analysis of isoform usage. This opens doors for further research into how seasonal insulin resistance and sensitivity, obesity, and urine production in bears during hibernation and activity can help inform therapeutic targets for human disease.
See what happened when PacBio scientist (and now bear enthusiast) Michelle Vierra (@the_mvierra) joined the WSU team as they collected samples for the project from bears at the WSU Bear Center. She wrote an account of the experience on Medium, and filmed her meeting with Willow the bear in the video below.
For more information about the SMRT Grant and how you can apply – go here.
If you’re interested in how to identify novel gene isoforms or have questions about Iso-Seq analysis, visit our RNA Sequencing page.
Been itching to talk about your latest single-cell experiments, your favorite differentially expressed isoforms, or your latest and greatest software for visualizing alternative splicing, but thwarted by a worldwide pandemic preventing in-person scientific events?
We were too, so we organized a virtual social club to easily enable scientists to geek out together. And we weren’t disappointed by our first event, which attracted dozens of self-proclaimed Iso-Seq analysis geeks and other curious researchers to share their work (published, unpublished and in progress) and discuss the benefits and challenges of incorporating long-read transcript sequencing into their research.
Welcome to the Iso-Seq Analysis Universe
PacBio’s own Iso-Seq analysis expert, Elizabeth Tseng (@magdoll), kicked off the Iso-Seq Social Club with an introduction to the method, which uses PacBio’s HiFi reads to characterize full-length transcript isoforms. The Iso-Seq method has been used to identify aberrant splicing in genetic diseases and to characterize alternative promoter usage in cancer, and it is making its way into the single-cell space for studying subregions in postnatal mouse brains and even ant brains!
But none of these studies is possible without proper tools, and as attendees learned, the development of bioinformatics tools made specifically for long-read transcriptome data is a bustling field.
Francisco Pardo-Palacios (@FJPardoPalacios) and Ángeles Arzalluz Luque (@aarzalluz_), both from the Ana Conesa lab at Universitat Politècnica de València, presented the trilogy of SQANTI, IsoAnnot, and tappAS, which takes the output from the PacBio Iso-Seq analysis through classification, functional annotation, and differential analysis. Many of these tools are now becoming the standard workflow for Iso-Seq studies.
Fairlie Reese (@FairlieReese), a PhD candidate from UC Irvine, presented her tool, Swan. It provides a graphical representation of alternative splicing events, but can also be used to detect differential isoform usage and isoform switching events.
The Hunt For Differentially Expressed Isoforms In Bears… and Brains
Using Iso-Seq data on brown bears during hibernation and active seasons, Joanna Kelley (@joannalkelley), associate professor at Washington State University, discovered that fat tissue had higher levels of differential isoform usage (DIU) than liver and muscle tissues.
“Genes that show no change in expression levels but show major isoform switching and differential isoform usage are the ones we’re most interested in, because those are isoforms that we can’t quantify in any other way,” Kelley said.
Jack Humphrey (@JackHumphrey_), a postdoc in the Towfique Raj lab at Mount Sinai, is using Iso-Seq analysis to study complex splicing in genes associated with Alzheimer’s disease risk. Humphrey shared data from 30 post-mortem isolated microglia they collected. He also presented the processing pipelines for annotating and classifying the Iso-Seq transcripts, with an emphasis on filtering potential library artifacts – an often neglected but critical aspect of any bioinformatics work. Using a combination of existing tools and custom filtering, Humphrey showed that the curated transcriptome is high-quality and has already revealed interesting splicing events not observed with short-read data.
Single-Cell Iso-Seq Method for Precision Oncology and Hematopoietic Lineages
Arthur Dondi (@ArthurDondi), a PhD candidate from ETH Zurich, is using single-cell Iso-Seq (scIso-Seq) to study ovarian cancer. Specifically, by characterizing full-length isoforms in the omentum (fatty tissue covering the abdomen), there’s a potential for discovering neoepitopes and therapeutic targets.
Dondi and collaborators used the HIT-scIso-Seq technique, which applies TSO artifact removal and concatenation to cDNA molecules coming out of the 10X single-cell platform, and increased the number of reads per SMRT Cell 8M sixfold. They are planning to query this rich dataset for differential isoform expression, novel isoforms and fusion discovery.
Vladimir Souza from University of Zurich is working on calling variants from Iso-Seq data, showing that using DeepVariant or GATK with specific parameters achieved the highest precision-recall. The goal of his project is to eventually link the variations to changes in ORF predictions.
Anita Scoones (@AnitaScoonesPGR), a PhD candidate from the Earlham Institute, is studying lineage bias during hematopoietic stem cell differentiation. She wants to use single-cell Iso-Seq analysis on their plate-based single-cell libraries, similar to how her lab mate Laura Mincarelli had used long reads to look at isoform differences in aging mice.
Anne Deslattes Mays (@adeslat) and Marcel Schmidt of Georgetown University had previously used bulk Iso-Seq analysis to show that lineage-negative cells in bone marrow have higher isoform complexity than lineage-positive cells. They are now pushing the question into the single-cell space: is isoform diversity uniform at the single-cell level in lineage-negative cells? Applying the scIso-Seq method, they found striking differences between the total and lineage-negative bone marrow subpopulations, where lineage-negative cells had an overwhelmingly high number of novel isoforms and were enriched in spliceosome-associated genes. This suggests that alternative splicing in lineage-negative cells is attributed to cell-fate decisions of each cell subpopulation.
What’s Next For Iso-Seq Analysis?
The event ended with a lively discussion in which attendees discussed the need for bioinformatics tools that can handle large amounts of Iso-Seq data and create reproducible workflows that others can easily adapt. They also addressed the one-size-fits-all approach of using a single reference annotation and said a re-think may be in order.
“Maybe references should be qualified by the tissues or cell types of interest,” suggested Ana Conesa (@anaconesa). “How do we use all these novel isoforms to annotate the transcriptome?”
Mays agreed that “the best reference is self.”
In neuroscience, scientists have a poor idea of what makes a cell type-specific isoform, Humphrey said. The challenge is agreeing on what a definitive reference for each cell type would be, he added.
“We’re not done at just references,” Schmidt suggested. “We need to assign a function to these isoforms, even if it’s a regulatory one.” And Conesa said a system level of analysis is necessary.
Overall, the enthusiasm around Iso-Seq analysis was consistent throughout. The promise of a properly defined transcriptome summed up the conversation and paves the way for future discussion.
Want to learn more? Register to watch an on-demand recording of the event, or check out these resources:
PacBio Applications and Workflows
RNA Sequencing with Iso-Seq Analysis
Procedure & Checklist
The new kid on the PacBio block, the Sequel IIe System, has been receiving high marks from universities and sequencing centers around the world.
What’s it like using the instrument, which was introduced in October 2020? Several users have spoken about their experiences in a series of recent online events.
Launching PacBio Sequencing Services in a New Lab
Melissa L. Smith (@SmithLab_UofL) spoke about her experience transferring her lab from New York City to the “PacBio naive” Bluegrass State in the Unleashing the Power of HiFi webinar.
Smith admitted she faced some initial challenges in establishing her lab. Chief among them were compute capacity, data storage, ancillary equipment and staff expertise. Luckily, she was able to leverage existing campus resources to overcome many of those hurdles.
As for the computing needs, “The Sequel IIe changed everything,” she said.
Her favorite feature? On-instrument data processing, which has solved many of her compute capacity and data storage challenges. Plus, it has eliminated the need to queue or compete for compute resources with others across campus.
“The data coming off it is already collapsed, error corrected, and just 50-100 Gb, compared to ~1 TB from Sequel II,” she said.
The Sequel IIe System is not only supporting her research into immunology and infectious disease, it’s also part of a sequencing core lab, and one of PacBio’s newest Certified Service Providers. In addition to the standard sequencing pipelines, the lab will be doing assay development, SARS-CoV-2 sequencing with the new HiFiViral protocol, and other customized sequencing solutions.
Powering a Wide Range of Sequencing Applications
At the SciLifeLab in Uppsala, Sweden, the Sequel Systems are used for a whole spectrum of applications, from de novo genome assembly to BACs, YACs and filling gaps, Olga V. Pettersson told webinar attendees. Her team has been working with PacBio sequencing since 2013, initially with the PacBio RS II, and they recently upgraded to a Sequel IIe System.
In 2020, they sequenced more than 200 non-model eukaryotic genomes (around 700 individuals total), with many reaching the high quality standards of the Earth BioGenome Project.
Pettersson is also a fan of the Sequel IIe System’s advanced computing capabilities, saying it has led to a 20-fold reduction in data storage needs. HiFi reads have also helped shed light on hard-to-access “dark” regions of the human genome, she added.
Q&A with Genomics Core Facility Directors
Pettersson also appeared on a panel of expert users in SMRT Sequencing as a Service – How to Bring Long-Read Technology to Your Core Lab.
She shared tips for sample prep, instrument handling, and business planning, as well as some of the advantages of the Sequel IIe System.
“When the DNA is sufficient, we always prefer to go with PacBio because it’s so much easier, with bioinformatics off the shelf, reads of higher quality, and no need for additional polishing,” Pettersson said.
Other panelists, including Bruce Kingham of the University of Delaware Sequencing and Genotyping Center, said the Sequel System and its HiFi reads have become “the platinum standard for long read sequencing,” with extremely high demand among their users.
“There’s really no other data type like HiFi,” added Charlotte Harris, research lab supervisor at Corteva Agriscience. “Throughput has been a huge win for us. It’s allowed us to take on these much larger and more complex projects, and really benefit our profit margins.”
Want to learn more? Attend the on-demand webinar to hear from Melissa L. Smith and Olga Pettersson how the Sequel IIe System is making it easier than ever before to get started with HiFi reads or add capacity.
Want to discuss the benefits of HiFi sequencing and the Sequel IIe System for your research? Connect with a PacBio Scientist.
Interested in becoming a service provider? Visit the Sequencing for Service Provider page.
Rice was the first crop to have its genome completed, almost two decades ago. However, the rice reference has never been truly complete. Even improved versions of the reference for Oryza sativa, a major food staple and breeding model system, have contained gaps and missing sequences.
An international team of scientists from China, the United States and Saudi Arabia has finally closed those gaps to produce two gap-free reference genome sequences of the elite O. sativa xian/indica rice varieties Zhenshan 97 (ZS97) and Minghui 63 (MH63).
How Long-Read Sequencing Fills the Gaps
As reported in Molecular Plant, Jianwei Zhang (Huazhong Agricultural University, Wuhan), Jesse Poland (Kansas State University), Rod Wing (Arizona Genomics Institute and KAUST), et al. were able to drill down to the centromere level, discovering more than 395 non-TE genes located in centromere regions, of which ~41% are actively transcribed.
Previous references released in 2016 saw 10% of the genome still unassembled/unplaced, and an update in 2018 left eight and seven gaps in the ZS97 and MH63 genomes, respectively.
“To bridge all remaining assembly gaps across each genome, we incorporated high-coverage and accurate long-read sequence data and multiple assembly strategies,” the authors wrote. These strategies included both CLR and HiFi sequencing modes.
Hi-C and Bionano maps were used to validate the quality of the assemblies, and FISH and ChIP-Seq assays were utilized to discover and characterize the location and primary structure of centromeres.
The new assemblies achieved a 99.88% BUSCO score and LTR assembly index (LAI) numbers that meet the standard of gold/platinum reference genomes. In addition, more than 1,500 rRNAs were identified, compared to tens in the original assemblies.
The last closed gaps in the assemblies were all in centromere regions. Centromeric regions, while critical for fidelity and segregation of chromosomes, are largely inaccessible to breeding due to greatly reduced recombination, particularly in larger genomes, the authors noted.
“The detailed understanding of centromere architecture and gene content, therefore, affords insight into the challenge of developing favorable allele combinations in the absence of natural recombination, using hybrid complementation, gene editing, or even precisely inducing recombination,” they wrote.
With its high accuracy and repeat-spanning reads, PacBio HiFi long-read sequencing was a “great resource for the assembly of complex heterozygous regions and centromeres,” the authors stated.
What the Rice Reference Genome Means for the Future
The large 10-fold variation in the number and distribution of centromeric repeats across the different chromosomes and between the genomes gives a detailed picture of the large amount of centromeric diversity both within and among plant genomes.
The new references provide a clear picture of the primary sequence architecture of the xian/indica rice genomes that feed the world, and could help in the breeding of climate resilient varieties, the authors concluded.
“Such resources will serve to develop a fundamental and comprehensive model for the study of heterosis, and other basic and applied research, and leads the path forward to a new standard for reference genomes in plant biology,” they wrote.
Interested in reading more about long-read sequencing and HiFi reads? Check out our Plant and Animal page to learn more about how they empower insect biology, crop improvements, animal health and breeding and more.
Rare diseases are defined as diseases that affect a small number of people – fewer than 1 in 2,000 in the European Union and fewer than 200,000 total people (about 1 in 1,500) in the United States. For example, Tay-Sachs disease affects 1 in 300,000 while Cystic Fibrosis is more common and affects 1 in 10,000. Though individual rare diseases affect very few people, collectively they are common and affect over 300 million people worldwide.
Advances in Sequencing Technology for Improved Understanding of Rare Diseases
With more than 70% of rare diseases being genetic in origin, scientists around the world have deployed genomic technologies to identify their causal mechanisms. Improvements in the technologies for identifying genetic variation have increased scientists’ ability to understand rare diseases. Learn more about the evolution of DNA sequencing tools.
Karyotyping was the first technology to provide a view of the genome, revealing diseases due to chromosomal abnormalities such as Turner Syndrome (one X chromosome instead of two in a female). Later, microarrays provided a higher-resolution view, identifying large copy number variants, as in DiGeorge Syndrome (caused by a deletion of around 2.5 Mb on Chromosome 22). Exome or whole genome sequencing based on short-read sequencing platforms enabled even more progress by detecting single nucleotide variants (SNVs), insertions and deletions, and some larger variants.
But even whole genome sequencing with short reads finds a genetic cause in less than half of all instances of rare disease, leaving the causes of many rare diseases unknown. This is in part because short reads do not provide a comprehensive view of variation.
Fortunately, more recent advancements have led to the introduction of long-read sequencing, which has enabled sequencing of the whole human genome – every single base – so that all types of variants can be detected, from SNVs up to large structural variants (SVs). Ultimately, by detecting more variants, long-read sequencing provides a more complete picture of the genome and any abnormalities that may exist.
|In case you missed it: Reaching a Genomics Milestone – The First Complete Human Genome.|
What’s the Difference between Short-Read Sequencing and Long-Read Sequencing?
As their names suggest, short-read sequencing looks at DNA in short snippets (100-350 base pairs) while long-read sequencing measures long fragments of DNA (tens of thousands of base pairs). Why does that matter? Well, when trying to characterize a human genome that has two copies (one maternal and one paternal), each 3.2 billion base pairs in length, having longer snippets of DNA means you:
- Need fewer snippets to make up the length of the whole genome and have no gaps where the sequence is unknown
- Can more easily map how one region of the genome is connected to another region
- Have the ability to phase or determine which copy of a gene, maternal or paternal, a mutation occurs in
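To put rough numbers on the first point, here is a small back-of-the-envelope sketch. The 3.2 billion base pair genome length comes from the text above; the read lengths of 150 bp and 15,000 bp are illustrative assumptions, and the function name is made up for this example.

```python
import math

GENOME_BP = 3_200_000_000  # length of one copy of the human genome, as quoted above


def reads_for_coverage(read_len_bp: int, coverage: float = 1.0,
                       genome_bp: int = GENOME_BP) -> int:
    """Minimum number of fixed-length reads needed to reach a mean coverage."""
    return math.ceil(genome_bp * coverage / read_len_bp)


short_reads = reads_for_coverage(150)     # assumed short-read length
long_reads = reads_for_coverage(15_000)   # assumed long-read length
print(f"short reads for 1x: {short_reads:,}")
print(f"long reads for 1x:  {long_reads:,}")
```

This only counts bases. It ignores overlaps, uneven coverage, and repeats, which is where longer reads make the biggest practical difference.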
As it turns out, the genetic variants underlying many of these diseases are exactly the types that short-read sequencers are least able to detect. From repeat expansions to large deletions or insertions, pathogenic variants are often large and complex structural elements that cannot be spanned by short reads of just a few hundred bases. Representing these variants accurately — and capturing all types of variants — requires much longer sequence reads that cover the entire variant in a single stretch.
HiFi Sequencing – the Key to Seeing All Variant Types Involved in Rare Disease
Unlike the data produced by short-read sequencing platforms, highly accurate long-read sequencing, known as HiFi sequencing, generates long reads (up to 25 kb) that can span large structural variants in a single read. HiFi sequencing provides the most comprehensive view of variation in a genome, identifying the variation found with short reads and detecting the larger and more complex variants that short reads miss.
The long reads and high accuracy (>99.9%) of HiFi sequencing provide very complete genome assemblies, comprehensive variant detection with base-pair resolution, and phasing to represent maternal and paternal haplotypes.
Unlocking the Secrets of Rare Diseases with HiFi Sequencing
HiFi sequencing has already made a substantial difference in rare disease research by identifying variants that were missed by short-read sequencing and other technologies. For more detail, check out these research studies of undiagnosed rare diseases and the types of pathogenic variants underlying them.
Structural Variant Calling in Rare Disease Studies
One of the earliest examples of how PacBio sequencing technology could play a role in rare disease research came from the Stanford lab of cardiologist Euan Ashley (@euanashley) and a young man who had suffered a series of tumors in his heart and glands. Eight years of genetic analyses had produced no firm answers. Ashley’s team used a new PacBio whole genome sequencing approach to find a novel structural variant in a gene associated with Carney syndrome, which was later validated as the correct finding.
More recently, a group at HudsonAlpha found new evidence in the study of a young girl with intellectual disabilities, seizures, and speech delay. With HiFi sequencing, the scientists at HudsonAlpha identified a de novo heterozygous insertion of nearly 7,000 bases in an intron of the CDKL5 gene that they deemed likely pathogenic. Since CDKL5 has been associated with early infantile epileptic encephalopathy 2, a condition characterized by many symptoms experienced by the proband, “we prioritized this event as the most interesting candidate variant,” the authors reported.
|Structural variants are generally classified as being >50 bp in length and include insertions, deletions, duplications, copy-number variants, inversions, and translocations. Learn more.|
In Japan, researchers deployed HiFi sequencing to find the cause of an undiagnosed syndrome in twin 12-year-old girls. Clinical symptoms matched Dravet syndrome, but no molecular evidence was available to confirm that finding. They sequenced one of the twins and both parents, identifying a novel 12 kb inversion in a region that had previously been associated with the same symptoms affecting the girls.
In one last structural variant example, Kristen Sund (@kristen_sund) from Cincinnati Children’s Hospital identified a 13 Mb complex rearrangement that appears to be responsible for a movement disorder in a 17-year-old with chorea, myoclonus, anxiety, and hypothyroidism. The variant was found in the NKX2-1 gene.
Small Variants in Challenging Regions of the Genome
For an individual with lissencephaly (lack of folds in the brain), developmental delay, and seizures, scientists at Children’s Mercy Kansas City used HiFi sequencing to reveal a pathogenic variant in a region that proved difficult for short reads to represent accurately. Unlike short-read data, which showed coverage dropout in that region, HiFi sequencing provided even coverage, making it possible to spot the key variant.
Capturing the Full Length and Sequence of Repeat Expansions
Repeat expansions have previously been shown to cause a range of diseases and can be tough to characterize accurately with short-read sequencing tools. HiFi sequencing can get through even very long expansions. Recently, scientists from Adelaide Medical School and the Robinson Research Institute linked the expansion of an ATTTC repeat in the first intron of STARD7 with familial adult myoclonic epilepsy.
|Repeat expansions are mutations that result in repeating sequence that may extend for hundreds to thousands of bases. For example, the trinucleotide repeat expansion that causes Huntington’s disease consists of an abnormally long run of CAG repeats (roughly 36 or more copies).|
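As a toy illustration of what "capturing the full length" of a repeat means, the sketch below counts the longest uninterrupted run of a motif in a single read. The function name and the example read are invented for illustration; real analyses work from aligned reads with dedicated repeat-genotyping tools, not a simple pattern match like this.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Made-up read: flanking bases around five uninterrupted CAG units,
# followed by an interrupted copy (CAA CAG).
read = "GGTC" + "CAG" * 5 + "CAACAG" + "TTGA"
print(longest_repeat_run(read, "CAG"))
```

The count is only meaningful when one read contains the whole repeat plus flanking sequence on both sides, which is why read length matters so much for this class of variant.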
Phasing Rare Disease Variants Across Alleles
|Phasing involves separating maternally and paternally inherited copies of each chromosome into haplotypes to get a complete picture of genetic variation. Learn more.|
Back at Children’s Mercy Kansas City, researchers analyzed the genome of a four-year-old girl with hepatosplenomegaly whose parental genomes were not available. The individual was believed to have Niemann-Pick disease type C, but more data was needed to support the theory. HiFi reads showed two key variants located on different alleles of the relevant gene; with the phased variants, scientists were able to confirm the original finding.
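The idea behind read-based phasing can be sketched in a few lines. This is a simplified illustration, not the method used in the study: each read is represented as a hypothetical dictionary of position to observed base, and the positions and bases in the example are invented. Reads that cover both variant sites vote on whether the two alternate alleles travel together (cis) or sit on different copies of the gene (trans).

```python
from collections import Counter


def phase_two_variants(reads, pos_a, pos_b, alt_a, alt_b):
    """Classify two heterozygous variants as 'cis', 'trans', or 'undetermined'.

    `reads` is an iterable of dicts mapping a position to the base seen there,
    a simplified stand-in for aligned long reads.
    """
    votes = Counter()
    for read in reads:
        if pos_a not in read or pos_b not in read:
            continue  # only reads spanning both sites are informative
        has_a = read[pos_a] == alt_a
        has_b = read[pos_b] == alt_b
        # Both alt or both ref on one read supports cis; a mix supports trans.
        votes["cis" if has_a == has_b else "trans"] += 1
    if votes["cis"] == votes["trans"]:
        return "undetermined"
    return "cis" if votes["cis"] > votes["trans"] else "trans"


example_reads = [
    {100: "T", 5200: "G"},  # alt at first site, ref at second
    {100: "C", 5200: "A"},  # ref at first site, alt at second
    {100: "T", 5200: "G"},
    {100: "C"},             # too short to reach the second site, ignored
]
print(phase_two_variants(example_reads, 100, 5200, alt_a="T", alt_b="A"))
```

For a recessive condition, a trans result means each copy of the gene carries one of the variants, which is why phasing without parental genomes can be so informative.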
The Future of Rare Disease Research is Bright
Scientists around the world are striving to improve the lives of those affected by rare diseases, translating the latest research approaches and high-quality genomic data into insights that could enable the development of improved diagnostics for rare diseases. As HiFi sequencing continues to shed light on more areas of the genome, it should have a profound effect on our ability to diagnose, understand and ultimately improve treatment for the rare disease community.
To learn more about how PacBio HiFi sequencing is helping advance our understanding of rare disease, watch on-demand presentations from our Virtual Rare Disease Week event or visit our rare disease resource page.
Explore Other Posts in the Sequencing 101 Series
- The Evolution of DNA Sequencing Tools
- Understanding Accuracy in DNA Sequencing
- Webinar: How Long-read Sequencing Improves Access to Genetic Information
- Introduction to PacBio Sequencing and the Sequel II System
- From DNA to Discovery – The Steps of SMRT Sequencing
- DNA Extraction – Tips, Kits, & Protocols
- Sequencing 101: Ploidy, Haplotypes, and Phasing – How to Get More from Your Sequencing Data
- Looking Beyond the Single Reference Genome to a Pangenome for Every Species
- Why Are Long Reads Important for Studying Viral Genomes?
- What’s the Value of Sequencing Full-length RNA Transcripts?
An exciting new paper from scientists at the National Institute of Allergy and Infectious Diseases and the NIH Clinical Center reports on the evolution of the SARS-CoV-2 virus within individuals. The team used HiFi sequencing to make this work possible.
The paper, which was published in PLoS Pathogens, comes from lead authors Sung Hee Ko, Elham Bayat Mokhtari, and Prakriti Mudvari; senior author Eli Boritz; and collaborators. They conceived the project to overcome a key challenge in tracking viral adaptation. “An important obstacle to understanding intra-individual evolution of SARS-CoV-2 is that standard sequencing and analytical procedures yield a single consensus sequence for each sample, rather than multiple sequences representing virus quasispecies diversity,” they write.
To address the issue, they developed a new method based on HiFi sequencing to focus on the 6.1 kb region of the SARS-CoV-2 genome encoding its surface proteins. They then conducted deep sequencing of eight individuals, yielding large numbers of fully phased S, E, and M gene sequences from each person. In one individual, the availability of four samples collected over time allowed for a longitudinal analysis of viral response to host immune pressure. The scientists had previously used HiFi sequencing to study the intra-individual evolution of HIV, and believed that the same approach could be useful during the COVID-19 pandemic.
The choice of HiFi sequencing, which builds a highly accurate sequence from consensus calls across repeated passes over the same molecule, gave the team an excellent view of viral evolution. When we asked senior author Eli Boritz about his choice of technology, he shared: “By early 2020, we had been working for several years to use HiFi sequencing for high-throughput, single-copy, long-read HIV genetic analysis. Our approach in the HIV studies used unique molecular identifiers (UMIs) for error correction and drew on a short-read approach from Ron Swanstrom’s group and a PacBio approach from Jim Mullins’s group. As the pandemic took off around the world, we decided to adapt our approach to SARS-CoV-2. We didn’t know if this new virus would generate enough diversity to warrant our detailed sequence analysis, but we decided that it would be important to look.”
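For readers unfamiliar with UMI-based error correction, here is a minimal sketch of the general idea, not the authors’ pipeline. Reads carrying the same UMI are assumed to be copies of one original template, so a per-position majority vote can out-vote errors that appear in only some copies. The function name, the `min_reads` threshold, and the toy sequences are all assumptions for illustration, and the reads in each group are assumed to be the same length and already aligned.

```python
from collections import Counter, defaultdict


def umi_consensus(tagged_reads, min_reads=3):
    """Build one majority-vote consensus sequence per UMI.

    `tagged_reads` is an iterable of (umi, sequence) pairs; sequences that
    share a UMI are assumed to be equal-length copies of the same template.
    """
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few copies to out-vote an error
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


toy_reads = [
    ("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"),  # one copy has an error
    ("GGT", "ACGA"), ("GGT", "ACGA"), ("GGT", "ACGA"),  # a genuinely different template
    ("TTT", "ACGT"),                                    # single copy, dropped
]
print(umi_consensus(toy_reads))
```

Because each consensus comes from one tagged molecule, differences between UMI groups reflect real diversity in the sample, which is what makes this kind of approach useful for studying viral quasispecies.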
The longitudinal analysis yielded results highly suggestive of natural selection, revealing four viral haplotypes harboring three mutations that arose independently in a single epitope. “These mutations arose coincident with a 6.2-fold rise in serum binding to spike and a transient increase in virus burden,” the scientists note. “We conclude that SARS-CoV-2 exhibits a capacity for rapid genetic adaptation that becomes detectable in vivo with the onset of humoral immunity, with the potential to contribute to delayed virologic clearance in the acute setting.”
In the other study participants for whom repeated sampling was not possible, the team found lower genetic diversity in the viruses sequenced. They hypothesize that this is likely the result of analyzing samples collected early in the infection process rather than after the host’s immune response has had time to select variants with mutated spike proteins.
We asked Eli Boritz about what’s next for his team. For future longitudinal studies, he told us, “it will be important … to sequence additional regions of the virus and to perform a comprehensive analysis of antiviral host responses, including neutralizing antibodies, T cells, and other mechanisms.” He also hopes to analyze viral samples from more complex cases, such as reinfections. “We hope these studies can teach us about the virus’s capacity for additional waves of escape variants in the future,” he said.
The team’s insights into viral evolution in a single person have important implications for COVID-19 treatment. “Our results also emphasize that early antiviral therapy or combinations of antivirals with distinct targets could have markedly higher virologic efficacy than monotherapy administered later in the disease course,” the scientists conclude.
It’s a moment three decades in the making: the first complete human genome assembly is here!
Reading this, you will no doubt feel some sense of déjà vu. After all, the human genome reference was pronounced “done” in 2000, 2001, and again in 2003. But any scientist who has used the reference since then knows that there has never been a single fully sequenced human genome. Until now.
HiFi Sequencing Enables the First Complete Sequence of a Human Genome
The Telomere-to-Telomere (T2T) Consortium, a large team of scientists from the National Human Genome Research Institute and dozens of other institutions, released a new preprint titled “The complete sequence of a human genome.” Lead authors Sergey Nurk, Sergey Koren, Arang Rhie, and Mikko Rautiainen, along with corresponding authors Evan Eichler, Karen Miga, and Adam Phillippy as well as many collaborators, have now vanquished gaps and errors to deliver what they call “the first truly complete human reference genome.”
This tremendous effort incorporated several cutting-edge technologies, including HiFi sequencing from PacBio, to produce a gap-free, complete haploid human genome assembly based on a complete hydatidiform mole (CHM13). The goal was to create a novel resource with comprehensive, reliable genome data that avoids the gaps and errors that still mark the latest GRCh38 reference assembly. “The resulting T2T-CHM13 reference assembly removes a 20-year-old barrier that has hidden 8% of the genome from sequence-based analysis, including all centromeric regions and the entire short arms of five human chromosomes,” Nurk et al. report.
This new reference “includes gapless assemblies for all 22 autosomes plus Chromosome X, corrects numerous errors, and introduces nearly 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding,” the authors add. This represents “the largest improvement to the human reference genome since its initial release.”
HiFi sequencing was pivotal to this achievement. The scientists note that HiFi sequencing features “20 kbp read lengths and a median accuracy of 99.9%, which has resulted in unprecedented assembly accuracy with relatively minor adjustments to standard assembly approaches. …HiFi sequencing excels at differentiating subtly diverged repeat copies or haplotypes.”
HiFi Sequencing Removes Technological Barriers
The team had initially started with a strategy of using noisy ultralong nanopore-based reads to build an assembly backbone, which was then polished with other platforms. But they subsequently switched to accurate and long HiFi reads. “We shifted to a new strategy that leverages the combined accuracy and length of HiFi reads to enable assembly of highly repetitive centromeric satellite arrays and closely related segmental duplications,” they report. The assembly is based on a string graph built from HiFi reads and has an average consensus accuracy between Q67 and Q73, “far exceed[ing] the original Q40 definition of ‘finished’ sequence,” the authors add.
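For readers less familiar with Q scores, they follow the standard Phred scale, where Q = -10 × log10(error probability). The short sketch below converts between the two; the function names are our own and chosen only for illustration.

```python
import math


def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred quality score Q."""
    return 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality score for a per-base accuracy strictly between 0 and 1."""
    return -10 * math.log10(1 - accuracy)


for q in (30, 40, 67, 73):
    print(f"Q{q}: about 1 error per {1 / phred_to_error(q):,.0f} bases")
```

On this scale, Q30 corresponds to 99.9% accuracy (one error per 1,000 bases) and Q40 to 99.99% (one per 10,000), so every additional 10 points means ten times fewer errors.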
The new assembly, to which a Y chromosome sequence will be added in the near future, should be used in place of the GRCh38 reference for “all studies requiring a linear reference sequence,” the scientists suggest, noting that it is “more complete, representative, and accurate” than its predecessor and “substantially increases the number of known genes and repeats in the human genome.”
The team also notes that reanalysis of short-read public data sets such as the 1000 Genomes Project using the new reference already shows improvement compared to the GRCh38 reference, and that new phenotypic associations should be expected given the more complete reference genome.
HiFi Sequencing Powers the Next Phase of Genomic Discovery
“The complete, telomere-to-telomere assembly of a human genome marks a new era of genomics where no region of the genome is beyond reach,” the authors write.
“Highly accurate, long-read sequencing, combined with tailored algorithms, promises the de novo assembly of individual haplotypes and sequence-level resolution of complex structural variation. This will require the routine and complete de novo assembly of diploid human genomes, as planned by the Human Pangenome Reference Consortium.”
Ultimately, they anticipate that highly accurate long-read sequencing will lead to a “collection of high-quality, complete reference haplotypes [that] will transition the field away from a single linear reference and towards a reference pangenome that captures the full diversity of human genetic variation,” the team reports. “Ideally, every genome could be assembled at the quality achieved here, since the small variants recovered by short-read resequencing approaches represent only a fraction of total genomic variation.”
How to Get Started with HiFi Sequencing for Any Genome
Learn more about our whole genome sequencing application.
Have your questions about HiFi sequencing answered by a PacBio scientist.
2021 HiFi for Accuracy SMRT Grant Program – Apply between June 7-25 for your chance to win free HiFi sequencing.
May is ALS Awareness Month, and we’re hoping to help raise awareness by shining a spotlight on two deserving publications from scientists at the University of Washington and at the Mayo Clinic. In both studies, researchers used PacBio technology to sequence targeted genomic regions and to discover and characterize complex pathogenic variants associated with ALS.
ALS, short for amyotrophic lateral sclerosis and commonly known as Lou Gehrig’s disease, is a progressive disease of the nervous system that causes loss of muscle control. The mean survival time for patients with ALS is just three to five years from diagnosis, and there is no cure for the disease.
At the University of Washington, scientists studied members of a large multigenerational family in which several people developed spontaneous ALS. By using barcoded amplicons and a multiplex strategy, the team analyzed the target WDR7 gene in nearly 300 samples and identified a variable number tandem repeat (VNTR) that is expanded in individuals with ALS.
Their paper, published in the American Journal of Human Genetics, details single-base resolution for a repeat that would have been virtually impossible to spot — let alone resolve with such granularity — using any other sequencing technology. “Our detailed interrogation of this VNTR demonstrates the value of high-depth, long-read sequencing of human-specific repetitive regions that expand in the genome,” the team reports.
Meanwhile, at the Mayo Clinic, scientists used PacBio’s CRISPR/Cas9-based No-Amp method to target, capture, and sequence a repeat expansion region associated with neurological diseases. The targeted element is the most common genetic cause of ALS, but the biological mechanism of how it causes disease has not been well understood. The team used PacBio sequencing to characterize the complete region, finding that individuals with fewer repeat copies had longer survival time than those with larger expansions. Results were recently published in the journal Brain.
“Our findings demonstrate that No-Amp sequencing is a powerful tool that enables the discovery of relevant clinicopathological associations, highlighting the important role played by the cerebellar size of the expanded repeat in C9orf72-linked diseases,” the scientists write.
These discoveries could open up new paths to identify people at risk of developing ALS. Congratulations to both of these teams, as well as other PacBio users who are making a difference for families affected by ALS!
Interested in learning more about this research?
- Watch Marka van Blitterswijk from the Mayo Clinic present, ‘Applying Targeted Long-read Sequencing to Assess an Expanded Repeat in C9orf72’
- See Meredith Course from the University of Washington present, ‘The Evolution and Function of a Large Tandem Repeat Associated with ALS’.
Visit our Neuroscience Research page to learn how PacBio sequencing provides a comprehensive understanding of the genetic basis of neurological disease.
If you are like most of us at PacBio, you likely learned how to extract DNA in a high school or college biology class, or maybe even in your kitchen. But as you moved on to more high-stakes experiments, you may have found that extracting DNA for sequencing in your lab isn’t always as straightforward as lyse, precipitate, wash, suspend. In this introduction to DNA extraction, we will share tips, tricks, and protocols to help make your DNA isolation easier!
For optimal results to power biological discovery, sample prep is a critical step in any sequencing project. And with long-read sequencing technologies, including HiFi sequencing, you not only want DNA free from nicks and degradation, but you also want long fragments (tens of kilobases) to achieve those coveted long reads.
Long-read sequencing expert and sample wrangler Olga Pettersson (@OlgaVPettersson) of SciLifeLab at Uppsala University, advises: “Aim for getting molecules as long as you can, as pure as you can, as fresh as you can.”
So, what are the factors that go into obtaining high molecular weight (HMW) DNA for sequencing? Jennifer Balacco (@JenBalacco) of the Vertebrate Genome Lab, which aims to sequence the genomes of all living vertebrate species and therefore has a wealth of experience with DNA extraction, points to sample type, the prep and storage of samples, and individualized extraction methods as the key components of successful DNA extraction. You can follow along as she shares her experience with many sample types in the video below and then explore our additional resources and considerations.
Watch this PacBio Virtual Global Summit presentation from Jennifer Balacco of the Vertebrate Genome Lab on DNA extraction approaches to achieve error-free genomes.
DNA Extraction Challenge #1: Sample Type
Both within a single organism and between species there is variability in how readily HMW DNA can be extracted and how stable it is once extracted. DNA from liver, for example, is known to quickly degrade due to the enzymes that make a functioning liver, while DNA extracted from blood is typically more stable. Some plant species have phenolics and polysaccharides that interfere with extraction, and mollusks have high DNase activity that makes it difficult to store DNA for any amount of time.
If you have a choice on the type of sample you use, a cell-dense tissue with minimal potential contaminants is your best bet. For vertebrates, this means using tissues like blood, brain, kidney, or muscle. For some invertebrates there may be a mucous membrane that inhibits the ability to obtain high-quality DNA, and you might want to consider an additional DNA cleanup step to rid the extraction of contaminants.
When working with small arthropods you can use an adult individual but may find that pupae or larvae are an easier DNA source than a tough exoskeleton-covered adult. When planning a fungus sequencing project, consider culturing the sample in order to acquire a single isolate/individual in the case of macroscopic organisms or an isogenic population in the case of microorganisms. And finally, for plants, it is recommended to obtain the youngest leaf/shoot tissue from an individual plant that has been dark treated (kept out of light) for 24-72 hours.
DNA Extraction Challenge #2: Sample Prep and Storage
The second consideration for HMW DNA extraction, after you’ve decided what sample type to use, is how you will treat that sample. Sequencing adheres to the “garbage in, garbage out” rule, so it’s prudent to take care when prepping your samples. In most cases, the freshest sample will work best, followed by samples flash frozen with liquid nitrogen and stored at -80°C. This is because as soon as a tissue is taken from its living organism it begins to release factors that degrade both DNA and RNA, making it a race against the clock to get the genetic material out intact.
Of course, we can’t always control how a sample is prepped or stored, and in those cases it’s generally worth a try to get the best DNA you can from any given sample. There are examples of ethanol-stored samples providing DNA of sufficient quality, and of museum specimens being used for amplicon sequencing. However you decide to prep and store your sample prior to DNA extraction, the main aim is to minimize the time between sampling and stable storage, which limits enzymatic degradation of the genetic material within.
DNA Extraction Challenge #3: Choosing the Right Method
The final piece of the puzzle when it comes to obtaining HMW DNA for a sequencing project is the method used for extraction. There is no shortage of kits, protocols, and tutorials for DNA extraction, and after spending years trying to find the best one-size-fits-all extraction method for various sample types, we are fairly confident one doesn’t exist! However, there are some approaches that consistently produce plentiful HMW DNA that can be binned by sample type.
In general, “old school” methods using chemicals commonly found in molecular biology labs perform fairly well. For example, phenol and chloroform extractions work well for many tissues, though the chemicals used are dangerous. The cetyl trimethylammonium bromide (CTAB) method for extraction of DNA from plants is also a fairly robust way to yield good DNA. And once you understand the chemistry of how DNA is liberated from cells via these methods, you can tailor the protocols to meet the needs of individual species.
If you’re in the market for a tailored protocol, we encourage you to check out Extract DNA for PacBio, where we have collected many protocols from published projects, organized by organism type. However, if you’re looking for an easy, all-in-one DNA extraction kit to get you started on your sequencing journey, there are a few out there that have produced great DNA for HiFi sequencing; they are summarized in our DNA extraction technical note. If you are hoping to outsource this step to a DNA extraction lab, explore our Certified Service Providers, many of which offer DNA extraction as a service.
While there might not be a one-size-fits-all solution for extracting DNA, we hope our experience and those of our customers can help point you in the right direction for a successful HiFi sequencing project!
If you are ready to get started with sequencing or simply need help with choosing the best DNA extraction approach, connect with a PacBio Scientist.
If only we could track COVID-19 like we track the weather, with satellites and weather stations placed around the globe monitoring and sounding the alarm about potential storms, floods, droughts and other severe weather events.
A global pathogen surveillance network would save countless lives, and lessons learned from the current coronavirus pandemic could help make it possible, PacBio Chief Scientific Officer Jonas Korlach told Mendelspod host Theral Timpson (@theraltweet).
Korlach joined Brian Caveney, President and Chief Medical Officer of Labcorp, in a recent podcast to discuss SARS-CoV-2 viral surveillance and the trajectory of COVID research, vaccination and treatment.
PacBio has partnered with the national diagnostic testing company to support its large-scale SARS-CoV-2 testing, which has become part of the US Centers for Disease Control’s COVID-19 genomic surveillance effort. Labcorp has sequenced thousands of samples from around the country on its fleet of Sequel II Systems, and worked closely with PacBio to develop a new HiFiViral SARS-CoV-2 Workflow protocol to enable any laboratory to rapidly and efficiently power viral mutation surveillance using PacBio’s HiFi sequencing.
While rapid COVID-19 diagnostic testing is generally being done via PCR methods, there is still an important role for viral sequencing, Korlach said. PCR tests provide very limited information about genomic mutations and might not be able to identify which variant of the virus a person is infected with. HiFi sequencing on PacBio systems can provide a highly detailed profile of the roughly 30,000-base SARS-CoV-2 genome, including specific mutations and whether multiple subtypes of the virus are present in an individual patient, something that has been observed.
Labcorp is using both methods, Caveney said. The company has performed more than 38 million COVID-19 PCR tests, and sequenced more than 20,000 genomes with PacBio technology. Not only is the whole-genome sequencing useful in accelerating scientists’ understanding of the virus as it evolves, but it has helped Labcorp ensure its PCR tests continue to be sensitive to emerging variants and mutations.
“Our research and development team loves working with the PacBio equipment. They like the incredible ability to have high specificity with the long reads that we’re getting,” Caveney said. “It’s going to continue to be a very important research tool for both sides of the house — the diagnostic side, as well as the clinical research side — to make sure that the best medications, therapeutics and vaccines are coming to market.”
Another benefit of sequencing technology in the realm of infectious diseases is that it is a “universal measuring device,” Korlach said. Whether the pathogen is a virus or a bacterium, a DNA sequencer can detect it.
“It’s COVID today and the variants of COVID tomorrow, but what about all the other infectious disease agents that for many decades have cost millions of lives?” Korlach said. “We now have opportunities to tackle them a lot better than we have in the past, using the COVID pandemic as a blueprint.”
Caveney agreed. “We’re so focused on COVID, but 30, 40, 50,000 Americans die every year from influenza. And we now have learning from COVID that might help us bring that number down in the future. That would be a great win, in spite of the tragedy we just went through.”
So what would it take to create a global pan-pathogen surveillance network?
Collaboration, between and among scientific communities, public health agencies, and private companies, Caveney said. An international standardization of nomenclature is also high on his wishlist, “so that regardless of the instruments or the technology used to do the sequencing, it results in information that can be compared and assimilated in a way that all scientists and doctors know what to do with it.”
Continued investment, Korlach said. “Are we willing to keep investing and focusing on making that change permanent and applying it to other infectious diseases, of really building out a permanent and stable network where the routine medical care is going to shift from measuring a temperature and looking in your mouth to getting samples genomically tested within days? That is a future that I think is possible, and that I would like to be part of trying to do our little part to make that happen.”
To learn more about genome surveillance and the benefits of PacBio sequencing, explore our COVID-19 sequencing tools and resources.
Today we’re pleased to announce the launch of a new HiFi Sequencing workflow along with a software update for the Sequel II and Sequel IIe Systems that will increase the number of HiFi reads at or above 99.9% accuracy (QV30) for whole genome sequencing-based applications. Together, these advances will improve the quality of HiFi Sequencing while providing an efficient and scalable workflow for sequencing hundreds to thousands of whole human genomes per year on Sequel Systems.
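For readers less familiar with quality values, the pairing of 99.9% accuracy with QV30 follows from the standard Phred scale, where QV = -10 · log10(error probability). The snippet below is a minimal sketch of that conversion; the function names are illustrative and are not part of SMRT Link or any PacBio software.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to accuracy (1 - error probability)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert an accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # QV20, QV30 and QV40 correspond to 1 error in 100, 1,000 and 10,000 bases.
    for qv in (20, 30, 40):
        print(f"QV{qv}: accuracy = {qv_to_accuracy(qv):.4%}")
```

On this scale, each additional 10 QV points corresponds to a tenfold reduction in the expected error rate.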
This high-throughput sequencing and analysis workflow release includes a new HiFi library prep protocol offering a three-fold reduction in DNA input, enabling HiFi sequencing with limited sample quantities (neonatal blood, tissue biopsies, and cell lines).
Developed in collaboration with Children’s Mercy Kansas City, the release supports the adoption of HiFi reads for comprehensive variant detection to better understand the genetic causes of rare and inherited diseases. In a statement announcing the release, Emily Farrow, Director of Lab Operations at Children’s Mercy Research Institute, said: “This new workflow provides efficiency in our lab where now two research scientists can comfortably produce one thousand HiFi libraries a year, with the hope of doubling the throughput for library prep by automated liquid handling currently tested in the laboratory.”
The release also features new enabling workflows for variant calling and analysis of the SARS-CoV-2 genome in combination with the recently released high-throughput COVID sequencing protocol developed in partnership with Labcorp.
Jasmine Pritchard, our Vice President of Product Marketing, said, “We see building enthusiasm in the market for HiFi sequencing and this new release demonstrates our commitment to continuously improving our already industry-leading accuracy and key aspects of the workflow. Our team is focused on delivering advancements across the full spectrum of our portfolio, from sample preparation to downstream analysis.”
The HiFi Sequencing and Software v10.1 Release is available to order today and includes the following features:
- New Consumables: SMRTbell Enzyme Clean Up Kit 2.0, Sequel II Primer v5, Polymerase Binding Kit 2.2
- HiFi Protocol: Updated HiFi Express protocol enabling reduced DNA input
- Sequel II ICS v10.1: On-instrument workflow improvements that simplify run setup, especially for multiplexed applications
- SMRT Link v10.1: Updates for Adaptive Loading, our new HiFiViral for SARS-CoV-2 analysis application, and improved Iso-Seq Analysis for multiplexed samples
We also invite you to watch our on-demand Rare Disease Week event to hear how scientists are using HiFi sequencing to help identify causative variants and increase solve rates in rare disease research.
Ready to get started with HiFi sequencing? Connect with a PacBio scientist for a free project or instrument consultation.
By Jonas Korlach, Chief Scientific Officer
Grapy dusks over tangerine fields. Potato-patch fog over beds of coral. Mountains, glaciers, forests, deserts, fertile farmland and seas with both Arctic and tropical biomes.
One of the most geographically and biologically diverse states, California is home to both the highest (Mount Whitney) and lowest (Death Valley) points in the 48 contiguous states, as well as to some of the world’s most exceptional trees — the tallest (coast redwood), most massive (Giant Sequoia), and oldest (bristlecone pine).
At PacBio, we are extremely fortunate to have this biodiversity in our backyard — almost literally. We didn’t have to travel far to take samples of the giant California redwood as part of a personal project to sequence its gigantic genome and transcriptome.
It’s one of the reasons we are excited to work with the California Conservation Genomics Project, a collaboration of scientists across the state that has selected more than 100 threatened, endangered or otherwise valuable species sampled from the full array of California ecosystems for HiFi sequencing and assembly.
The purpose of the $10 million state-funded project is to capture the genetic variation that exists across each species’ habitat, with the ultimate objective of informing smarter development and more effective conservation.
How can genetics inform conservation? More biodiversity means more resilient ecosystems, and conservationists have long focused on preserving habitats and studying the roles of species within ecosystems. But they are now recognizing the important role genetic variation can play in the long-term survival of a species.
Populations with high genetic diversity are more likely to contain individuals with a genetic makeup that allows them to survive new environmental pressures. Populations with low genetic diversity might not even survive the next big threat, so it is crucial to identify individuals with genetic variation in order to conserve the species’ ability to survive and evolve.
Threats to one population can threaten others, including ours. A collapsing ecosystem affects all the species that rely on it. So preserving biodiversity is also an exercise in self-preservation.
California will not be the only ecosystem to benefit from the CCGP research. In many ways, the state is a microcosm of what’s happening to biodiversity around the world. It faces threats similar to those faced by habitats on other continents: climate change, wildfires, droughts, and an ever-expanding population that encroaches onto formerly wild lands.
The CCGP’s efforts will be boosted by other international initiatives, such as the United Kingdom’s Darwin Tree of Life Project, Australia’s Oz Mammals Genomics Initiative, the Vertebrate Genomes Project and The Earth BioGenome Project, whose ambitious goal is to sequence the DNA of 1.5 million species by 2030. We’re proud that PacBio technology is being used in all of these projects. You can learn more about the biodiversity initiatives PacBio sequencing is supporting in my recent presentation at the Senckenberg Biodiversity Genomics Symposium.
While COVID-19 has focused the attention of the scientific community — including our own — on pathogen detection, surveillance and drug development, lockdown has also spurred a renewed appreciation of nature. How many of us have sought solace in a temple of trees — in some cases, amongst towering columns of sequoias older than the Parthenon?
On this Earth Day, I urge all of you to do your part to “Restore our Earth,” whether that be committing to a home conservation project, or supporting an international one. At PacBio, we will be participating in public awareness campaigns and contributing our time and expertise in support of these important biodiversity initiatives. Let’s make every day Earth Day.