PacBio blog

This blog features voices from PacBio — and our partners and colleagues — discussing the latest research, publications, and updates about SMRT Sequencing. Check back regularly or sign up to have our blog posts delivered directly to your inbox.


Wednesday, November 25, 2020

Better Apple Pies: HiFi Reads are a Perfect Recipe for High-Quality Apple Genome and Pangenome Assemblies

Zhangjun Fei inspects a Mutsu apple at Indian Creek Farm in Ithaca, NY. Image credit: Boyce Thompson Institute

Scientists at the Boyce Thompson Institute, Cornell University and the USDA Agricultural Research Service have reported significant progress in understanding the genomic features of domestic and wild apples. They used HiFi reads, highly accurate long reads, generated by the Sequel II System to build phased, diploid genome assemblies, as well as apple pangenomes to represent more of the remarkable genetic diversity in this lineage and better characterize its historic domestication.

The paper, published in Nature Genetics, comes from lead authors Xuepeng Sun (@XuepengBio), Chen Jiao, and Heidi Schwaninger; senior author Zhangjun Fei (@fei_lab), and collaborators.

We asked Fei about the highlights of the team’s efforts, and here’s how he summed it up: “We assembled phased diploid genomes of modern apple (Malus domestica) and the two major wild progenitors, M. sieversii and M. sylvestris using PacBio HiFi reads and Illumina short reads, and constructed pan-genomes of the three species. We inferred the genetic contributions of the two wild progenitors to the cultivated apple, and identified genome regions under selection during apple domestication and associated with important traits such as fruit size, texture and taste.”

The team focused on the tasty Gala apple, knowing that producing an accurate genome assembly would require more than short-read sequencing data.

“Most crop plants have complex genomes characterized by large size, high heterozygosity level and polyploidy,” they write in the paper. “The apple genome is highly heterozygous, posing a major challenge for earlier genome assemblies.”

To address those challenges, the scientists incorporated HiFi reads into their strategy, sequencing the Gala apple and its wild progenitors at coverages ranging from 37-fold to 81-fold. These HiFi reads were then assembled using hifiasm and HiCanu, respectively (read more about these and other options for HiFi assemblers in this blog post). Those results were merged with orthogonal data sets to create diploid genomes for each of the three apples, with final assemblies reaching about 1.3 Gb.

“Despite high heterozygosity rates (0.85–1.28%), all assemblies showed high contiguity, with the scaffold N50 of 3.3–4.3 Mb in diploid assemblies and 16.8–35.7 Mb in haploid consensuses,” they add.
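For readers less familiar with the N50 statistic quoted above, here is a minimal Python sketch of how it is commonly computed. The function name and the toy scaffold lengths are illustrative assumptions, not taken from the paper or from any assembler's code.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Made-up scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [8, 6, 4, 3, 2, 1]
print(n50(toy_scaffolds))
```

A higher N50 means that more of the assembly sits in long scaffolds, which is why it is used as a shorthand for contiguity.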

The extremely high quality of the final assemblies allowed the scientists to identify an error in previously published apple genomes associated with a 5 Mb inversion on Chromosome 1.

But the team also wanted to go beyond just one high-quality assembly for the Gala apple, pointing out that “a single reference genome can by no means represent a whole population.” To that end, they constructed a pangenome for each of the three apple types, using 91 accessions to capture natural genetic diversity. Through this work, they added between 89 Mb and 212 Mb of novel sequence data to each genome, covering thousands of new genes.

“Unlike annual crops such as the tomato, the pan-genome size of the cultivated apple is larger than that of wild progenitors, possibly due to the outcrossing nature and extensive introgression from wild species,” Sun et al. write. This distinctive feature suggests that introgression of new genes/alleles is possibly a hallmark of crops domesticated through hybridization.

One of the most important motivations for this study was to support apple breeding programs through a deeper understanding of trait variability.

“Traits introgressed in the hybrid are often not fixed and could be lost when propagated by seeds,” the authors note. “Understanding of the molecular basis of trait variability, which requires the knowledge of the diploid alleles, is critical for fixation of desirable traits in apple breeding.”


Tuesday, November 17, 2020

Scientists Pinpoint Pathogenic Inversion in Intellectual Disability Case Using HiFi Sequencing

Scientists at Yokohama City University Graduate School of Medicine and Osaka Women’s and Children’s Hospital have discovered a novel pathogenic variant associated with intellectual disability. They made the discovery using HiFi sequencing after previous short-read investigations failed to produce an answer.

In the journal Genomics, the team reports the case of 12-year-old monozygotic twin girls who exhibited developmental delays, severely drooping eyelids, and seizures since the age of 5 months. Clinical symptoms matched Dravet syndrome, but no molecular evidence was available to confirm that diagnosis. Their case had previously been analyzed with short-read exome sequencing, but no pathogenic variants were uncovered. Lead author Takeshi Mizuguchi, senior author Naomichi Matsumoto, and collaborators turned to HiFi sequencing and the Sequel II System “to search for variants that are unrecognized by exome sequencing,” they write.

While intellectual disability (ID) has been linked to variants in more than 500 genes, even the best analytical methods have a diagnostic success rate of less than 30%. “There are still many cases for which no molecular diagnosis has been possible,” the authors note. “Therefore, it is important to determine the molecular genetics of unsolved ID cases using new technologies.”

The scientists sequenced 15 kb size-selected libraries for one of the twins and both parents to generate highly accurate (>99% or Q20) long reads, known as HiFi reads. Next, the team used pbsv to call structural variants, and Google’s DeepVariant to call small variants and indels. This process highlighted hundreds of deletions, insertions, and duplications, plus seven inversions, in the twin’s genome that were potential de novo structural variants. “A 12-kb inversion disrupting the coding sequence of Bromodomain and PHD Finger containing 1 … immediately drew our attention,” the authors report, because variants in this region had been linked to an intellectual disability syndrome consistent with the twins’ symptoms. “Among the 16 possible de novo [structural variant] calls affecting RefSeq gene exons, no other genes were linked to an OMIM autosomal dominant disease entry,” they add.
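The “>99% or Q20” shorthand above uses the standard Phred scale, where Q = -10 × log10(error probability). A minimal sketch of the conversion follows; the function names are illustrative and not part of any PacBio tool.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred quality value to per-base accuracy (0..1)."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (0..1, strictly below 1) to a Phred value."""
    return -10 * math.log10(1.0 - accuracy)


for q in (10, 20, 30):
    print(f"Q{q} corresponds to {phred_to_accuracy(q):.1%} accuracy")
```

On this scale, each additional 10 points of Q means a tenfold lower error rate.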

 

HiFi sequencing of a trio identifies a pathogenic heterozygous 12 kb de novo inversion that disrupts the gene BRPF1. Single-nucleotide variants (marked with “*”) show that the inversion occurred on the maternal allele #3.

The 12 kb copy-neutral inversion was confirmed with Southern blot, which also showed that both parents and an unaffected older brother lacked the inversion. A breakpoint analysis found that “the two breakpoint junctions identified by Sanger sequencing and the pbsv inversion call were identical,” the team notes, “demonstrating the accuracy of HiFi long-read analysis.” The scientists also point out that not only was the inversion missed by exome sequencing due to its copy-neutral status and repetitive flanking sequence, but it also would have been missed by traditional chromosomal analysis, which has a lower limit of detection of 10 Mb. Finally, using the trio data with haplotype phasing, the team discovered that the inversion was a de novo variant on the maternally transmitted chromosome.

“Importantly, the current study demonstrates that inversions can now be accessed using an ‘unbiased-genomic’ strategy with no prior knowledge,” the authors write. “This state-of-the-art technology is advantageous for elucidating hitherto inaccessible genomic changes.”

 

To learn more, explore SMRT Sequencing workflows and additional resources on comprehensive variant detection and structural variant detection.


Tuesday, November 10, 2020

Secrets to Longevity Explored in de novo Genome of 115-Year-Old Woman

Hendrikje van Andel-Schipper, at age 108

A new publication from scientists in The Netherlands and Belgium offers tantalizing insights that may shed light on age-related neurodegenerative disorders. The team used SMRT Sequencing to produce a de novo diploid assembly of the genome of a Dutch woman named Hendrikje van Andel-Schipper, who died at the age of 115 with no signs of cognitive decline, and then performed a detailed analysis of variants detected. The data are publicly available to the scientific community.

The paper, released in Translational Psychiatry, comes from lead author Jasper Linthorst and senior author Henne Holstege (@HolstegeHenne) at Amsterdam Neuroscience and their collaborators. They aimed to identify structural variants (SVs) that could be associated with the onset of neurological disorders. To do so, they compared the centenarian genome assembly with several previously available human genome assemblies.

The team chose long-read PacBio sequencing technology because they determined that “due to their repetitive nature, [SVs] are currently underexplored in short-read whole genome sequencing approaches,” they write. Repetitive regions, particularly repeat expansions that tend to grow larger over generations, have been shown to be pathogenic for a number of neurological diseases. “Using common sequencing approaches, the assessment of large repetitive regions is difficult because short 100-150 bp sequence-reads do not span the entire structural variant,” the authors report. “The solution to this problem is to generate longer sequencing reads.”

For this project, the scientists generated a de novo, phased genome assembly for the 115-year-old woman, which they refer to as W115. This was based on sequencing genomic DNA from three tissues and relied on FALCON-Unzip to create the diploid assembly of about 2.82 Gb. This information was compared to two haploid assemblies and the latest human reference genome to search for SVs of 50 bp or longer.

The scientists used a graph-based multi-genome aligner called REVEAL and found a total of 31,680 SVs. Nearly 70% were classified as variable number tandem repeats (VNTRs). “Interestingly, we observed that VNTRs in the subtelomeric regions were composed of longer repeat subunits than VNTRs outside the subtelomeric regions, and that they had a higher GC-content,” they report. Expanded VNTRs have been linked to faulty gene transcription. “The genes that contained most VNTRs, of which PTPRN2 and DLGAP2 are the most prominent examples, were found to be predominantly expressed in the brain and associated with a wide variety of neurological disorders,” the scientists add.

Henne Holstege, Amsterdam Neuroscience

In addition, the team analyzed the list of structural variants to see how SMRT Sequencing had made a difference in detection. Using short-read data for the W115 genome only, they found just 5,826 SVs. About 83% of the SVs — that’s more than 18,000 variants — found in the PacBio assembly “were uniquely identified through long-read sequencing,” the scientists note.

The sequence data for this project was produced on a PacBio RS II system, but Holstege and her team have already acquired a Sequel II System for the next phase of this effort. That will involve a large study encompassing at least 150 cognitively healthy centenarians and 150 individuals with Alzheimer’s disease, with the goal of identifying VNTRs that have significantly different lengths between the two groups. Holstege and her team will be generating HiFi reads and they expect to cover each genome in the study with a single SMRT Cell. “We want to know what about these individuals makes them so special,” she told us.

 

Explore workflows and additional resources on comprehensive variant detection or structural variant detection.


Thursday, October 29, 2020

Breast Cancer Research Legend Mary-Claire King Identifies New Pathogenic Mutation with HiFi Sequencing

Mary-Claire King

It’s Breast Cancer Awareness Month, and we can’t think of a better way to celebrate than to honor the passionate scientist who has perhaps single-handedly done more to advance breast cancer research than anyone else alive: Mary-Claire King, discoverer of the BRCA1 and BRCA2 genes. In recognition of her lifelong contributions, King was just awarded the prestigious William Allan Award, the top prize presented annually by the American Society of Human Genetics to recognize substantial and far-reaching scientific contributions to human genetics, carried out over a sustained period of scientific inquiry and productivity.

In a recent publication in the Journal of Medical Genetics, King and her collaborators at the University of Washington combined CRISPR-Cas9 targeting with HiFi sequencing to reveal novel and biologically relevant mutations in the BRCA1 gene.

The effort was driven by a need to better characterize the well-known BRCA1 and BRCA2 genes in families with hereditary breast cancer. Short-read sequencing “is of limited use for identifying complex insertions and deletions and other structural rearrangements,” the scientists note. “The BRCA1 genomic region is particularly challenging for short-read sequencing. It is composed of 42% Alu repeats, the second highest proportion in the genome, and a 30 kb tandem segmental duplication spanning its promoter and first two exons.” To expand the clinical utility of information about these genes in the future, much research remains to be done to characterize the many variants missed by short reads.

For this study, scientists aimed to sequence the BRCA1 and BRCA2 genes from individuals representing 19 families with a history of early-onset breast cancer. All of these individuals had previously had these genes analyzed with gene panels and whole exome sequencing, but no pathogenic mutations were found that explained the early onset breast cancer susceptibility.

To target the two genes of interest, the team used the HLS-CATCH CRISPR-based targeting method from Sage Science, extracting 200 kb of high molecular weight libraries ideal for use with PacBio sequencing. HiFi sequencing was performed on the Sequel System, with average genomic fragment length of about 10,000 bases to fully cover the two BRCA loci, including non-coding elements.

In one case, this approach unlocked a novel variant to explain the family’s history of cancer. “We identified an intronic SINE-VNTR-Alu retrotransposon insertion that led to the creation of a pseudoexon in the BRCA1 message and introduced a premature truncation,” the scientists report. The retrotransposon was nearly 3 kb long. “Multiple long reads included all elements of the mutation and of wild-type flanking BRCA1 intronic sequence, so that the mutation’s position and the sequence were clear,” the authors note, adding that the variant segregated with breast cancer throughout the family. After identifying this tough-to-find type of variant, the authors confirmed that the intronic repeat element can affect the final BRCA1 message by sequencing cDNAs from matching patient cells.

Based on these findings, the team suggests that there may be many other pathogenic complex structural variants. “It is possible, even likely, that complex mutations are common at tumour suppressor genes,” they write. “We suggest that complex mutations have thus far been rarely encountered, because they are difficult to detect with existing approaches.”

King and her collaborators believe the approach they used will be important for continuing to uncover these variants. “The genomic approach described here, integrating CRISPR–Cas9 excision of critical loci with long-read sequencing, yields complete sequence of targeted loci and thus can detect all classes of complex non-coding structural variants,” they report. “This combination of CRISPR–Cas9 excision and long-read sequencing reveals a class of complex, damaging and otherwise cryptic mutations that may be particularly frequent in tumour suppressor genes replete with intronic repeats.”

 

Listen to King share the emotional and humorous story of the events leading to the funding of the project that resulted in the discovery of the BRCA1 gene – a true testament to her persistence and the constant challenge of balancing career and family, with a cameo from Joe DiMaggio!


Monday, October 26, 2020

Egyptian Genome Added to Growing List of Population-Specific Reference Genomes

We’re excited to report that another team has used PacBio long-read sequencing to produce a population-specific reference genome — this time an Egyptian genome that should prove valuable for boosting precision medicine for people of North African ancestry.

Lead author Inken Wohlers, senior authors Hauke Busch (@BuschLab) and Saleh Ibrahim, and their collaborators at the University of Lübeck, Mansoura University, and other institutions report their results in Nature Communications. The need for a population-specific reference was clear: the authors note that “only 2% of individuals included in [genome-wide association studies] are of African ancestry” but “genetic disease risk may differ [between populations], especially for individuals of African ancestry.” The new assembly will serve as a foundation for more accurate interpretation of genetic risk in this population in future studies.

“With the advent of personal genomics, population-based genetics as part of an individual’s genome is indispensable for precision medicine,” the scientists note. Without a reference-grade assembly to use for comparison, people of African descent — particularly those with North African ancestry — are at risk of continued health disparities. To address this issue, the team generated a complete genome assembly of an Egyptian male based on SMRT Sequencing.

A high-quality de novo assembly of one individual was combined with population-level sequencing of 110 individuals to characterize variation in the Egyptian population. (Wohlers, I et al. 2020)

The “EGYPT” assembly spans 2.8 Gb and includes 20 Mb of novel sequence not seen in GRCh38. The authors report that the assembly is “high quality… comparable to the publicly available assembly AK1 of a Korean individual and the assembly of a Yoruba individual.”

Wohlers et al. also sequenced 110 Egyptian individuals with short reads, which were compared to the reference genome to identify population-specific variation. They called nearly 20 million single nucleotide variants and small indels, and more than 120,000 structural variants. The authors explain that it is key to understand variation as “differences in allele frequencies and linkage disequilibrium between Egyptians and Europeans may compromise the transferability of European ancestry-based genetic disease risk and polygenic scores, substantiating the need for multi-ethnic genome references.”

“The Egyptian genome reference will be a valuable resource for precision medicine,” the team adds. “The wealth of information it provides can be immediately utilized to study in-depth personal genomics and common Egyptian genetics and its impact on molecular phenotypes and disease.”

For more information about this project and its implications for improving genomic analysis, don’t miss this webinar and this poster from the authors.

 

SMRT Sequencing is being used to develop population-specific reference genomes as part of several international research efforts. Learn more about these projects and explore detailed assembly information in our interactive map.

 


Thursday, October 22, 2020

PacBio and Invitae Team Up to Develop Whole Genome Sequencing-Based Assays for Pediatric Epilepsy Diagnostics

We’re excited to announce a research collaboration with Invitae focused on the investigation of clinically relevant molecular targets for use in the development of advanced diagnostic testing for epilepsy. To support this collaboration, Invitae is expanding its PacBio sequencing capacity to meet the growing demand for clinical applications dependent on highly accurate genomic information.

More than half of epilepsies can be traced to a genetic cause. When a child presents with seizures, genetic testing can help identify more than 100 underlying, often rare conditions. Early genetic testing may be the most cost-effective, direct, and accurate diagnostic tool for children, shortening lengthy diagnostic odysseys. Delays in diagnosis can be devastating for children, as some genetic epilepsies are neurodegenerative and early symptoms may be subtle and easy to misdiagnose.

The Behind the Seizure program is a prominent collaborative program established by BioMarin and Invitae that was developed to provide faster diagnosis for young children with epilepsy in many regions around the world. Participants in the Behind the Seizure program are diagnosed one to two years sooner than reported averages.

The first phase of our research collaboration is focused on a whole genome sequencing study of a large pediatric epilepsy patient cohort derived from the Behind the Seizure program. HiFi sequencing will be performed to generate comprehensive variant profiles used to investigate the genetic etiology of epilepsy. The research is intended to accelerate Invitae’s development of assays to help patients who have been unable to get a diagnosis with conventional short-read sequencing technologies and facilitate improved treatment options based on specific genetic targets.

In a statement announcing this news, Invitae Chief Medical Officer Robert Nussbaum said: “Through this research collaboration with PacBio, Invitae aims to develop innovative methods that will provide more accurate answers to individuals living with epilepsy and their healthcare providers.”

Our CEO Christian Henry added: “We are honored to partner with Invitae, a recognized leader in genetics, to co-develop methods that have the potential to support earlier genetic testing and intervention to aid treatment selection for millions of people living with epilepsy worldwide. Working with leading organizations such as Invitae is an important part of our strategy to accelerate the use of our highly accurate long-read sequencing platform in large-scale whole genome sequencing initiatives.”

 

Learn more about the benefits of PacBio whole genome sequencing and discover how highly accurate long reads can advance your neuroscience research.

 


Wednesday, October 21, 2020

The HiFi Sequencing Advantage for Metagenome Assembly

Assembly and binning of metagenome data are the first steps in many metagenomics analysis pipelines, and with good reason. Metagenome-assembled genomes (MAGs) and circularized MAGs (CMAGs) allow recovery of complete genes and operons, thereby improving predictions of metabolic capacities. MAGs also provide information about gene synteny and enable better taxonomic profiling. However, as discussed in a recent review by Chen et al., draft MAGs with poor completeness or high contamination can lead to incorrect conclusions.

One way to improve assembly completeness and contiguity is to use long-read sequencing. However, not all long reads are the same. Did you know that once read lengths are longer than most of the repeats in a genome or metagenome, incremental gains in raw read accuracy improve assemblies faster than higher coverage or even large gains in read length?

Figure 1: The need for error correction presents unique challenges for metagenome assembly, where the error rate of noisy long reads can exceed the true differences between closely related community members.

One of the main hurdles in metagenome assembly is the presence of multiple closely related strains and species in the same sample, which leads to tangled assembly graphs. While long reads are helpful in resolving these, if the difference between two bacterial species (often defined as 3%) is less than the raw error rate of your sequencing data, overlap assembly remains problematic. This is because with noisy long reads, assembly is typically preceded by an error-correction step where the raw reads are mapped against each other to produce high accuracy consensus reads.

However, with metagenome data, this has the side-effect of collapsing and averaging reads that may actually be derived from different species. The ability to distinguish reads from closely related species or strains can be effectively erased during this first step, and the purity of the resulting contigs, the completeness of the MAGs, and the total size of the metagenome assembly can all be compromised. Read on for a detailed discussion and examples of how differences in read quality impact MAG assembly.
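The comparison between raw error rate and inter-species divergence described above can be made concrete with some back-of-the-envelope arithmetic. The sketch below rests on a deliberately simplified model: errors in the two overlapping reads are assumed to be independent and never to coincide, so the expected error-induced mismatches in an overlap are roughly twice the per-read error rate times the overlap length. The 3% divergence comes from the text; the 10 kb overlap and the function name are illustrative assumptions.

```python
def overlap_differences(read_accuracy, divergence=0.03, overlap_bp=10_000):
    """Expected mismatches in an overlap between two reads.

    Returns (sequencing_errors, true_differences):
      sequencing_errors - mismatches caused by errors in either read
      true_differences  - real differences if the reads come from two
                          species separated by `divergence`
    Simplified model: independent errors, no coincident errors.
    """
    error_rate = 1.0 - read_accuracy
    sequencing_errors = 2 * error_rate * overlap_bp
    true_differences = divergence * overlap_bp
    return sequencing_errors, true_differences


for accuracy in (0.875, 0.975, 0.99, 0.999):
    errors, true_diffs = overlap_differences(accuracy)
    print(f"{accuracy:.1%} accuracy: ~{errors:.0f} error mismatches "
          f"vs ~{true_diffs:.0f} true differences")
```

Under these assumptions, reads at 87.5% accuracy carry about 2,500 error-induced mismatches per 10 kb overlap against only about 300 true differences, whereas at 99% accuracy the error-induced mismatches drop to about 200, below the true signal.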

Higher read accuracy drives assembly quality

To understand how incremental changes in accuracy and differences in coverage affect metagenome assembly quality, we generated model metagenomics datasets with community member abundances that reflect a real fecal microbiome, drawing on references from Zou et al. and the ‘Badread’ long-read simulator (Wick, 2019). Noisy long reads were simulated from 160 microbial reference genomes with accuracy modes between 87.5% and 97.5%, and HiFi reads were modeled using a typical accuracy distribution (>99%) for 8 kb to 10 kb reads, an insert size commonly achievable for long-read metagenome sequencing. The number of bases in each dataset was modeled after a conservative Sequel II System yield of HiFi data from a metagenomics run (~20 Gb) and reported ONT PromethION outputs (60 Gb) (Shafin, 2020). The resulting model datasets were assembled with Canu 2.0, using the recommended parameters for ONT and HiFi datatypes.

Figure 2: In modelled metagenome data, raw reads with higher read accuracy generate contigs with higher purity.

With Canu, it is possible to trace which reads were used to generate each contig in the assembly, and we used this capability to calculate the purity of each contig. Specifically, we determined what fraction of reads did not originate from the reference genome that contributed the majority of reads used to assemble that contig.
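The purity calculation described above can be sketched as follows. This is an illustration only: it assumes the read-to-contig assignments and each simulated read's source genome have already been extracted into a plain Python list, and the function name and toy labels are hypothetical rather than part of Canu or of our analysis code.

```python
from collections import Counter


def contig_purity(read_sources):
    """Summarize the purity of one contig.

    read_sources: list of source-genome labels, one per read used
                  to build the contig.
    Returns (majority_genome, purity, impurity), where impurity is the
    fraction of reads that did not come from the majority genome.
    """
    if not read_sources:
        raise ValueError("contig has no reads")
    counts = Counter(read_sources)
    majority_genome, majority_count = counts.most_common(1)[0]
    purity = majority_count / len(read_sources)
    return majority_genome, purity, 1.0 - purity


# Toy contig: nine reads from genome_A and one stray read from genome_B.
genome, purity, impurity = contig_purity(["genome_A"] * 9 + ["genome_B"])
print(f"{genome}: purity {purity:.0%}, impurity {impurity:.0%}")
```

Because the source genome of every simulated read is known, this kind of per-contig tally is only possible on model data, not on a real sample.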

As shown in Figure 2, there are limited gains in contig purity even as accuracy changes from 87.5% to 97.5%. However, there is a sharp transition in contig purity when read accuracy surpasses 99%, exceeding the inter-species similarity commonly seen in a complex fecal community.

 

 

 

High-error reads compromise the assembly of low abundance species

Another challenge with using self-error correction ahead of metagenome assembly relates to the uneven proportion of different species in the data. Error correction typically requires ~30-fold coverage to be effective. However, in metagenomes, it is common for species to be present at a wide range of relative abundances. This means that even when there is enough coverage of highly abundant species for error correction, reads from lower abundance species may fail the initial error correction step and be omitted from the assembly. In the example of our model data set, even with three times more raw data, the 87.5% accuracy mode dataset assembles to less than half of the expected assembly size, with contigs that are significantly shorter than with more accurate reads. When the data accuracy surpasses the threshold of microbial interspecies differences, contiguity and assembly size leap dramatically despite lower sample coverage.
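To see why uneven abundance matters for the ~30-fold threshold mentioned above, consider this minimal sketch. It assumes that relative abundance is expressed as the fraction of sequenced bases belonging to a species; the community members, their abundances, and the 4 Mb genome size are made-up values for illustration, while the 60 Gb dataset size matches the noisy long-read datasets in our model.

```python
def species_coverage(total_bases, base_fraction, genome_size):
    """Expected fold-coverage of one species in a metagenome dataset."""
    return total_bases * base_fraction / genome_size


def passes_correction(coverage, min_coverage=30.0):
    """True if coverage meets the approximate self-correction threshold."""
    return coverage >= min_coverage


TOTAL_BASES = 60_000_000_000   # 60 Gb noisy long-read dataset
GENOME_SIZE = 4_000_000        # hypothetical 4 Mb bacterial genome

# Hypothetical community: fraction of sequenced bases per species.
community = {"abundant": 0.20, "moderate": 0.01, "rare": 0.001}

for name, fraction in community.items():
    cov = species_coverage(TOTAL_BASES, fraction, GENOME_SIZE)
    status = "ok" if passes_correction(cov) else "too low for correction"
    print(f"{name}: ~{cov:.0f}-fold coverage ({status})")
```

In this toy community, the rare species reaches only about 15-fold coverage, so its reads would be at risk of being dropped before assembly even though the dataset as a whole is large.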

An example of how this limitation plays out in a real-world sample can be seen in a cow rumen assembly that used self-corrected PacBio CLR reads with ~89% median accuracy (Bickhart, 2019). While the PacBio CLR assembly had higher contiguity than the Illumina assembly despite a 3-fold lower depth of sequencing, the Illumina assembly had superior completeness.

Closer inspection of the PacBio CLR data revealed that “the correction step removed 10% of the total reads for being singleton observations (zero overlaps with any other read) and trimmed the ends of 26% of the reads for having fewer than 2 overlaps.” The authors further noted that “this may have also impacted the assembly of low abundance or highly complex genomes in the sample by removing rare observations of DNA sequence”.

Figure 5. The proportion of same-sample Illumina reads that map to the cow rumen CLR assembly versus a sheep fecal HiFi assembly. Since HiFi reads do not require an error correction step, more data from low abundance species is available for the assembly step. (Bickhart, D., SMRT Leiden 2020 presentation)

In contrast, since HiFi reads do not need error correction, all the data, including observations from low abundance species, can be used in the assembly step. Accordingly, a more recent assembly of a sheep fecal sample that used HiFi data performed significantly better. In his SMRT Leiden talk, Derek Bickhart noted that cow rumen and sheep fecal samples are different communities, so their assemblies are not an “apples to apples” comparison. Even so, the sheep fecal assembly, done with HiFi data, appears to represent low abundance species significantly better, as gauged by the proportion of same-sample short read data that maps to the long read assembly.

One possible method for overcoming the long-read coverage bottleneck is to use short read data for error correction. However, this approach suffers from the same factors that limit short read metagenome assembly: short read data has GC bias and cannot be mapped uniquely to repetitive regions. Given that bacterial genomes can range from 13% to 75% GC, error correcting low accuracy long reads from all the species in a metagenome sample with short read data can be problematic.
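For readers less familiar with the GC figure quoted above, here is a minimal sketch of how GC content is computed for a sequence. The function name and the two toy sequences are hypothetical, and ambiguous bases such as N are simply ignored, which is an assumption of this sketch rather than a standard.

```python
# Sketch: GC fraction of a DNA sequence, ignoring non-ACGT characters.

def gc_fraction(seq):
    """Return the fraction of A/C/G/T bases in seq that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / (gc + at)


# Made-up sequences at the two ends of the range.
low_gc = "ATATATAG"    # 1 of 8 bases is G or C
high_gc = "GCGCGCAT"   # 6 of 8 bases are G or C

print(f"{gc_fraction(low_gc):.3f}")
print(f"{gc_fraction(high_gc):.3f}")
```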

 

The power of HiFi reads

With the unique combination of high accuracy and long read length, HiFi data shows promise for overcoming some of the longstanding challenges in metagenome assembly. Unlike noisy long reads, assembly of HiFi reads is unencumbered by an error correction step that can erase the variation needed to correctly assemble closely related species in complex communities and generate high quality MAGs and CMAGs. Furthermore, they show potential for improving the representation and contiguity of low abundance species in metagenome assemblies.

HiFi data has already been making waves in the world of large genome assembly, first at PAG XXVIII in January 2020 and more recently at the precisionFDA Truth Challenge V2, which evaluated methods for variant calling in human genomes. We are excited to see what HiFi data will do for metagenome assembly as more researchers become aware of its potential.

 

Learn more about HiFi sequencing for metagenomics. To start planning your metagenome assembly experiment, connect with a PacBio scientist.

References:

Chen, L-X., et al. (2020) Accurate and complete genomes from metagenomes. Genome Research 30:1-19.

Bickhart, D., et al. (2019) Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation. Genome Biology 20:153.

Wick RR. (2019) Badread: simulation of error-prone long reads. Journal of Open Source Software. 4(36):1316.

Shafin, K., Pesout, T., Lorig-Roach, R. et al. (2020) Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat Biotechnol.

Zou, Y., Xue, W., Luo, G. et al. (2019) 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat Biotechnol 37, 179–185.

 


Thursday, October 15, 2020

Pediatric Partnership Powered by PacBio Aims to Solve Difficult Rare Disease Cases

Kids have lots of questions. But even the world’s top scientists don’t have all the answers — especially when it comes to rare genetic disorders afflicting children.

Our HiFi reads, highly accurate long reads, generated by our Sequel II and new Sequel IIe Systems, are helping researchers uncover disease-causing genetic variants that had previously gone undetected by other technologies, contributing to increased solve rates for rare diseases.

We’re particularly excited to see this technology applied to translational research in children. We will be collaborating with Children’s Mercy Kansas City as part of its Genomic Answers for Kids (GA4K) program, which aims to collect genomic data and health information for 30,000 children and their families over the next seven years, ultimately creating a database of nearly 100,000 genomes.

“We are delighted to be collaborating with the innovative scientists at PacBio as we bring their long-read sequencing data to bear on some of our most difficult cases of rare pediatric disease to give patients and families the answers they deserve,” said Tomi Pastinen, director of the Center for Pediatric Genomic Medicine at Children’s Mercy.

It is estimated that as many as 25 million Americans, approximately 1 in 13 people, are affected by a rare condition. Whole-genome and whole-exome sequencing are often used to try to diagnose these conditions, but these analyses typically rely on short-read sequencing, and a cause is found in only ~25% to 50% of cases, leaving half or more of cases unsolved.

Hoping to overcome these odds, Children’s Mercy has recently invested in Sequel II Systems, with plans to use our Single Molecule, Real-Time (SMRT) Sequencing technology to generate HiFi reads to detect what the short-read methods might have missed. Early results are encouraging, and have already demonstrated increases in pathogenic variant and disease-gene discovery beyond what was possible with short-read methods.

The researchers will also be working with the Microsoft Genomics team to build Microsoft Azure cloud-based analysis solutions and a data repository for this unique dataset.

“The diagnosis journey for a child with a rare disease and their families can be long and often inconclusive. We believe the advancement of precision medicine with specialized technologies will be key to gaining a better understanding and early diagnosis of these debilitating and deadly diseases,” said Gregory Moore, corporate vice president, Microsoft Health.

We look forward to making a meaningful impact by increasing solve rates through this important partnership.

 

More information about how Children’s Mercy scientists are using HiFi sequencing will be presented in PacBio’s ancillary workshop Monday, October 26 from 1:00-2:00 pm ET during the American Society of Human Genetics (ASHG) Annual Meeting. Emily Farrow, Director of Laboratory Operations at the Genomic Medicine Center at Children’s Mercy, will give a talk entitled “Applications of Third Generation Sequencing in Unsolved Disease.” Free virtual event registration is available here.

 

See additional examples of the use of SMRT Sequencing in rare disease research and learn more about structural variant detection:


Thursday, October 8, 2020

A Living Legacy of Microbiology Celebrates 100 Years

As the world faces an unprecedented pandemic caused by a novel coronavirus, the scientific spotlight has shone brightly on infectious disease research. And although interest in Public Health England’s (PHE) Culture Collections is often focused on its historical cultures, its relevance in our modern world has never seemed sharper.

The National Collection of Pathogenic Viruses (NCPV) has been helping scientists from around the world address the current history-making infectious disease event. It is also anticipating future outbreaks, and building collections of pathogenic viruses to aid research into potential threats to human health.

“The question of which virus will be next to make the jump from relative obscurity to frontpage news is important to ask, but difficult to answer,” wrote PHE Lead Virologist Barry Atkinson in an NCPV blog post. “One group has shown a propensity to cause large outbreaks after decades of apparent inactivity or low-level circulation – the arthropod-borne viruses (arboviruses).”

While viruses have dominated headlines in 2020, bacteria have also driven epidemics and outbreaks, whether through community spread, in hospitals, or in our food and water systems. In fact, the majority of the microorganisms in PHE’s culture collections are bacteria, dwarfing the number of viruses and fungi.

Currently celebrating its 100-year anniversary, the National Collection of Type Cultures (NCTC) remains highly relevant, recently releasing new antimicrobial resistance reference strains and resources.

A look back at 150 years of advances and the milestones microbiologists have crossed, with many significant contributions from the NCTC

The historical collection of more than 6,000 expertly preserved and authenticated bacterial cultures has been at the forefront of advances in the field, implementing the latest, greatest new technology in order to provide the best, most comprehensive resources for microbiology laboratories in a range of different sectors and in research institutes worldwide.

Most recently, when the NCTC decided to expand their collection to include genome sequencing information, PacBio Single-Molecule Real-Time (SMRT) Sequencing was the technology of choice. Starting in 2014 and in coordination with the Wellcome Sanger Institute, NCTC created reference quality genomes for 3,000 bacterial strains. Professor Julian Parkhill, who initiated the project while he was at the Wellcome Sanger Institute, stated, “If you’re trying to generate reference genomes that are going to be valuable to as many people as possible, with as much information in them as possible, then Pacific Biosciences has the edge in terms of generating more complete data.”

Released four years later, the collection includes several of the most important known drug-resistant bacteria, such as those that cause tuberculosis and gonorrhoea, and some varieties of historical significance, such as a dysentery-causing Shigella flexneri isolated in 1915 from a soldier in the trenches of World War I, and a sample from the nose of penicillin discoverer Alexander Fleming.

More than 60% of NCTC’s historic collection now has a closed, finished reference genome, assembled from PacBio sequencing.

Julie E. Russell

“If NCTC is to continue to supply relevant authentic bacteria for use in scientific studies, then the quality of our own characterization and authentication data must be outstanding,” said Julie E. Russell (@Julieru13), Head of Culture Collections at Public Health England.

“Combining sequences, strain metadata and links to other resources in the public domain will ensure that this e-resource provides a unique comprehensive source of data to underpin microbial research, and improve the provision of diagnostics and public health interventions for medically important bacteria and viruses.”

 

Learn more about the historical NCTC collection in this case study and microbial whole genome sequencing on PacBio systems.


Monday, October 5, 2020

Meet the New Sequel IIe System: Delivering Fast & Affordable HiFi Sequencing

The new Sequel IIe System provides direct access to HiFi sequencing

Still soaring from the success of last year’s launch of the award-winning Sequel II System, we’re excited to announce the next evolution of the instrument: the Sequel IIe System.

This evolution includes increased computing power and advanced on-instrument data processing. This means the instrument can directly produce the widely coveted, highly accurate long reads, known as HiFi reads, that have made the original Sequel II System indispensable for many labs — and save users time and money in the process.

Just how much of an improvement does the new system represent? By completing all the primary data processing for HiFi reads on the instrument, the Sequel IIe System provides as much as a 90% reduction in file storage needs, and a 70% reduction in secondary analysis processing time.

Additionally, the release includes powerful new tools in SMRT Link v10.0 software to enable complete workflow integration on the AWS cloud, and a new Genome Assembly analysis application for generating reference-quality de novo assemblies from HiFi reads.

 

The Sequel IIe System provides the solid foundation scientists need to build their discoveries, with easy to interpret HiFi sequencing data and flexible compute options.

“HiFi reads allow the accurate and simultaneous detection of single nucleotide and structural variants, paving the way for advancements in human genetics and greatly expanding the utility of SMRT Sequencing,” said Fritz Sedlazeck (@sedlazeck), Assistant Professor, Human Genome Sequencing Center at Baylor College of Medicine.

“Generating HiFi reads directly on the Sequel IIe System now has the potential to further accelerate cost-effective access to this information-rich sequencing data.”

HiFi sequencing has provided important data for a number of high-profile global research projects, including the Telomere-to-Telomere Consortium, Darwin Tree of Life, the Human Pangenome Reference Consortium, and the Solve-RD Project, among others. The precisionFDA Truth Challenge V2 evaluated methods for variant calling in human genomes and highlighted how approaches that use HiFi reads delivered the highest precision and recall in all categories: genome-wide, in difficult-to-map regions, and in the major histocompatibility complex.

Our CEO, Christian Henry, noted: “The new Sequel IIe System represents the next advancement in our technology, and makes HiFi sequencing accessible to any project where high accuracy, long read lengths, and affordability matter.”

See how HiFi sequencing combines the best aspects of short reads and long reads into a single easy-to-use technology.

 

 

Want to learn more? Attend our workshop featuring the Sequel IIe System and HiFi sequencing applications for human biomedical research on Monday, October 26, from 10-11 a.m. PDT or visit the product page.

Want to discuss the benefits of HiFi sequencing and the Sequel IIe System for your research? Connect with a PacBio Scientist.


Thursday, October 1, 2020

Webinar: Crops and Corvids get the Pangenome Treatment with HiFi Sequencing

Nearly gapless, reference-quality chromosome-level assemblies — in less than a day? Yes, it’s possible, thanks to the high accuracy and low computational needs of PacBio HiFi reads.

Kevin Fengler, computational genomics lead at Corteva Agriscience, welcomed watchers to the brave new world of the pangenome during the recent webinar, “Beyond a Single Reference Genome – The Advantages of Sequencing Multiple Individuals.”

We are now living in an era where you can generate a reference genome assembly that’s specific for each application or trait of interest, Fengler said.

Graphic alignment of dozens of genomes in a pangenome collection allows researchers to quickly identify novel variations

“Often we’re interested in getting the sequence of a single disease resistance gene or the sequence of a particular QTL, and we’ll do a whole genome just for that,” Fengler said. “It may seem like overkill, but we have found that the best approach — the fastest, easiest, simplest, most cost effective way to do that — is just to generate whole genome reference quality assemblies.”

Fengler cited several benefits of HiFi reads that have made this possible. Foremost among them were lower computational demands and high accuracy, even with relatively “short” reads.

“I used to be a long-read junkie, always trying to get 50 kb, 60 kb reads. But with reads that are only 15 kb in length, we’re now able to achieve these highly contiguous, highly accurate assemblies,” Fengler said. “The HiFi reads are so accurate that you don’t need to do any additional polishing with Illumina, or even additional polishing with PacBio, which used to be a step.”

In many cases, Fengler said he has been able to assemble through the centromere.

“This is another cool thing that’s developed with HiFi. With our previous CLR assemblies, we never would have assembled through the centromere for plants, but now we are able to get a single scaffold per genome.”

Fengler emphasized that pangenome assemblies need to be robust, “because misassembly is not SV, and sequencing error is not variation.”

He shared several examples of crops that have received the pangenome treatment, including cotton, which, he noted, “would not be considered, historically, an easy genome to assemble by any means.”

“But here we’re getting single contigs in most cases for most of the chromosomes,” he said. “This is what you really need. This is the goal, this is what we’re trying to achieve.”

He also discussed two of the tools he uses to analyze the sequence diversity between the genomes and make it actionable, TagDots and PANDA.

Watch Fengler’s full presentation:

 

Crossing a continental crow divide

Pangenome collections are not only valuable for comparing commercial crop breeds for certain traits; they can also help answer questions about the evolution and population dynamics of non-model species.

Matthias H. Weissensteiner (@MWeissensteiner), a postdoc at Pennsylvania State University, discussed his work studying structural variation among several songbird species in the genus Corvus, some of which were included in his recently published Nature Communications paper.

By sequencing dozens of birds across the Corvus genus, researchers were able to identify genetic variants likely to cause differentiations in plumage patterns

About 60 species of the genus display the typical all-black crow plumage pattern, but there are also a few black-and-grey and black-and-white forms. In Europe, there is a ‘crow divide,’ with all-black crows in the west, black-and-grey crows in the east, and a narrow hybrid zone in between.

“They look like two species, they behave like two species, but when we looked at the genetic differentiation based on single nucleotide changes, we found out that they are actually genetically more or less the same,” Weissensteiner said. “Only 83 nucleotides out of a genome of 1.3 billion base pairs are fixed, meaning that there are only about 80 differences which are diagnostic for these two crow populations.”

In order to uncover the secrets of their speciation, Weissensteiner and colleagues sequenced 33 crows and created a dataset comprising the full phylogenetic range of the genus.

For Weissensteiner, the value of long reads was clear: because they let him anchor his reads completely to the reference, he could more confidently capture the correct sequence and identify insertions, deletions, inversions and other variations.

“We combined different types of sequencing and mapping technologies and found that long-read sequencing in particular is able to reveal a stunning amount of genetic variation.”

Having assemblies from across the entire genus made filtering the data a bit easier, as well, Weissensteiner said. Because the researchers had large phylogenetic distances within their data, they were able to remove variants that seemed to be segregating across the clades.

“If you have a variant that is polymorphic within the crow clade and polymorphic within the jackdaw clade, it’s likely to be an error because over these large phylogenetic distances, there should not be any segregating variation,” he explained.
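The filtering rule Weissensteiner describes can be sketched in a few lines. This is an illustration only, not the pipeline used in the study: the data layout (presence/absence genotypes keyed by sample), the sample names, and the function names are all hypothetical.

```python
# Sketch: flag variants that are polymorphic within every clade, which
# the text describes as likely errors across large phylogenetic distances.

def is_polymorphic(genotypes):
    """True if more than one allele state is seen among called genotypes."""
    called = {g for g in genotypes if g is not None}
    return len(called) > 1


def likely_error(genotype_by_sample, clades):
    """True if the variant is polymorphic within each of the given clades."""
    return all(
        is_polymorphic(genotype_by_sample.get(sample) for sample in samples)
        for samples in clades.values()
    )


# Hypothetical samples; 1 = variant present, 0 = absent, None = no call.
clades = {"crow": ["c1", "c2"], "jackdaw": ["j1", "j2"]}
variants = {
    "sv1": {"c1": 1, "c2": 1, "j1": 0, "j2": 0},  # fixed between clades
    "sv2": {"c1": 1, "c2": 0, "j1": 1, "j2": 0},  # polymorphic in both clades
    "sv3": {"c1": 1, "c2": 0, "j1": 0, "j2": 0},  # polymorphic in crows only
}

kept = [name for name, g in variants.items() if not likely_error(g, clades)]
print(kept)
```

Only the variant that segregates within both clades is dropped; variants fixed between clades or polymorphic in a single clade are retained for downstream analysis.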

Once he had a reliable set of variants, Weissensteiner looked for causal mutations for plumage differentiation and identified the most promising candidate: A 2.25 kb LTR retrotransposon insertion located 20 kb upstream of the NDP gene.

Watch Weissensteiner’s full presentation:

 

Watch the entire webinar, including an introduction to HiFi reads by PacBio Sequencing Application Specialist Kristin Mars:

See additional examples of the use of SMRT Sequencing for the generation of pangenomes:


Monday, September 21, 2020

Checkmate, Chromosome 8: The First End-to-End Sequence of a Human Autosome

Even in the field of genomics where new breakthroughs occur every few months, completion of the first-ever fully sequenced human autosome is a momentous achievement. Highly accurate, no gaps, no mis-joins — just chromosome 8 in all its glory. It’s a remarkable feat and we are honored that PacBio HiFi reads played a pivotal role in helping to achieve it.

The complete centromere sequence of chromosome 8 shows a diversity of satellite repeats and other abundant genomic repeats, now with near perfect base-level resolution from end to end. Logsdon, G et al. (2020)

This work is described in a preprint recently posted to bioRxiv from lead author Glennis Logsdon (@glennis_logsdon), senior author Evan Eichler, and their collaborators in the Telomere-to-Telomere (T2T) Consortium. It is part of the broader T2T initiative to sequence and assemble the first truly complete human genome and follows the earlier release of the fully sequenced X chromosome.

“Since the announcement of the sequencing of the human genome 20 years ago, human chromosomes have remained unfinished due to large regions of highly identical repeats located within centromeres, segmental duplication, and the acrocentric short arms of chromosomes,” the authors note. “The advent of long-read sequencing technologies and associated algorithms have now made it possible to systematically assemble these regions from native DNA for the first time.”

Chromosome 8 made an attractive target for the T2T’s first autosome due to its manageable centromere (previously estimated at 1.5 Mb to 2.2 Mb long). But the chromosome is also home to “one of the most structurally dynamic regions in the human genome—the β-defensin gene cluster located at 8p23.1—as well as a neocentromere located at 8q21.2, which have been largely unresolved for the last 20 years,” the scientists write. The β-defensin cluster plays a key role in innate immunity and structural variation in this region has long been implicated in human disease.

The new assembly, which addresses all five of the previously intractable gaps in the human reference genome, was built with a clever method using several data sets, including accurate long reads: “More than half of the PacBio HiFi data is contained in reads greater than 17.8 kbp, with a median accuracy exceeding 99.9%.” After a scaffolding step based on Oxford Nanopore reads, contigs assembled from PacBio HiFi reads were swapped in to provide the base-pair resolution. “We improved the base-pair accuracy of the sequence scaffolds by replacing the raw ONT sequence with several concordant PacBio HiFi contigs,” the team reports.
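The statement that more than half of the data is contained in reads longer than a given length is the N50 statistic. As a minimal sketch, assuming only a list of read (or contig) lengths, it can be computed as follows; the lengths in the example are made up and are not from the chromosome 8 data set.

```python
# Sketch: N50 is the length L such that reads of length >= L together
# contain at least half of all bases.

def n50(lengths):
    """Return the N50 of a non-empty collection of lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: 32 bases in total, and the two longest reads hold 16 of them.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

The same function applies to contig lengths, which is how a contig N50 is obtained for an assembly.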

The complete chr8 sequence clocks in at 146 Mb and includes more than 3 Mb missing from GRCh38. As Logsdon et al. write, “The result is a whole-chromosome assembly with an estimated base-pair accuracy exceeding 99.99%.”

The scientists also tackled that persnickety β-defensin gene cluster, “which we resolved into a single 7.06 Mbp locus—substantially larger than the 4.56 Mbp region in the current human reference genome,” they note. Nearly all of that sequence data — 99.9934% of it, to be precise — came from HiFi reads. The complete centromere, meanwhile, accounted for 2.08 Mb.

With this beautiful assembly in hand, the T2T team took it out for a spin. First, they validated it with a host of orthogonal tools, such as optical mapping. Next, they generated HiFi data for the chromosome 8 orthologs in chimpanzee, macaque, and orangutan to compare the sequence data and reconstruct the evolutionary history of the human autosome. “Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved specifically in the great ape ancestor, and the centromeric region evolved with a layered symmetry,” the team writes. “We estimate that the mutation rate of centromeric satellite DNA is accelerated at least 2.2-fold, and this acceleration extends beyond the higher-order α-satellite into the flanking sequence.”

Finally, the researchers performed an analysis of full-length transcripts produced with the Iso-Seq method. That process identified “61 protein-coding and 33 noncoding loci that map better to this finished chromosome 8 sequence than to GRCh38, including the discovery of novel genes mapping to copy number polymorphic regions,” they report. Twelve of these new genes were uncovered in that tricky β-defensin locus alone.

A combination of HiFi genome assembly and RNA annotation with Iso-Seq data added these 12 new genes to the β-defensin (DEFB) region of chromosome 8. Logsdon, G et al. (2020)

For so many of us in the genomics community, this paper represents far more than the sequence of a single human chromosome. It’s a statement about what science can accomplish now, and where that may lead us in the years to come. As the authors summarized: “Now that complex regions such as these can be sequenced and assembled, it will be important to extend these analyses to other centromeres, multiple individuals, and additional species to understand their full impact with respect to genetic variation and evolution.”

You can hear more details from Logsdon directly at a free online conference co-hosted by the T2T Consortium and Human Pangenome Reference Consortium (HPRC) on September 22/23. Speakers will offer new insights on chromosome 8 and report on further T2T progress towards a complete human genome assembly. At the same event, the HPRC will present its complementary effort to sequence hundreds of human genomes to high quality. Presenters include: Karen Miga (@khmiga), Eric Green (@NHGRI_Director), Adam Phillippy (@aphillippy), Sergey Koren (@sergekoren), Sergey Nurk (@sergeynurk), Valerie Schneider (@dnadiver), Tina Graves-Lindsay, Arang Rhie (@ArangRhie), Mitchell R. Vollger (@mrvollger), Erich Jarvis (@erichjarvis), Mark Chaisson (@mjpchaisson), Mike Schatz (@mike_schatz), Heng Li (@lh3lh3) and many more. We’ll be glued to our computers for it and we hope you’ll have a chance to join as well!


Wednesday, September 16, 2020

Easy and Affordable: Full-Length 16S HiFi Sequencing with PacBio Service Providers

Analysis of 16S ribosomal RNA has been used for phylogenetics and identifying prokaryotes for decades. But just as scientists have had to refine the Linnaean taxonomy system based on genomic discoveries, improvements in sequencing technology are changing 16S analysis best practices.

Several dominant microbial genera in Sakinaw Lake could only be resolved via Full Length 16S or were missed by V4 sequencing (gray boxes). Singer, E. et al. (2016)

Researchers at the Joint Genome Institute, for instance, conducted a detailed benchmarking study and found that traditional methods of 16S analysis — which look at just a piece of the gene — are less accurate than analysis based on full-length sequencing of the entire V1-V9 16S gene. The full-length 16S advantage was clear in their study of the metagenome of a meromictic lake, where partial 16S sequencing led to incorrect matches or failed to differentiate the phylogenies present at different lake depths. The scientists wrote, “A resurgence of [full-length] sequences used as ‘gold standards’ has the potential to yet again transform microbial community studies, increasing the accuracy of taxonomic assignments for known and novel branches in the tree of life on previously unobtainable scales.”

The color of each branch reflects the proportion of sequences within each clade that could not be identified to species level. Johnson, J.S., et al. (2019)

Similarly, in a recent webinar, George Weinstock (@geowei) at the Jackson Laboratory for Genomic Medicine noted that only full-length 16S sequencing can resolve all the bacterial clades commonly found in the human gut microbiome down to the species level. He said, “With V1-V9, almost all [sequences] could be accurately identified to the species level. With V4, more than half could not be identified at the species level, so you are sort of locked into the genus level or higher… The full-length sequences are definitely the gold standard, there is no question about it.”

Why does species resolution matter? Weinstock later explained using the example of 16S data from healthy stool donors. “Even though at the genus level there is a certain frequency of Bacteroides species for these samples … to do some statistical analysis based on their Bacteroides, you are going to miss a lot of important information, because the species are quite different.”

That’s where PacBio HiFi reads come in. By sequencing around and around the same molecule, HiFi sequencing produces long, highly accurate, single molecule consensus reads. At a microbiome meeting held last year at Cold Spring Harbor Laboratory, our own Meredith Ashby (@AshbyMere) presented a poster showing how HiFi reads provide both accurate and complete results for full-length 16S sequencing.

To make 16S HiFi sequencing easily accessible to PacBio users, we have developed a new and improved one-step PCR protocol and worked with several DNA sequencing service providers to add the application to their menu of services, for as little as $50 per sample. The new protocol reliably delivers more than 30,000 reads per barcode at 96-plex in a single Sequel II System run.
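The multiplexing figures above imply a simple read budget. The sketch below works through that arithmetic; the function names are hypothetical, and the 3,000,000-read yield in the second call is an illustrative assumption, not a specification of any instrument.

```python
# Sketch: read budget for a multiplexed amplicon run.

def reads_required(reads_per_sample, plex):
    """Total reads needed to give every barcode its target depth."""
    return reads_per_sample * plex


def max_plex(total_reads, reads_per_sample):
    """Largest number of samples a given read yield supports at that depth."""
    return total_reads // reads_per_sample


# 30,000 reads per barcode at 96-plex, as quoted in the text.
print(reads_required(30_000, 96))

# Hypothetical yield of 3,000,000 usable reads at the same per-sample depth.
print(max_plex(3_000_000, 30_000))
```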

Here’s some information about a few of the service providers who worked with us to validate the new protocol:

 


DNA Services Lab, University of Illinois at Urbana-Champaign

Scientists Mark Band and Alvaro Hernandez are part of a team with extensive experience processing sample types from customers all over the world, from those that have minimal amounts of DNA, to those that have strong PCR inhibitors, such as those from corals, lakes and soils. They have 192 barcodes for generation of full-length 16S amplicons. Typical turnaround time from sample receipt to data delivery is just two weeks.

 

Maryland Genomics, University of Maryland School of Medicine

Part of the Institute for Genome Sciences, Maryland Genomics is led by scientists who were part of the earliest genomic efforts and who pioneered the field of metagenomics. They have expertise in working with challenging samples, particularly for microbiome studies or metagenomic applications, and operate a dedicated Microbiome Services Laboratory that provides complete sample-to-results services for full-length 16S profiling.

 

Biomarker Technologies, Beijing

According to CTO Liu Min, Biomarker is particularly experienced with soil and water samples for 16S sequencing, among many other types. In addition, Biomarker uses an in-house developed concatenation step to link together multiple 16S full-length amplicons before library creation, allowing them to increase throughput and reduce sequencing costs by multiplexing as many as 700 samples per SMRT Cell 8M. A current promotion offers customers 5,000 HiFi full-length 16S reads per sample for as little as $40. Learn more (Chinese language) here and here.

 

The University of Delaware Sequencing and Genotyping Center and Shoreline Biome, Farmington, CT

Finally, for customers who prefer an all-in-one 16S solution, Shoreline Biome offers V1-V9 and StrainID solutions that include DNA extraction, amplification, and analysis. The University of Delaware Sequencing and Genotyping Center uses Shoreline Biome technology for 16S sequencing, particularly for clinical projects such as studies of microbial communities in medical settings.

Lab director Bruce Kingham tells us, “Using PacBio long reads to resolve the 16S gene is relatively new, and has been disruptive to the field of ribotyping. The layers of genetic detail that we can elucidate from full-length 16S data could not be grasped until it was performed at the current scale, and its use continues to grow at a rapid pace.”

Mark Driscoll, CSO of Shoreline Biome, adds, “Bruce’s group recognized early that the additional resolution offered by Shoreline Biome’s 2500 bp StrainID amplicon is a powerful multiplier for researchers seeking strain-level resolution beyond what is possible with the 16S gene alone. Near-perfect, contiguous HiFi reads of StrainID amplicons covering the 16S, 23S, and variable spacer between the genes enable longitudinal tracking of strains in complex fecal microbiomes in humans and model organisms such as the mouse. Researchers seeking single clone 16S sequences from their archived strain banks have been able to pack hundreds of strains in a single run.”

 

Ready to start planning your full-length 16S experiment? Connect with a PacBio scientist.

 


Tuesday, September 15, 2020

Now Available: Ultra-Low DNA Input Workflow for SMRT Sequencing

The SMRTbell gDNA Sample Amplification Kit enables whole genome amplification starting from as little as 5 ng of genomic DNA.

It’s one of the questions we hear most often from scientists working with small organisms: Is it possible to generate truly high-quality, long-read data from minuscule amounts of DNA? With our new kit for ultra-low DNA input projects, the answer is: Absolutely!

The new workflow dramatically reduces the requirements for DNA quantity. Now, scientists need only 5 ng of genomic DNA to kick off a SMRT Sequencing project, less than 2% of the starting quantity needed for our current low DNA input protocol. This opens up access to HiFi sequencing for researchers studying the tiny arthropod species that make up much of the diversity of the tree of life. In addition, the new protocol enables comprehensive variant detection in input-limited human samples such as needle biopsies.

Ultra-low DNA input sample preparation relies on the SMRTbell gDNA Sample Amplification Kit (PN: 101-980-000), which uses PCR amplification to help users get enough material for sequencing. The kit contains enough reagents to process up to 18 samples and can be used for de novo genome assembly of arthropods with genomes no larger than 500 Mb or for human variant calling. Of course, if your sample quantity is not limited (> 5 μg of DNA is available), we encourage you to follow the standard HiFi protocol for best results.

Phlebotomus papatasi sandfly. Photo by Frank Collins/CDC.

To put it to the test, we used the ultra-low DNA input kit to sequence the sand fly (Phlebotomus papatasi), starting with just 5 ng of DNA from a single insect, which we sequenced on our Sequel II System to 55-fold coverage. We generated nearly 2 million HiFi reads with accuracy of at least Q20, producing nearly 25 Gb of HiFi reads from one SMRT Cell 8M. Mean read length was 12 kb, and mean read quality was 99.97%, or Q36. This tiny insect had a genome size of 363 Mb, and our assembly featured a contig N50 of more than a megabase.
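For readers less familiar with the metrics quoted above, here is a minimal Python sketch of how a Phred quality value maps to read accuracy and how a contig N50 is computed. It assumes the standard Phred definition (Q = -10 · log10 of the error probability); the function names and the contig lengths in the usage note are illustrative only and are not taken from the sand fly assembly or from any PacBio tool.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality value to accuracy (1 - error probability)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy in [0, 1) to a Phred quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def contig_n50(lengths: list[int]) -> int:
    """Return the N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


if __name__ == "__main__":
    print(f"Q20 accuracy: {phred_to_accuracy(20):.4%}")
    print(f"Q36 accuracy: {phred_to_accuracy(36):.4%}")
    # Hypothetical contig lengths, for illustration only.
    print("N50:", contig_n50([2, 3, 4, 5, 6]))
```

Under this definition, Q20 corresponds to 99% accuracy and Q36 to roughly 99.975%. For the made-up contig lengths, the two longest contigs (6 and 5) already cover more than half of the total of 20, so the N50 is 5.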

PacBio customers have been using the new protocol as well. At the Max Planck Institute, scientists performed ultra-low DNA input sequencing for Phyllotreta armoraciae, the horseradish flea beetle.

In addition, in their new preprint, the Max Planck researchers and collaborators used the ultra-low DNA input kit to sequence two species of springtail (Collembola). In the preprint, the authors stated, “Our study shows that it is possible to obtain high quality genomes from small, field-preserved sub-millimeter metazoans, thus making their vast diversity accessible to the fields of genomics.” To hear more details about research using the ultra-low DNA input kit, watch the on-demand webinar.

To get started using the ultra-low DNA input protocol for your next project, review the protocol, connect with a scientist at PacBio to discuss your research, or visit our Product and Services Page to purchase your consumables.


Wednesday, September 9, 2020

High Quality HiFi Assemblers Open Up a Wide New World of Genomics Possibilities

With PacBio HiFi sequencing data now readily available for organisms of any size, many exciting results have been published featuring new de novo assembly methods optimized for highly accurate long reads. These methods have produced assemblies for a variety of organisms at quality levels never before thought possible — as measured by completeness, contiguity and correctness. We feel privileged to collaborate with the scientific community on the development of these tools.

From Small to Tall

When the USDA wanted to rapidly assemble the Asian Giant Hornet as part of its real-time invasive species response initiative, they turned to a tool developed by our research and development team. Improved Phased Assembly (IPA), developed at PacBio by Ivan Sović (@IvanSovic) and Zev Kronenberg (@zevkronenberg), is an assembler that delivers highly accurate, contiguous, and phased assemblies at very high speeds.

Another new assembler called hifiasm proved useful when a PacBio team wanted to sequence the genome of the tallest living organism on earth: a California Coastal Redwood. PacBio’s own Greg Concepcion (@phototrophic) and colleagues were able to assemble a 48.5 Gb redwood genome in just 6 days with 33-fold HiFi read coverage using the method, which was developed in Heng Li’s (@lh3lh3) lab at Harvard. As described in this preprint, co-authored by Concepcion, the method uses HiFi reads to produce haplotype-resolved de novo assemblies with phased assembly graphs.

Tackling the Tough Stuff

Visual representation of the most continuous HiFi-based assemblies of the CHM13 genome. Nurk, S et al. (2020)

Not to be outdone, a new assembler from Sergey Koren (@sergekoren) and Adam Phillippy’s (@aphillippy) team at the National Human Genome Research Institute (NHGRI), HiCanu, demonstrates how it is now possible to sequence through even the most challenging regions of a human genome. PacBio’s own Rob Grothe is a co-author on the recently published work, which was also featured in a previous blog post.

 

To hear more about the NHGRI’s work on the Human Pangenome and Telomere-to-Telomere assemblies, check out the upcoming workshop on September 21-23, 2020: “T2T / HPRC Towards a Complete Reference of Human Genome Diversity.”

 

 

Expanding What’s Possible

We are pleased by the possibilities of combining HiFi reads with these new rapid and high-quality genome assembly tools. Not only are they allowing us to tackle new organisms and reach new regions, they are enabling scientists to change the reference genome paradigm to create stand-alone de novo assemblies and pangenome collections with unparalleled speed and ease.

 

To learn more, join us on September 16 for our webinar: “Beyond a Single Reference Genome – The Advantages of Sequencing Multiple Individuals.” Or contact a PacBio Scientist to find out how HiFi reads can benefit your next research project.

 
