At PAG 2020, HiFi Data ‘Transformational’ for Advancing Plant and Animal Research
Thursday, January 30, 2020
What better way to start the year than a gathering of thousands of stellar scientists? We were excited, once again, to attend the Plant and Animal Genome (PAG) Conference in sunny San Diego and to showcase some of the achievements of our customers at our well-attended workshop.
For those who missed it – or just want to relive the excitement – here is an overview, along with recordings of the presentations.
The workshop kicked off with our CSO Jonas Korlach looking back at the evolution of SMRT Sequencing over the last decade, and concluded with an update on the latest PacBio developments, including reduced analysis time with HiFi Reads and an ultra-low DNA input protocol, by Michelle Vierra (@the_mvierra), strategic marketing manager for plant and animal sciences.
Watch Korlach’s introductory remarks:
Watch Vierra’s full workshop talk, PacBio Update on Products and HiFi Applications
Expanding the Tree of Life
First up at the PAGXXVIII PacBio workshop was Mark Blaxter (@blaxterlab), project lead for the Sanger Institute’s Darwin Tree of Life – a position he described as his ‘dream job’. The project aims to sequence all 60,000 species believed to be on the British Isles over the next 12 years, starting with species representing 4,000 families.
“After that, we’ll move on to the genera and after that we’ll do the rest,” Blaxter said. “This requires us ramping up to do 5,000 genomes a year. If you divide that by the number of working days in a year, that’s 20 genomes a day. That’s five before coffee, another five before… It’s terrifying. But I think actually we can get there.”
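The arithmetic behind that quote can be checked in a few lines. This is only a back-of-the-envelope sketch: the species count and timeline come from the figures above, while the 250 working days per year is an assumption implied by Blaxter's "20 genomes a day", not a number he stated.

```python
# Figures quoted in the article
total_species = 60_000
project_years = 12

# Assumption: roughly 250 working days per year (implied by "20 genomes a day")
working_days_per_year = 250

genomes_per_year = total_species / project_years
genomes_per_day = genomes_per_year / working_days_per_year

print(f"{genomes_per_year:.0f} genomes per year")
print(f"{genomes_per_day:.0f} genomes per working day")
```

Under those assumptions, the 12-year goal and the 5,000-genomes-a-year pace describe the same target.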
The Sanger team has already generated data for 94 species, including 44 new moth and butterfly (Lepidoptera) PacBio assemblies. By combining the HiFi reads with Hi-C data, they have been able to generate chromosomal, telomere-to-telomere assemblies.
“Having spent years sequencing other butterflies, this is truly transformational,” Blaxter said. “We hope we can spread this across the whole of the Tree of Life.”
Watch Blaxter’s full workshop talk: Endless Forms – Genomes from the Darwin Tree of Life Project
The Fungus Among… Plants
In a talk that might just inspire you to take on mycology, Jana U’Ren (@you_wren) of the University of Arizona discussed the fungi that live inside of plants and her studies of their biology and evolution.
U’Ren’s studies focus on symbiotic fungi found in the photosynthetic tissue of plant leaves. A single leaf can harbor dozens to hundreds of species of fungi. They live asymptomatically within their host species, and are grouped together functionally as endophytes.
Prior sequencing efforts of these endophytes were limited to ~300 base pair fragments of hypervariable regions. While that was useful for community analysis to understand where the species overlapped, it couldn’t be used for phylogenetic analyses.
“What we’re trying to do now is to answer both Where and Who these (fungi) are using PacBio sequencing.”
At the Arizona Genomics Institute, U’Ren sequenced ribosomal DNA amplicons from 25 different plant species from boreal regions, resulting in more than a million high-quality reads and a treasure trove of data that she is currently sifting through.
“It was a beautiful dataset. We have very high-quality data that has been validated with all the culturing that we’ve been doing over the last 12 years,” U’Ren said. “We recovered a high richness of fungal OTU (operational taxonomic unit), which was what we were looking for, and what we found was this higher phylogenetic diversity.”
Watch U’Ren’s full workshop talk: Phylogenetic Insights into the Endophyte Symbiosis using PacBio Ribosomal DNA Sequencing
A Rose is a Rose
The genome of the rose is almost as complicated as its connotations when given as a gift on Valentine’s Day or other special occasions.
Many roses are “segmental allotetraploids,” which means that part of the genome is behaving like an allotetraploid (with four chromosome sets from two distinct species, which occurs during hybridization) – and part of the genome is behaving like an autotetraploid (with four sets of homologous chromosomes).
Needless to say, parsing all of this out is challenging. Bart Nijland (@bart3601) of Genetwister Technologies explained how his team set out to make a haplotype-aware assembly of Rosa x hybrida L. in order to capture its full range of genetic variation, rather than rely on more traditional assemblies which collapse the haplotypes into single sequences that could be missing critical information.
“A lot of the existing technologies don’t perform very well in doing this. So we were very happy when PacBio released its HiFi protocol. Due to the high accuracy of the reads, we thought this could really help us in solving this challenge,” Nijland said.
A k-mer analysis of their sequenced samples revealed four distinct peaks, exactly what they were expecting in their heterozygous, tetraploid samples. Further de novo assembly of diploid and tetraploid varieties by Nijland and colleagues, including Henri van de Geest (@geesthc) and Mark de Heer, provided an even better picture of the variation between them.
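For readers unfamiliar with k-mer analysis, here is a minimal sketch of the general idea, not the pipeline Nijland's team used. It counts every substring of length k across a set of reads, then tallies how many distinct k-mers occur at each multiplicity. In a heterozygous tetraploid, sequence found on one, two, three, or all four haplotype copies tends to pile up at roughly 1x, 2x, 3x, and 4x the per-haplotype coverage, which is where four peaks come from. The function name `kmer_spectrum` and the toy reads are hypothetical, and the sketch skips steps a real tool would include, such as merging reverse-complement k-mers.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k across the read
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram of the counts: how many k-mers occur once, twice, ...
    return Counter(counts.values())


# Toy reads for illustration only (not real rose data)
reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]
spectrum = kmer_spectrum(reads, k=4)

for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

On real data, the multiplicities would be plotted as a histogram, and the peaks read off from the plot.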
“This provides a very valuable tool for molecular breeding efforts in rose,” Nijland said.
Watch Nijland’s full workshop talk: The Impact of Highly Accurate PacBio Sequence Data on the Assembly of a Tetraploid Rose
Going Ape Over Iso-Seq Analysis
The work of Zev Kronenberg (@zevkronenberg) and team made headlines, and the cover of Science, when they reported a high-resolution comparative analysis of great ape genomes. During the workshop, he shared how transcriptome analysis via the Iso-Seq method led to further discoveries.
Kronenberg, then a post-doc in the lab of Evan Eichler and now a senior bioinformatics engineer at PacBio, used PacBio’s RNA sequencing method, Iso-Seq, to annotate the great ape genomes his team created, detangle several complicated loci, and enrich our biological understanding of the differences between us and our closest relatives.
“We spent a lot of effort not only ensuring that the genome assembly turned out well, but we made sure that the de novo genome annotation was done correctly and that we were able to trust the genes that we find.”
Mapping the transcriptome data of the great apes against human transcriptome data, Kronenberg and his colleagues looked for areas where they differed. He showed several examples, including a human-specific 60 kb intronic deletion that, with a bit of digging, a graduate student was able to associate with a region linked in other studies to human diet.
“Without the Iso-Seq data, that probably would have been the end of the story. But with Iso-Seq data, we were able to identify how this non-coding variant could potentially have a phenotypic effect.”
The project proved the power of combining genome and transcriptome data, Kronenberg said.
“No story is really complete with just a genome. The Iso-Seq data was absolutely central to us discovering really interesting biological candidates.”
The higher capacity and speed of the Sequel II System has made even more possible, Kronenberg added.
“We sequenced I think well over 100 SMRT Cells for only the Iso-Seq data. Today, a single SMRT Cell would do that whole project. And that, to me, is mind blowing.”
Watch Kronenberg’s full workshop talk: Characterizing Genetic Differences between Great Apes using Iso-Seq Data
HiFi Data Assemble!
Several other presentations throughout the conference demonstrated how highly accurate HiFi reads on the Sequel II System are improving results, including “HiCanu: Resolving repeats and haplotypes” by Sergey Koren (@sergekoren) of NHGRI, slides of which are available here. In addition to HiCanu, two other genome assemblers built for HiFi data also made their debut: Nighthawk from Zev Kronenberg and Hifiasm from Heng Li (@lh3lh3).
Our four poster presentations from the PAG conference are available to view:
- A Complete Solution for High-Quality Genome Annotation using the PacBio Iso-Seq Method – Elizabeth Tseng, et al.
- Beyond Contiguity: Evaluating the Accuracy of De Novo Genome Assemblies – Sarah Kingan, et al.
- Every Species Can be a Model: Reference-quality PacBio Genomes from Single Insects – Sarah Kingan, et al.
- A High-Quality PacBio Insect Genome from 5 ng of Input DNA – Jonas Korlach, et al.