East Coast UGM: SMRT Sequencing Data for Carnivorous Plants, Hummingbirds, Mammalian Methylation, Repeat Expansions, and More
Wednesday, June 22, 2016
Many thanks to the nearly 200 scientists who signed up for our East Coast User Group Meeting, and to the Institute for Genome Sciences for hosting us! The event was a hit with customers and PacBio staff alike, and the conversations we overheard in the hallways and during breaks pointed to a great deal of knowledge exchange that should inspire labs as they generate and analyze their SMRT Sequencing data.
The day kicked off with a talk from our own Marty Badgett, senior product manager for the PacBio RS II System and Sequel System. He offered some historical perspective on throughput improvements over time for the SMRT Sequencing platforms (believe it or not, throughput has increased 100x in the past five years) and shared some data produced by the Sequel System. He spoke about the newly released SMRT Link software, which brings existing modules into an integrated workflow and adds components for data management. Badgett also alerted users to four upcoming enhancements: spin column size selection, on-chip additive cleanup, asymmetric SMRTbell templates, and active loading. With these and other changes, Badgett predicted that a 3 Gb genome could be assembled de novo with the Sequel System for $9,000 by next year and as little as $1,500 in 2018.
The event focused on our customers and the impressive results they’ve achieved with SMRT Sequencing. Jerry Jenkins from HudsonAlpha talked about high-quality de novo plant genomes — his team has released six FALCON-assembled plant genomes already, with more coming soon. The simple sequences and repetitive elements common to plant genomes are a major challenge, he said, noting that “it gives Illumina a fit and with PacBio it works well.” He said SMRT Sequencing has been “a game-changer” for his team when it comes to contiguity and completeness of assemblies. He gave several examples, including the bean Phaseolus vulgaris, for which a Sanger/454 assembly had a contig N50 of 39.5 kb while PacBio yielded 1.9 Mb. An analysis of cotton showed that a short-read assembly was missing nearly 500 Mb of sequence, which was recovered in the PacBio assembly.
Another plant presentation came from Victor Albert of the University at Buffalo, who intrigued attendees with his description of the carnivorous aquatic plant Utricularia gibba. Its genome was originally published in 2013 as a fragmented 82 Mb assembly, but a new PacBio assembly shows that the genome is actually about 100 Mb and captures it in fewer than 600 contigs. One challenge for this plant is its 3 Mb of rDNA repeats. “You can never assemble rDNA repeats,” Albert said of other sequencing platforms. “PacBio zips right through them.” He was delighted to find that the largest contig contained a full chromosome, and noted that the accuracy was excellent. “There’s no sense in doing any polishing with Illumina,” he added. With the new assembly, he’s able to explore telomeres, centromeres, and retrotransposons; he can also separate tandem duplication events from artifacts of previous whole-genome duplications.
Rachael Workman from Johns Hopkins presented transcriptome data for the ruby-throated hummingbird, Archilochus colubris, generated through the Most Interesting Genome SMRT Grant program. The team used the Iso-Seq method to analyze liver tissue to better understand this bird’s remarkable metabolism, finding 450,000 unique isoforms. While the full analysis is still underway, Workman shared some interesting examples, such as complete coverage of a glucose transporter gene that shows considerable divergence across avian species.
Continuing the avian theme, Duke’s Erich Jarvis spoke about the B10K project, an effort to sequence every species of bird on the planet. Jarvis’s area of interest is vocal learning, a trait shared by few organisms. His lab has found that PacBio sequencing produces genome assemblies of much higher quality than those from short-read data (in some cases increasing the contig N50 from as little as 30 kb to as much as 10 Mb, and reducing contig counts from more than 120,000 to about 1,000). In one example, he showed that a dopamine receptor that appeared to be lost by many bird species was actually conserved across them; it just wasn’t included in the short-read draft assemblies. Another example was of Egr1, with a promoter region that scientists had long wanted to study but wasn’t assembled in any existing bird genome. With PacBio sequencing, the region is fully assembled for the first time.
Tao Wu from Yale presented findings of adenine methylation in mouse embryonic stem cells (if you missed the seminal Nature paper on this previously unsuspected form of methylation in a mammal, check out our recap). Wu and team developed SMRT ChIP, a method for conducting ChIP-seq on a PacBio sequencer, and discovered N6-methyladenine throughout the genome. “Without PacBio, I think we could not get this done,” he said. The findings were confirmed by mass spectrometry. Wu also tracked down the Alkbh1 demethylase and conducted functional studies showing that the methylation suppressed genes and transposons.
After lunch, the topic shifted to studies of primate and human genomes. Julie Karl from the University of Wisconsin analyzed the MHC locus in macaques, which have a more complex allele assortment than humans do. Using PacBio amplicon sequencing to span the full 1.1 kb locus, she was able to resolve full-length alleles unambiguously and phase haplotypes, finding many novel alleles to contribute to community databases. Karl is now working to expand her immunology investigation to KIR and FCGR, two other complex regions.
NIAID’s Brandon DeKosky spoke about sequencing antibody repertoires, which requires analyzing both heavy chain and light chain elements. Previous protocols involved analyzing three separate amplicons to cover these regions, but with the PacBio system his team has been able to sequence full-length amplicons covering both elements in B cells. “I certainly am in favor of moving everything over to the PacBio,” he said. “We can see this entire molecule at once.” DeKosky and his colleagues have studied cells from humans and rhesus macaques, and are now using transgenic mice to model the antibody response to knocked-in precursor genes to determine whether this approach could be used toward developing a vaccine.
In a clinically oriented research presentation, Tetsuo Ashizawa from the Houston Methodist Research Institute reported on an analysis of ATXN10, the gene responsible for spinocerebellar ataxia type 10. The gene harbors a pentanucleotide repeat in intron 9: an ATTCT repeat that is sometimes interrupted by other five-nucleotide repeats, and these variations lead to three broad disease phenotypes. Since the full region can’t be amplified, Ashizawa used CRISPR/Cas9 for target enrichment, making a double-strand break near the region and attaching a SMRTbell adapter for PacBio sequencing. This allowed the team to assess the motifs associated with different subtypes; in one example, they studied brothers with different phenotypes of the disorder and found that the difference lay in repeat content.
Maria Nattestad from Cold Spring Harbor spoke about her genome and transcriptome analysis of SK-BR-3, one of the most widely studied breast cancer cell lines with a Her2 amplification. Long-read SMRT Sequencing allowed her to characterize the whole genome. Nattestad’s Assemblytics tool, which she used to call variants ranging from single-base changes to large structural changes, is now available as a web app. She discussed her reconstruction of how variants worked together to amplify the Her2 oncogene (an effort we profiled in depth). Using the Iso-Seq method to produce full-length transcripts, she was also able to detect and explain complex gene fusions for which there was previously no DNA evidence.
Our CSO Jonas Korlach wrapped up the day with a talk highlighting some of the biggest advances we’ve seen from SMRT Sequencing, noting that there are now nearly 1,500 papers describing the use and value of this technology. Particularly productive topics include hospital-associated infections, base modifications in bacteria, and phylogenetic profiling of microbial communities. He also spoke about the recent wave of de novo human assemblies from PacBio data, noting that a shift toward diploid assemblies will be important as the community deepens its understanding of human genetic variation. Korlach also cautioned attendees about confusing contigs with scaffolds; while some assemblies boast impressive scaffold N50s, they may be missing a lot of important information if contigs are short and disconnected.
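Korlach’s point about contigs and scaffolds is easiest to see with a toy calculation. The Python sketch below is illustrative only: the function names `n50` and `split_into_contigs` are hypothetical, the sequences are made up rather than real PacBio data, and it assumes scaffold gaps are encoded as runs of `N`, a common FASTA convention. Because scaffold lengths include those gap bases, a scaffold N50 can look impressive even when the contigs that actually carry sequence are short.

```python
import re


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def split_into_contigs(scaffold):
    """Break a scaffold at runs of N (assumed gap encoding) and
    return the non-empty pieces, i.e. the contigs."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


# Toy scaffolds: the first is three short contigs joined by 100-bp gaps.
scaffolds = [
    "A" * 40 + "N" * 100 + "C" * 40 + "N" * 100 + "G" * 20,
    "T" * 60,
]
contigs = [c for s in scaffolds for c in split_into_contigs(s)]

print("scaffold N50:", n50([len(s) for s in scaffolds]))  # scaffold N50: 300
print("contig N50:", n50([len(c) for c in contigs]))      # contig N50: 40
```

In this toy case the scaffold N50 is 300 bp, but the contigs have an N50 of only 40 bp, and 200 of the 300 bases in the longest scaffold are gaps rather than sequence.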
We’d like to thank our partners, who helped us put on this great event: Advanced Analytical Technologies, Covaris, Diagenode, DNAnexus, PerkinElmer, and Sage Science. With their support, we were able to host not only the full-day user group meeting but also half-day workshops on sample prep and bioinformatics.