This blog features voices from PacBio — and our partners and colleagues — discussing the latest research, publications, and updates about SMRT Sequencing. Check back regularly or sign up to have our blog posts delivered directly to your inbox.
It’s DNA Day, the annual celebration of the discovery of the double helix, the completion of the Human Genome Project, and all things genetic. We like to take the opportunity to look back at DNA-based advances from the past year, and progress has been truly stunning. Just when we think it couldn’t get more awe-inspiring, scientists generate new results that prove us wrong.
One of the most impressive feats in the past year has been the proliferation of population-specific, reference-grade human genomes. From the Chinese genome assembly that recovered nearly 13 Mb of sequence missed in GRCh38 and produced new insights around alternative splicing to the diploid Korean genome assembly that detected nearly 12,000 novel structural variants — including several specific to Asian populations — these new resources are showing us how much sequencing must be done to represent the universe of natural human genetic variation. Several other country or population genome projects have reported results or are in the works, and we’re eager to see how this data fills in the blanks to help us better understand the human genome. Structural variation in particular is being detected more comprehensively than ever, with even small amounts of long-read sequencing helping scientists to connect these elements to their likely function.
We’ve also seen compelling work from the plant and animal research community. Just in the past year, scientists have published new high-quality genome assemblies for quinoa and goat, shattering contiguity records even for challenging genomes. In maize, researchers reported new studies that produced accurate gene copy number counts and a more complex transcriptome than anticipated. Alternative splicing was also the focus of a sorghum study. And we were delighted to learn that the Genome 10K (G10K) and Bird 10,000 Genomes (B10K) initiatives announced plans to ramp up their efforts to generate high-quality de novo genome assemblies.
On the microbial front, we were especially fascinated by a new report detailing the epigenetic changes that occur as free-living bacteria morph into symbiotic bacteria associated with a host. There was also a project that investigated how drug-resistance plasmids are swapped across bacterial species by analyzing the entire “mobilome” of carbapenemase-producing Enterobacteriaceae. And since we’re suckers for extremophile research, we couldn’t resist this genome profile of a single-celled diatom living in the Antarctic Ocean.
All of these projects were accomplished with SMRT Sequencing. On DNA Day, we’d like to congratulate the entire research community working to improve our understanding of genomics.
Today is Earth Day, a great time to reflect on the growing trend of conservation genomics. We are proud that many scientists are using PacBio long-read sequencing to help rescue endangered species and preserve delicate ecosystems around the world.
One of the first examples we saw of this approach came from Oliver Ryder at the San Diego Zoo Institute for Conservation Research. Ryder and his team performed SMRT Sequencing for the ‘alalā, a Hawaiian crow that had disappeared from the wild. In this video, he describes how having a high-quality genome assembly for this bird will have a significant impact on biologists’ ability to breed and reintroduce healthy crows back to their native environment. Ryder is also a founder of the Genome 10K (G10K) project, which aims to create high-quality assemblies for 10,000 vertebrate species as part of a large-scale conservation effort.
We’ve also been impressed by public support for a crowdfunded conservation genomics project — this one for the kākāpō bird, a critically endangered species found only in New Zealand. David Iorns, founder of the Genetic Rescue Foundation, is using SMRT Sequencing to build a reference-grade de novo genome assembly for the bird, followed by resequencing all 125 remaining kākāpōs. These members of the parrot family are facing fertility issues, a major population bottleneck, and other challenges that make a conservation effort necessary to prevent them from going extinct.
Recently, conservation expert Rebecca Johnson from the Australian Museum Research Institute gave a talk on the de novo genome assembly of a koala. This lovable marsupial species has been on the radar of conservation biologists who want to protect it in part because it has a number of unique and interesting features. Johnson used SMRT Sequencing to analyze the 3.6 Gb genome, yielding what she calls the best marsupial assembly to date.
This year, Earth Day is also marked by the first-ever March for Science, including more than 500 marches across the globe to support better research funding and pro-science policies. We’ll be cheering on all the scientists involved in conservation genomics and other important efforts to protect our planet and all the creatures that call it home!
Researchers from the Okinawa Institute of Advanced Sciences published a compelling review article describing several recent clinically relevant projects they have completed using SMRT Sequencing. Released in the journal Human Cell, “Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area” comes from lead author Kazuma Nakano, senior author Takashi Hirano, and collaborators.
The team adopted long-read PacBio sequencing as an alternative to short-read sequencers that missed too many important genomic elements. “PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization,” they write. “These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions.”
The scientists present several examples of how this technology has made a difference in their work. Many of these studies were previously unpublished. While we can’t cover them all, here are a few vignettes that caught our attention:
- The team fully sequenced the genome of the Kurono strain of Mycobacterium tuberculosis, yielding a single, circular contig. GC content was as high as 80% across the genome, which also featured “117 sets of >1000 bp identical sequence pairs.”
- They sequenced the genome of a multidrug-resistant isolate of Acinetobacter baumannii collected in a Nepalese hospital. The assembly, represented as two circular contigs for the chromosome and a plasmid, included several genes conferring drug resistance.
- SMRT Sequencing allowed the scientists to perform de novo assembly and methylation detection for several variants of Leptospira interrogans in a study designed to identify mechanisms underlying virulence in the zoonotic disease leptospirosis.
- A flu study relied on SMRT Sequencing for whole genome analysis of 48 influenza viruses isolated in Okinawa. The study included at least one sample from the H1N1 pandemic in 2009. “Our genomic data set contained temporal and spatial information about the seasonal and pandemic prevalence of flu in Okinawa,” the authors report. “Such insight gleaned will help elucidate the mechanism of acquired resistance to vaccines and drugs and thus inform future drug and vaccine development.”
- The team used long-read sequencing to explore why the incidence of gastric cancer is lower in Okinawa than anywhere else in Japan, despite consistent prevalence of Helicobacter pylori across the country. By conducting whole genome sequencing and methylation detection for eight H. pylori strains, the scientists spotted virulence factor-dependent motifs.
This review demonstrates several of the clinically relevant applications for SMRT Sequencing. PacBio “has significantly impacted basic science and biology and is reaching its influence into the clinical/medical atmosphere,” the scientists write. The technology is “ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization.”
A preprint from scientists at the University of Florida, Centro de Investigaciones Principe Felipe, and other institutes describes a new analysis tool to help boost quality of transcriptome studies. “SQANTI: extensive characterization of long read transcript sequences for quality control in full-length transcriptome identification and quantification” comes from lead author Manuel Tardaguila, senior author Ana Conesa, and collaborators.
The automated pipeline for Structural and Quality Annotation of Novel Transcript Isoforms (SQANTI) was developed as a quality-assessment tool for transcripts discovered with SMRT Sequencing. SQANTI “calculates up to 35 different descriptors of transcript quality and creates a wide range of summary graphs to aid in the interpretation of the sequencing output,” the authors report.
Development of this new pipeline was spurred by the realization that different transcript analysis tools yielded different results, even for the same data set. “As an example, sequencing the mouse neural transcriptome with PacBio long reads, we obtained ~ 80,000, 12,000 and 16,000 different transcripts when applying Tapis, IDP or the ToFU pipeline, respectively,” the scientists write. “Implementing a comprehensive, quality aware analysis of PacBio reads is fundamental at a time when long read transcriptome sequencing is becoming more popular and important conclusions on transcriptome diversity will be drawn from these data.”
SQANTI consists of tools to classify transcripts by comparison to a reference annotation, analyze data by more than 30 metrics, and generate graphs to report results. The team tested it using neural tissue from mice, performing extensive RT-PCR validation to measure transcript expression. PacBio sequencing of the tissue identified many novel transcripts, but “an important fraction of the novel sequences are presumably bioinformatics or retrotranscription artifacts that can be removed by using SQANTI descriptors,” the scientists report.
They also evaluated results against data from short-read sequencing. “A comparison of Iso-Seq over the classical RNA-seq approaches solely based on short-reads demonstrates that the PacBio transcriptome not only succeeds in capturing the most robustly expressed fraction of transcripts, but also avoids quantification errors caused by unaccounted 3’ end variability in the reference,” Tardaguila et al. write. “SQANTI allows the user to maximize the analytical outcome of long read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.”
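SQANTI's first step, classifying each transcript against a reference annotation, can be pictured with a toy version of an intron-chain comparison. The sketch below is a simplified, hypothetical illustration, not SQANTI's code: it assumes each multi-exon transcript has already been reduced to an ordered chain of (donor, acceptor) intron coordinates on one strand, and it covers only four of SQANTI's structural categories (full splice match, incomplete splice match, novel in catalog, novel not in catalog).

```python
# Simplified, hypothetical illustration of SQANTI-style structural categories.
# Not SQANTI's code: transcripts are reduced to intron chains of
# (donor, acceptor) coordinates on one strand, and only four categories are used.

Intron = tuple[int, int]


def is_contiguous_subchain(query: list[Intron], ref: list[Intron]) -> bool:
    """True if `query` occurs as a consecutive run of introns inside `ref`."""
    n = len(query)
    return any(ref[i:i + n] == query for i in range(len(ref) - n + 1))


def classify(query: list[Intron], reference: dict[str, list[Intron]]) -> str:
    """Assign a simplified category to a multi-exon query transcript."""
    if not query:
        raise ValueError("this sketch only handles multi-exon (spliced) transcripts")
    chains = list(reference.values())
    # Identical intron chain to an annotated isoform.
    if any(query == chain for chain in chains):
        return "full-splice_match"
    # A consecutive piece of an annotated chain, e.g. a 5'-truncated read.
    if any(is_contiguous_subchain(query, chain) for chain in chains):
        return "incomplete-splice_match"
    # Every donor and acceptor is annotated, but the combination is new.
    donors = {d for chain in chains for d, _ in chain}
    acceptors = {a for chain in chains for _, a in chain}
    if all(d in donors and a in acceptors for d, a in query):
        return "novel_in_catalog"
    # At least one splice site is absent from the annotation.
    return "novel_not_in_catalog"
```

Novel-not-in-catalog calls are the group where descriptors such as junction support become important for separating real isoforms from the artifacts the authors describe.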
A new publication in Genome Research shows how the use of SMRT Sequencing, in combination with other technologies, can reveal far more about repetitive DNA and structural variants than short-read sequencing alone. In this paper, scientists compared genome assemblies produced with short reads, long reads, and optical maps to understand the performance of each approach.
From Uppsala University, the University of Munich, and Bionano Genomics, the team studied the Eurasian crow for this project. The resulting paper, “Combination of short-read, long-read and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications,” comes from lead author Matthias Weissensteiner, senior author Jochen Wolf, and collaborators. They used an existing short-read assembly and generated a de novo PacBio long-read assembly and an optical map with Bionano Genomics, all from the same individual.
The PacBio-only assembly delivered a major improvement over the short-read assembly. Contiguity increased almost 90-fold, with the long-read assembly featuring a contig N50 longer than 8.5 Mb. The SMRT Sequencing assembly also resolved more than 70 Mb of sequence missed in the short-read assembly, including nearly 16 Mb of repetitive elements.
The various assemblies were then compared and joined to determine how each source of information contributed to a final, high-quality genome resource. This step allowed the team to spot mis-assemblies, which occurred more frequently in the short-read assembly. They detected 43 mis-joins in the short-read assembly, and fewer than half that number in the long-read assembly.
One of the motivating factors for this project was an interest in understanding the repetitive DNA associated with constitutive heterochromatin, which has an influence on recombination. To that end, the team analyzed large tandem repeat arrays in the crow genome and used population resequencing data to estimate effects on recombination rate. “We characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit,” the scientists report. They determined that the recombination rate was “significantly reduced in these regions.”
“Our results demonstrate the potential of combining independent technologies to discover previously inaccessible genomic features,” Weissensteiner et al. write. “With an emerging picture of genome architecture affecting the distribution of genetic diversity across genomes, the integration of large tandem repeat arrays into genome assemblies constitutes an important improvement.”
We’re delighted to see the release of another high-quality avian genome, which will support ongoing efforts in the B10K and G10K projects to represent as many species as possible.
We are excited to announce our 2017 series of SMRT Community Events and User Group Meetings (UGMs), where you can learn first-hand how members of the scientific community are leveraging the latest capabilities of SMRT Sequencing for a growing number of applications.
Our vibrant community of users is enthusiastic about sharing tips, exchanging ideas, and developing new applications. These upcoming events will facilitate just that, and we hope you can join us!
We are now accepting registrations for our SMRT Leiden, Americas East Coast UGM and Asia Pacific UGM. Please save the date for our Americas West Coast UGM, SMRT Developers Meeting, and EMEA UGM, taking place later in the year.
- May 2 – 4, SMRT Leiden: SMRT Scientific Symposium & Informatics Developers Meeting, Leiden, The Netherlands
- May 31 – June 1, APAC User Group Meeting, Seoul, South Korea
- June 27 – 28, Americas East Coast User Group Meeting & Workshops, Baltimore, MD
- September 6 – 7, Americas West Coast User Group Meeting & Workshops, Palo Alto, CA
- Fall 2017, SMRT Informatics Developers Meeting, TBD, MD
- November 2 – 3, EMEA User Group Meeting, Barcelona, Spain
Call for Speakers
Our scientific advisory committee is currently reviewing speakers for the East Coast UGM & Workshops. If you are interested in sharing your latest research, please submit a proposal when you register. The deadline for consideration is Wednesday, May 17.
We look forward to seeing you at our upcoming SMRT Community Events & User Group Meetings!
A new preprint offers an enticing look at transcriptome results from analysis of a hummingbird using SMRT Sequencing. In this study, scientists found new clues to explain unique attributes of the bird’s metabolism. The work was made possible through full-length isoform sequencing, which allowed deep, assembly-free analysis even though no reference genome was available.
“Single molecule, full-length transcript sequencing provides insight into the extreme metabolism of ruby-throated hummingbird Archilochus colubris” is now available on bioRxiv. From Rachael Workman, Alexander Myrka, Elizabeth Tseng, William Wong, Kenneth Welch, and Winston Timp, the paper describes a project designed to better understand how hummingbirds switch metabolic gears to focus on sugars or lipids as needed. “This metabolic flexibility is remarkable both in that the birds can switch between exclusive use of each fuel type within minutes,” they write, “and in that de novo lipogenesis from dietary sugar precursors is the principle way in which fat stores are built, sometimes at exceptionally high rates, such as during the few days prior to a migratory flight.”
The team used the Iso-Seq method with long-read PacBio data to generate full-length isoform sequences, focusing on the liver of Archilochus colubris. According to the paper, this represents “the first high-coverage transcriptome of any single avian tissue.” They also aligned transcripts to the recently completed genome assembly of another hummingbird, Calypte anna, which also made use of SMRT Sequencing.
Workman et al. report that the use of long-read PacBio data allowed for more accurate views of isoforms and alternative splicing, even without a reference genome. “Using full-length transcript data, we found alignment unnecessary to generate clear pictures of the gene isoforms,” they note. “The long reads negate the need for transcript assembly, a precarious analysis in the absence of a genome.” Nearly half of the reads in the final analysis covered full-length genes, including the 5’ and 3’ ends as well as the polyA tail.
The team used the COGENT pipeline to assign transcripts to gene families and focus on unique isoforms. “COGENT is specifically designed for transcriptome assembly in the absence of a reference genome, allowing for isoforms of the same gene to be distinctly identified from different gene families,” the scientists write. Their analysis generated a highly diverse set of isoforms, which the authors believe “represents a nearly complete transcriptome of the hummingbird liver.”
With that dataset, the scientists found genes unique to the hummingbird. “These genes showed a specific enrichment for pathways involved in lipid metabolism — suggesting that the hummingbird has evolved variants of these genes to achieve its high levels of metabolic efficiency,” they report.
The scientists note that follow-up functional assays will be an important next step in understanding and verifying the function of many genes of interest.
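The idea behind grouping reference-free isoforms into gene families, as the Cogent step above does, can be illustrated with a much simpler stand-in. The sketch below is an assumption-laden toy, not Cogent's algorithm: it links two transcripts when the Jaccard similarity of their k-mer sets reaches a hypothetical threshold (`min_sim`), then reports connected components, found with union-find, as "families". The values of `k` and `min_sim` are illustrative only.

```python
from itertools import combinations

# Toy stand-in for reference-free gene-family grouping; NOT Cogent's algorithm.
# k and min_sim are hypothetical illustration values.


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all k-mers seen in either transcript."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_families(transcripts: dict[str, str], k: int = 5, min_sim: float = 0.3) -> list[set[str]]:
    """Union-find over transcripts linked by k-mer Jaccard similarity >= min_sim."""
    parent = {name: name for name in transcripts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    profiles = {name: kmers(seq, k) for name, seq in transcripts.items()}
    for a, b in combinations(transcripts, 2):
        if jaccard(profiles[a], profiles[b]) >= min_sim:
            parent[find(a)] = find(b)  # merge the two families

    families: dict[str, set[str]] = {}
    for name in transcripts:
        families.setdefault(find(name), set()).add(name)
    return sorted(families.values(), key=lambda fam: sorted(fam))
```

Two isoforms that differ by a skipped exon still share most of their k-mers, so they land in the same family, while an unrelated transcript stays separate. Pairwise comparison is quadratic, which is fine for a sketch but not for a full transcriptome.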
We’re excited to be heading to Washington, DC, for the annual meeting of the American Association for Cancer Research. The PacBio team always enjoys hearing about the latest in cancer translational research at AACR, along with thousands of leading scientists in the field.
Many of those scientists have already learned that SMRT Sequencing provides a unique view into cancer, revealing structural variation, phasing distant variants, and delivering full-length isoform sequences. With uniform coverage, industry-leading consensus accuracy, and reads extending to tens of kilobases, PacBio long-read sequencing gives researchers the ability to monitor and make sense of even the most complex changes in tumor DNA.
If you’ll be attending AACR, stop by booth #1617 to get a first look at the new Integrative Genomics Viewer (IGV v3) featuring greatly improved support for SMRT Sequencing data. We’ll be demonstrating the new features in IGV v3 with a PacBio whole genome sequencing dataset (the SK-BR-3 Human Breast Cancer cell line). Visit us to see how PacBio data visualized in IGV v3 reveals the hidden landscape of somatic structural variants in a cancer genome including translocations, gene fusions, and novel mobile element insertion sites.
In addition, check out these posters from our scientists to see SMRT Sequencing data in action for cancer studies:
SMRT Sequencing of Full-length Androgen Receptor Isoforms in Prostate Cancer Reveals Previously Hidden Drug Resistant Variants
Tyson Clark, Ph.D., PacBio
Sunday, April 2, 1 p.m. – 5 p.m., Abstract #425/25, Section 17
Simplified Sequencing of Full-length Isoforms in Cancer on the PacBio Sequel System
Meredith Ashby, Ph.D., PacBio
Monday, April 3, 1 p.m. – 5 p.m., Abstract #2442/29, Section 17
Detection of Low-frequency Somatic Variants using Single-molecule, Real-time Sequencing
Primo Baybayan, PacBio
Wednesday, April 5, 8 a.m. – 12 p.m., Abstract #5366/22, Section 15
Finally, we’ll be launching a new SMRT Grant program at AACR. Just tell us how full-length isoform sequencing of your cancer samples will drive new discoveries in your research for a chance to win library construction, PacBio sequencing, and bioinformatics analysis for your project. Check out the rules and submit your 250-word proposal by May 15. Many thanks to our partner GENEWIZ for helping us make this grant program possible!
The recent beta release of version 3 of the popular genome browser IGV greatly improves support for PacBio data [1]. The long reads (up to 50 kb) and random error profile of PacBio SMRT® sequencing facilitate new applications in genome assembly, structural variant discovery, and haplotype phasing. These unique properties and applications benefit from customized data visualization.
IGV 3 extends support for PacBio long reads with: performance improvements to enable viewing variants at multi-kilobase scales; a “quick consensus” mode that suppresses single read random errors; labels for large insertion and deletion structural variants; and “group by base” to explore haplotype phase. The new capabilities are featured in a 4-minute tutorial video. To try them yourself, install IGV 3, and then load this IGV session (File > Open Session) with a sample dataset of 70-fold sequencing of a human genome, HG002 from NIST Genome in a Bottle [2].
It is visually challenging to identify biological variation (single nucleotide variants occur about every 1,000 basepairs in humans) among the more frequent sequencing errors in PacBio reads. However, because PacBio errors are random, quality is extremely high in a consensus of independent reads, often surpassing the quality from next-generation sequencing [3]. A mismatch that is consistent across reads indicates biological variation [4].
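Why random errors wash out in a consensus can be made concrete with a toy binomial model. The sketch below assumes, hypothetically, that each read is wrong at a given position independently with probability `p` and that the consensus is a simple majority vote. It counts every case where more than half of the reads are wrong, which overstates the risk, because random errors rarely agree on the same wrong base. Any value of `p` you plug in is an illustrative assumption, not a PacBio specification.

```python
from math import comb

# Toy model, not a PacBio accuracy specification: independent per-read errors
# with probability p, consensus by simple majority vote over n reads.


def majority_error_bound(p: float, n: int) -> float:
    """Upper bound on the chance that a majority vote over n reads is wrong.

    Sums the binomial probability of more than n/2 erroneous reads. Ties
    (possible only for even n) are not counted as errors, so use odd n.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))
```

With `p = 0.1`, one read is wrong 10% of the time, three reads give at most 2.8%, and the bound keeps shrinking as coverage grows, which is the intuition behind reading mismatches shared across many reads as real variants.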
IGV has added two features, “quick consensus mode” and “hide indels”, to reveal biological variation in PacBio reads. The quick consensus mode shows mismatches only at positions where more than a specified fraction of reads disagrees with the reference (recommended setting: 25%). The logic is the same as used by the coverage track. The “hide indels” feature (recommended setting: <10 bp) suppresses the most common error in raw PacBio reads, random small indels. Both features are available in the “Alignment” tab of the IGV preferences (View menu > Preferences).
Figure 1. Quick consensus mode and indel hiding reveal biological variation in PacBio long reads. (a) Both quick consensus and indel hiding are available in the “Alignment” tab of the IGV preferences (View menu > Preferences). Recommended settings are to hide mismatches with an allele fraction below 25% and indels shorter than 10 bases. (b) Raw PacBio reads with no consensus error correction. (c) The same read data with consensus mode and indel hiding activated reveals a number of homozygous and heterozygous single nucleotide variants.
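The two display rules described above can be written down as filters over a pileup. This is a hypothetical sketch of the logic as stated in the text, not IGV's source code: the pileup format (position mapped to reference base plus per-read bases) and the function names are assumptions, and the thresholds are the recommended 25% and 10 bp.

```python
# Hypothetical sketch of the display rules described above; not IGV's source code.


def show_mismatches_at(ref_base: str, read_bases: list[str], min_fraction: float = 0.25) -> bool:
    """Show mismatches at a position only if MORE than `min_fraction` of reads disagree."""
    if not read_bases:
        return False
    disagreeing = sum(1 for base in read_bases if base != ref_base)
    return disagreeing / len(read_bases) > min_fraction


def quick_consensus_positions(pileup: dict[int, tuple[str, list[str]]],
                              min_fraction: float = 0.25) -> list[int]:
    """Return the positions whose mismatches survive the quick consensus filter.

    `pileup` maps a reference position to (reference base, bases seen in each read).
    """
    return sorted(pos for pos, (ref, bases) in pileup.items()
                  if show_mismatches_at(ref, bases, min_fraction))


def visible_indels(indel_lengths: list[int], hide_below: int = 10) -> list[int]:
    """Drop small indels (the typical random error) and keep those >= `hide_below` bp."""
    return [length for length in indel_lengths if length >= hide_below]
```

A lone mismatch in one read out of ten is hidden, while a heterozygous site (about half the reads) or a homozygous site (all reads) stays visible, matching panels (b) and (c) of Figure 1.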
Each human genome has approximately 20,000 structural variants (differences ≥50 basepairs with the reference), most of which require PacBio long reads to detect [5]. For variants contained in a single read alignment, IGV 3 adds an option to “label large indels”, which lists the basepair size of the variant on a colored block whose width is proportional to the size of the indel. The “label large indels” feature is available in the “Alignment” tab of the preferences (View menu > Preferences). The recommended setting is to label indels larger than 10 basepairs.
Figure 2. Label insertion and deletion structural variants. (a) The option to label large indels is available in the “Alignment” tab of the IGV preferences (View menu > Preferences). Recommended settings are to label indels larger than 10 bases. (b) An insertion larger than the defined threshold is indicated by a purple box. The width of the box is proportional to the size of the insertion, and the basepair size is written on the box if it fits. (c) A deletion is indicated by a black line. The basepair size of the deletion is written on a white box at the center of the line. Examples are from HG002 sequenced by Genome in a Bottle.
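To see where a labeled indel comes from, it helps to walk the CIGAR string of a single alignment. The sketch below is hypothetical, not IGV's code: it uses the CIGAR operations defined in the SAM specification, tracks the reference coordinate, and reports insertions and deletions larger than a threshold (10 bp, the recommended setting). The helper `label_width_px` and its `bp_per_pixel` parameter are illustrative assumptions for the "width proportional to size" rule.

```python
import re

# Hypothetical sketch, not IGV's code: find indels in one alignment's CIGAR string
# (operations as defined in the SAM specification) that exceed a labeling threshold.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def large_indels(ref_start: int, cigar: str, label_above: int = 10) -> list[tuple[str, int, int]]:
    """Return (kind, reference position, size) for indels larger than `label_above` bp."""
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    events = []
    pos = ref_start
    for length_str, op in ops:
        length = int(length_str)
        if op in ("I", "D") and length > label_above:
            events.append(("insertion" if op == "I" else "deletion", pos, length))
        if op in REF_CONSUMING:  # I, S, H and P do not advance along the reference
            pos += length
    return events


def label_width_px(size: int, bp_per_pixel: float) -> float:
    """Width of the labeled block, proportional to the indel size (hypothetical helper)."""
    return size / bp_per_pixel
```

For an alignment starting at position 1000 with CIGAR `500M12I300M55D200M5I100M`, the 12 bp insertion at 1500 and the 55 bp deletion at 1800 are labeled, while the 5 bp insertion is left unlabeled.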
For reads that span very large structural variants or contain inversions, mappers like BWA produce separate primary and supplementary alignments. IGV 3 adds an option to “link supplementary alignments” to visually connect separate alignments from the same read. Reads that have alignments to both strands, which can indicate an inversion, are colored turquoise. “Link supplementary alignments” is available in the right-click menu for each alignment track.
Figure 3. Link alignments from the same read. (a) The option to link primary and supplementary alignments from a read is available in the right click menu for the alignment track. (b) When “link alignments” is active, separate alignments from the same read are drawn on a single row and connected by a thin line. Reads with alignments to both strands, which can indicate an inversion, are colored turquoise. The example shows reads that support an inversion in HG002.
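The linking logic can be sketched with the SAM FLAG bits alone. This is a hypothetical illustration, not IGV's implementation: it assumes records have been parsed into `(qname, flag, rname, pos)` tuples (the first four mandatory SAM columns), groups primary and supplementary alignments by read name, skips secondary alignments, and flags reads with alignments on both strands as possible inversion support.

```python
from collections import defaultdict

# SAM FLAG bits, as defined in the SAM specification.
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def link_alignments(records):
    """Group primary + supplementary alignments by read name (hypothetical sketch).

    `records` is an iterable of (qname, flag, rname, pos) tuples.
    """
    by_read = defaultdict(list)
    for qname, flag, rname, pos in records:
        if flag & FLAG_SECONDARY:
            continue  # secondary alignments are alternative placements, not pieces of the read
        strand = "-" if flag & FLAG_REVERSE else "+"
        by_read[qname].append((rname, pos, strand, bool(flag & FLAG_SUPPLEMENTARY)))

    linked = {}
    for qname, alns in by_read.items():
        alns.sort(key=lambda aln: (aln[0], aln[1]))  # left-to-right order for drawing on one row
        strands = {aln[2] for aln in alns}
        # Alignments on both strands are the pattern highlighted as possible inversions.
        linked[qname] = {"alignments": alns, "both_strands": len(strands) == 2}
    return linked
```

A real viewer would more likely use the `SA` tag to find a read's other pieces without scanning the whole file; grouping by name keeps the sketch short.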
PacBio long reads can span multiple single nucleotide and structural variants, which directly phases the variants into haplotypes [6]. To support visual exploration of haplotypes, IGV 3 adds an option to “group by base,” which categorizes reads by the basepair at a selected position. “Group by base” is available by right clicking on the basepair position by which to group. IGV 3 also includes performance improvements that enable variation to be shown at zoom levels of 10 kb and larger, which is critical to view haplotype structure.
Figure 4. Explore haplotype phase by grouping alignments by basepair. (a) The option to group alignments by the basepair at a selected position is available in the right click menu for the alignment track. (b) Ungrouped alignments from a locus in HG002 with a heterozygous deletion and several heterozygous single nucleotide variants. (c) Grouping the alignments by a heterozygous single nucleotide variant reveals two clear haplotypes.
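"Group by base" amounts to partitioning reads by the base they carry at one position. The sketch below is hypothetical, not IGV's code, and simplifies each read to `(name, 0-based reference start, gapless aligned sequence)`; a real implementation would walk the CIGAR to handle indels. The `"no coverage"` group name is an illustrative choice.

```python
from collections import defaultdict

# Hypothetical sketch, not IGV's code. Each read is (name, 0-based reference start,
# gapless aligned sequence); a real implementation would walk the CIGAR to handle indels.


def group_by_base(reads, position):
    """Partition reads by the base they carry at `position`."""
    groups = defaultdict(list)
    for name, start, seq in reads:
        offset = position - start
        key = seq[offset] if 0 <= offset < len(seq) else "no coverage"
        groups[key].append(name)
    return dict(groups)
```

With toy reads `a`, `b` (`ACGTAC`) and `c` (`ATGTAG`) starting at 0, grouping at position 1 separates `a`, `b` from `c`. The same split reappears at position 5 (`C` vs `G`), which is what two phased heterozygous variants look like when the reads are grouped.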
To utilize the new capabilities, install IGV 3, and then load this IGV session (File > Open Session) with a sample dataset of 70-fold sequencing of a human genome, HG002. Congratulations to Jim Robinson, Helga Thorvaldsdóttir, and the rest of the IGV team and community for the release of IGV 3!
[1] Robinson JT, et al. (2011). Nat Biotechnol, 29(1):24-6.
[2] Zook JM, et al. (2016). Sci Data, 3:160025.
[3] Roberts RJ, et al. (2013). Genome Biol, 14(7):405.
[4] Chin CS, et al. (2013). Nat Methods, 10(6):563-9.
[5] Chaisson MJ, et al. (2015). Nature, 517(7536):608-11.
[6] Chin CS, et al. (2016). Nat Methods, 13(12):1050-4.
A new PLoS One publication cites the use of SMRT Sequencing to clarify the transmission path of infection in a transplant recipient. This work is an excellent example of the clinical utility offered by long-read PacBio sequencing.
The project was spurred by the frustrating inability to distinguish between hospital-acquired infections and donor-to-recipient infections through solid organ transplants. Scientists and clinicians from the Icahn School of Medicine at Mount Sinai and the University of Texas Medical School teamed up to apply advanced sequencing technologies in the case of a liver transplant recipient infected with vancomycin-resistant Enterococcus. In their report, lead author Ali Bashir, senior author Shirish Huprikar, and collaborators describe the use of whole genome sequencing to pinpoint the likely means of infection.
The scientists note that cultures taken from the donor during hospitalization before death did not test positive for Enterococcus until days after the transplant had occurred. They analyzed bacterial samples from the donor, the recipient, and hospital isolates collected at the same time using SMRT Sequencing technology and other methods. “Automated de novo construction of high-quality bacterial genomes using long-read whole genome sequencing (WGS) is a powerful tool that can aid in donor transmission epidemiology,” they write.
The resulting Enterococcus genome assemblies “were highly contiguous; in all cases, the assemblies contained fewer than 10 contigs with the largest contig representing more than 50% of the total genome length,” the scientists report. They produced a phylogenetic tree for the samples and found that the bacterial genomes collected from donor and recipient were most closely related. However, other types of analysis, such as multilocus sequence typing and pulsed-field gel electrophoresis, generated more ambiguous results. “Only the full de novo assembly was able to clarify the unique structural differences between the donor and recipient isolate,” Bashir et al. report. “Our data suggest that WGS may be increasingly necessary to unambiguously confirm transmission for structurally mutable genomes.”
Because long-read sequencing is uniquely able to resolve large structural elements, the scientists suggest that it will become more commonly used for studies like this one. “We expect that WGS and assembly of pathogen genomes will be increasingly important not only for understanding pathogen biology and evolution, but also become a routine and essential tool for investigation of potential organ transplant transmissions in many settings,” they conclude.
Blog readers may recall that last year’s SMRT Grant winner was Renying Zhuo from the Chinese Academy of Forestry. We’re pleased to report that the project is now complete!
Zhuo proposed sequencing the genomes of two strains of the Sedum alfredii plant from the same ecosystem — one that accumulates cadmium ions from polluted soil and one that doesn’t. The goal was to use high-quality assemblies for comparative genomic analysis to determine the genetic mechanisms responsible for this remediation effect.
Plant DNA was sequenced on the Sequel System by RTL Genomics, and genome assembly was performed by Computomics. (We’re also grateful to Sage Science and Experiment, the other co-sponsors of the SMRT Grant program, for making this a worldwide, democratic event.) Both plant genomes made it into the “1 Mb contig N50 club” (#1MbCtgClub on Twitter), with contig N50s of 1.08 Mb for Sedum alfredii HE and 1.26 Mb for Sedum alfredii NHZ.
Zhuo and his team will now dive into a deep, detailed comparative analysis between the two genomes to identify genes associated with metal accumulation. Ultimately, they hope the results can be used to improve bioremediation efforts for soils contaminated with heavy metals.
Detailed stats for the two plant assemblies from Computomics:
| Metric | Sedum alfredii HE | Sedum alfredii NHZ |
|---|---|---|
| Total contig length [bp] | 235,739,357 | 397,076,979 |
| Longest contig [bp] | 3,521,758 | 5,719,050 |
| Contigs > 1 Mb [#] | 74 | 117 |
| N50 contig length [bp] | 1,087,129 | 1,256,010 |
| L50 contig count [#] | 65 | 91 |
| BUSCO complete [%] | 88.5 | 90.1 |
| BUSCO complete single copy [%] | 60.9 | 21.8 |
| BUSCO complete duplicated [%] | 27.6 | 68.4 |
| BUSCO fragmented [%] | 2.4 | 1.7 |
| BUSCO missing [%] | 9.1 | 8.1 |
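The N50 and L50 rows follow the usual definitions: sort contigs from longest to shortest and add up their lengths until the running total reaches half of the assembly. The contig at which that happens gives the N50 (its length) and the L50 (its rank). The sketch below assumes only those standard definitions; the post does not describe Computomics' own tooling.

```python
# Standard N50/L50 definitions; a sketch, not Computomics' pipeline.


def n50_l50(contig_lengths: list[int]) -> tuple[int, int]:
    """Return (N50 length, L50 count) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:  # this contig pushes the total past half the assembly
            return length, count
    raise ValueError("contig_lengths must contain at least one contig")
```

For contigs of 100, 80, 60, 40, and 20 bp (300 bp total), the running sum passes 150 bp at the second contig, so N50 is 80 bp and L50 is 2.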
Voting is now open for this year’s Plant and Animal SMRT Grant program. Check out the five finalists and cast your vote by April 5!
Efforts to produce a reference-grade goat genome assembly for improved breeding programs have paid off. A new Nature Genetics publication reports a high-quality, highly contiguous assembly that can be used to develop genotyping tools for quick, reliable analysis of traits such as milk and meat quality or adaptation to harsh environments. The study also offers a look at how different scaffolding approaches perform with SMRT Sequencing data.
“Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome” comes from lead authors Derek Bickhart, Benjamin Rosen, and Sergey Koren; senior author Tim Smith; and collaborators. The large team of scientists is affiliated with the USDA Agricultural Research Service, National Human Genome Research Institute, the University of Washington, and many other organizations.
The project was motivated by a clear need to develop methods for high-quality livestock genome assemblies to benefit breeding communities. Goats are especially important in developing countries, where they are a primary source of textile fiber, milk, and meat. “A finished, accurate reference genome is essential for advanced genomic selection of productive traits and gene editing in agriculturally relevant plant and animal species,” the scientists report. Previous efforts to sequence the goat genome with short reads resulted in a highly fragmented assembly that could not resolve repetitive and other challenging regions. For this work, the team analyzed the genome of a highly homozygous male San Clemente goat (Capra hircus) using a number of technologies.
They chose SMRT Sequencing because its long reads could characterize even the most difficult genomic regions. “Initial assembly of the PacBio data alone resulted in a contig NG50 … of 3.8 Mb,” the team reports. PacBio contigs were then connected with optical mapping and Hi-C data to create extremely long scaffolds in the final 2.92 Gb assembly. “These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps,” they write. The assembly is 400 times more continuous than the previous short-read assembly.
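The goat team reports a contig NG50 rather than an N50. The only difference is the denominator: NG50 uses half of an estimated genome size as its threshold instead of half the assembly’s own length, so assemblies of the same genome can be compared even when their total lengths differ. A minimal Python sketch with toy numbers; the function name `ng50` is hypothetical.

```python
def ng50(contig_lengths, genome_size_bp):
    """Return the NG50 contig length, or None if the contigs cover
    less than half of the estimated genome size.

    Identical to N50 except that the 50% threshold comes from an
    external genome-size estimate rather than the assembly's own length.
    """
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= genome_size_bp:
            return length
    return None


toy = [10, 20, 30, 40, 50, 70, 80]  # made-up lengths, 300 bp assembled
print(ng50(toy, 300))   # 70: same as N50 when assembly size equals genome size
print(ng50(toy, 400))   # 50: a larger genome estimate pushes NG50 down
print(ng50(toy, 1000))  # None: the assembly covers < 50% of the estimate
```

Because NG50 is anchored to the genome rather than to the assembly, discarding short contigs (which shrinks the total and can raise N50) does not change it.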
To learn more about how these technologies complement each other, the scientists analyzed results from optical mapping and Hi-C data separately. They found that Hi-C data yielded one-tenth as many scaffolds as optical mapping did, but it produced more misoriented contigs, an effect correlated with restriction site density. “Ultimately, we found that sequential scaffolding with optical mapping data followed by Hi-C data yielded an assembly with the highest continuity and best agreement with the [radiation hybrid] map,” the team reports, noting that this approach is significantly less expensive than generating a short-read draft genome assembly and manually finishing it to high quality.
The final assembly includes notoriously difficult regions, such as centromeric DNA and the Y chromosome. Two chromosomes appear to be completely assembled, and two others seem to include “the elusive p arm,” Bickhart et al. write.
Of course, since the scientists were focused on building a resource that would help breeding programs, they also assessed its potential impact in that space. “Chromosome-scale continuity of the ARS1 assembly was found to have appreciable positive impact on genetic marker order for the existing C. hircus 52K SNP chip,” they report.
Going forward, the team hopes to generate a phased diploid assembly for C. hircus.
Our team of scientist reviewers has considered hundreds of submissions for the latest SMRT Grant award and narrowed the selection to five finalists. Now it’s your turn! We welcome the community to vote for their favorite project now through April 5th. The winner will receive SMRT Sequencing and genome assembly or Iso-Seq analysis sponsored by PacBio and our partners, the Arizona Genomics Institute and Computomics.
Here’s a look at the entries from our five finalists:
Project: Temple Pitviper
Principal investigators: Mrinalini Mrinalini, National University of Singapore; Ryan McCleary, Utah State University; Manjunatha Kini, National University of Singapore
The highly venomous snake Tropidolaemus wagleri, common to Southeast Asia, has a number of unique features that merit further study. Its venom contains toxic proteins not found in other species of snake, including a group of novel toxins that have not been well characterized. This reptile also has sex-specific phenotypes, which is unusual for snakes; interestingly, these differences are not seen until the snake reaches sexual maturity, but the biological trigger for this is not understood.
Project: Solar-powered Slug
Principal investigators: Carola Greve, Zoological Research Museum A. Koenig; Alexander Donath, Zoological Research Museum A. Koenig
Scientists propose sequencing the genome of Elysia timida, a Mediterranean sea slug that has the rare ability to consume algae and keep the ingested chloroplasts functioning. Inside the slug, these chloroplasts continue photosynthesis, building up a starch reservoir that can feed the slug for three months. The project aims to scour the genome for genes associated with this unique ability, to understand the mollusk’s eco-friendly biology, as well as the process of incorporating organelles.
Project: Pink Pigeon
Principal investigators: Matthew Clark, Earlham Institute; Cock Van Oosterhout, University of East Anglia
This effort would use the Iso-Seq method to generate the transcriptome of the pink pigeon, an endangered species native to Mauritius. The species suffers high levels of infertility and pathogen susceptibility, possibly related to a population bottleneck. Scientists would use SMRT Sequencing data to study the bird’s loss of genetic variation and to find variants associated with fitness and pathogen resistance.
Project: Explosive Beetle
Principal investigators: Tanya Renner, San Diego State University; Aman Gill, University of California, Berkeley; Wendy Moore, University of Arizona; Kipling Will, University of California, Berkeley; Athula Attygalle, Stevens Institute of Technology
The bombardier beetle (Brachinus elongatulus) is known for its ability to “explosively discharge a toxic mix of quinones, oxygen, and water vapor at over 100°C,” this proposal says. Scientists would sequence the insect’s 500 Mb genome to understand insect chemical biosynthesis and biodiversity. This would represent the first genome sequence for the beetle suborder Adephaga.
Project: Dancing with Dingoes
Principal investigators: Bill Ballard, University of New South Wales; Claire Wade, University of Sydney, Sydney, Australia
This team aims to sequence the 2.5 Gb genome of the Australian dingo and compare it to that of the wild wolf and domestic dog to understand the evolutionary process that led from wild animal to pet. According to the proposal, this project will also “inform aspects of indigenous Australian culture and advance our understanding of the Australian continent’s top-level predator.”
Congratulations to all five finalists for their excellent proposals – may the most interesting genome win! Help support your favorite project now until April 5th.
A new genome assembly has remarkable promise to boost the global food supply. Scientists from King Abdullah University of Science and Technology and other institutions sequenced quinoa, a nutritious grain that can grow in marginal lands and other suboptimal environments. Their assembly offers new clues that could help improve breeding efforts to make the plant more accessible worldwide.
“The genome of Chenopodium quinoa” was published recently in Nature by lead author David Jarvis, senior author Mark Tester, and a large group of collaborators. They focused on this plant, which is believed to have been domesticated more than 7,000 years ago in South America, because it is rapidly becoming accepted as a superfood with potential to address the growing food supply challenge. Quinoa is a relatively low-sugar, gluten-free grain with lots of nutrients. But expanding its use as a crop around the world requires new breeding efforts, the authors report. They used SMRT Sequencing to generate a high-quality, chromosome-scale genome assembly for the allotetraploid plant, a valuable resource that can now be used by breeding programs to produce shorter, higher-yielding plants with increased stress tolerance and other desirable traits.
The team sequenced a plant from coastal Chile and then scaffolded the assembly with Bionano Genomics and Dovetail Genomics tools. The assembly is 1.39 Gb, represented in fewer than 3,500 scaffolds. Ninety percent of the genome is covered in just 439 scaffolds. “This assembly represents a substantial improvement over the previously published quinoa draft genome sequence, which contained more than 24,000 scaffolds with 25% missing data,” the scientists report. Iso-Seq analysis and other annotation methods resulted in nearly 45,000 gene models, while a BUSCO analysis found that more than 97% of the conserved genes it assesses were present in the assembly. The group also sequenced two diploid species closely related to quinoa’s ancestors.
One of the most exciting findings from the project was the discovery of a transcription factor that is believed to regulate production of saponins, bitter-tasting molecules in the quinoa shell. A premature stop codon found in sweet quinoa strains suggests that it may be possible to breed these saponins out to produce a plant more amenable to farming.
“These resources provide the foundation for accelerating the genetic improvement of the crop, with the objective of enhancing global food security for a growing world population,” Jarvis et al. write.
A recent effort to understand the genetic mechanisms by which drug-resistance elements are exchanged among bacteria built on previous studies of Enterobacteriaceae isolates collected at the National Institutes of Health Clinical Center. The work was made possible by high-quality genome assemblies of these organisms generated earlier with SMRT Sequencing technology.
In this project, scientists from the U.S., France, and Brazil teamed up to learn precisely how drug-resistance plasmids are spread from one species to another. They report the results of that investigation in mBio with the publication “Mechanisms of Evolution in High-Consequence Drug Resistance Plasmids” from lead author Susu He, senior author Fred Dyda, and collaborators. The team focused on the full complement of mobile elements (or the “mobilome”) found in carbapenemase-producing Enterobacteriaceae. “The availability of highly accurate plasmid assemblies for these strains based on long-read PacBio SMRT sequencing allows for the unambiguous and precise annotation of mobile elements,” they report.
The scientists analyzed plasmid evolution from isolates collected during an outbreak of carbapenem-resistant Klebsiella pneumoniae at the NIH Clinical Center in 2011 and 2012 as well as from other samples collected at the center over several years. By tracking target site duplications in samples, the team could infer the evolution of drug resistance. “We are able to propose the exact historical molecular events underlying plasmid rearrangements which provide a basis for understanding how antibiotic-resistant strains change over time, with significant implications for combating plasmid-mediated antimicrobial resistance,” they write.
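When a transposon inserts into DNA, it typically duplicates a few base pairs of the target site, leaving identical short direct repeats on both sides of the element. Because those repeats are specific to each insertion, they act as footprints that can tie a rearrangement back to a particular transposition event. The Python sketch below only checks whether an annotated element is flanked by a k-bp direct repeat; the toy sequence, coordinates, and function name are hypothetical, the TSD length `k` is assumed to be known for the element family, and this is not the pipeline He et al. used.

```python
def has_tsd(seq, start, end, k):
    """Return True if the element at seq[start:end] is flanked by a
    k-bp direct repeat, the signature left by a target site duplication.

    Coordinates are 0-based and end-exclusive (Python slice style).
    Exact matching only; real data may need to tolerate mismatches.
    """
    if k <= 0 or start < k or end + k > len(seq):
        return False
    left_flank = seq[start - k:start]
    right_flank = seq[end:end + k]
    return left_flank == right_flank


# Toy plasmid fragment: a 5-bp target site (GATCA) was duplicated when a
# hypothetical element (here just GGGGGGGG) inserted between the copies.
seq = "TTTT" + "GATCA" + "GGGGGGGG" + "GATCA" + "AAAA"
print(has_tsd(seq, 9, 17, 5))  # True
print(has_tsd(seq, 9, 17, 6))  # False: flanks are TGATCA vs GATCAA
```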
Of course, that raises the question of which evolutionary mechanisms are causing the changes they characterized. The scientists found two mobile element types — IS26 and Tn3 transposons — that appeared to be driving drug resistance evolution in the K. pneumoniae samples studied. However, they note, there was no clear explanation for that discovery. “This analysis revealed that plasmid reorganizations occurring in the natural context of colonization of human hosts were overwhelmingly driven by genetic rearrangements carried out by replicative transposons working in concert with the process of homologous recombination,” the authors report, adding that perhaps this kind of information will one day inform new approaches to combat antibiotic resistance.
“The rapidly decreasing cost of high-quality, long-read sequencing will enable the type of analysis described here to be applied more broadly to the problem of how resistance plasmids evolve in patients, hospitals, and the environment,” the scientists conclude.
Now, with the Sequel System and the recently released protocols for multiplexed microbial genome assembly (template preparation and data analysis), this application is even more accessible to the scientific community.
A recent Nature publication from a large team of scientists in Europe, Canada, and the US reports the use of SMRT Sequencing to elucidate the genome of Fragilariopsis cylindrus, a single-celled eukaryotic diatom adapted to living in polar waters of the Antarctic Ocean. The work has implications for the biotechnology industry, which looks to extremophiles as a potential source of important enzymes.
“Evolutionary genomics of the cold-adapted diatom Fragilariopsis cylindrus” comes from lead author Thomas Mock, senior author Igor Grigoriev, and many collaborators at the University of East Anglia, Earlham Institute, Joint Genome Institute, University of California, Berkeley, and several other organizations. The project investigated how this diatom evolved to thrive in its extreme environment, often in high-salinity water directly under sea ice.
To achieve this, the team started by sequencing the F. cylindrus genome using both Sanger and PacBio systems. For SMRT Sequencing, the scientists produced two libraries with different insert sizes (4 kb and 20 kb) and ran seven SMRT Cells, which yielded 63-fold coverage of the genome. The team used the diploid-aware FALCON assembler, which generated a 59.7 Mb assembly with 745 primary contigs. Comparing it with the Sanger assembly, the scientists determined that the PacBio assembly was highly accurate both in sequence (ranging from 99.65% to 100%) and in structure (validated by comparison with fosmid sequences).
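Fold coverage is simply the total number of sequenced bases divided by the genome size, so the figures above can be turned into a rough implied yield. This back-of-envelope Python sketch assumes the 63-fold figure was computed against the 59.7 Mb assembly size (the paper may have used a different genome-size estimate) and splits the total evenly across the seven SMRT Cells, which ignores the two library types; the function name is hypothetical.

```python
def implied_yield_bp(fold_coverage, genome_size_bp):
    """Total sequenced bases implied by a given fold coverage."""
    return fold_coverage * genome_size_bp


GENOME_SIZE_BP = 59.7e6  # assumption: coverage measured against the assembly size
SMRT_CELLS = 7

total = implied_yield_bp(63, GENOME_SIZE_BP)
per_cell = total / SMRT_CELLS  # assumption: equal yield per SMRT Cell
print(f"{total / 1e9:.2f} Gb total, ~{per_cell / 1e6:.0f} Mb per SMRT Cell")
# 3.76 Gb total, ~537 Mb per SMRT Cell
```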
F. cylindrus is characterized by highly divergent alleles, which represent nearly a quarter of its genome. An analysis of those genes determined that the “divergent alleles were differentially expressed across environmental conditions, including darkness, low iron, freezing, elevated temperature and increased CO2,” the scientists report. “Alleles with the largest ratio of non-synonymous to synonymous nucleotide substitutions also show the most pronounced condition-dependent expression, suggesting a correlation between diversifying selection and allelic differentiation.” The team hypothesized that allele diversification took place after the last glacial period and has been maintained because the variety of gene content allows for rapid adaptation to a changing environment.
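The “ratio of non-synonymous to synonymous nucleotide substitutions” (often written dN/dS, or pN/pS for uncorrected proportions) compares how often differences between two alleles change the encoded amino acid, after normalizing by how many sites could produce each kind of change. The Python sketch below is a simplified Nei-Gojobori-style calculation: it counts synonymous and non-synonymous sites per codon under the standard genetic code, tallies single-nucleotide codon differences, and returns uncorrected proportions. It skips codons that differ at more than one position and applies no multiple-hit correction, so it illustrates the idea only and is not the method used in the paper; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Each of the 9 possible point mutations contributes 1/3 of a
    synonymous site if it leaves the amino acid unchanged."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def pn_ps(seq1, seq2):
    """Return (Nd, Sd, pN, pS) for two aligned, in-frame coding sequences.

    Simplifications: codons differing at more than one position are not
    counted as differences, and no Jukes-Cantor correction is applied.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = 0.0
    Sd = Nd = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        if sum(a != b for a, b in zip(c1, c2)) == 1:
            if CODE[c1] == CODE[c2]:
                Sd += 1
            else:
                Nd += 1
    pN = Nd / N if N else 0.0
    pS = Sd / S if S else 0.0
    return Nd, Sd, pN, pS


# TTT->TTC is synonymous (Phe), GAA->GAT is non-synonymous (Glu->Asp).
Nd, Sd, pN, pS = pn_ps("ATGTTTGAA", "ATGTTCGAT")
print(Nd, Sd, round(pN, 3), round(pS, 3))  # 1 1 0.12 1.5
```

By convention, a ratio well above 1 is read as a sign of diversifying selection and one well below 1 as purifying selection; real analyses use long alignments, since a few codons give very noisy estimates.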
The Earlham Institute issued a press release about the project, including this comment from scientist Pirita Paajanen: “This is the first time at EI that a genome of this type was assembled into chromosomes. It is only very recently that the technology has been developed to cope with such a highly heterozygous organism and the data show that this diatom does actually have a large amount of variation within their genes.”
The second day of AGBT featured a number of great talks and posters, as well as our user workshop, “Covering All the Bases with SMRT Sequencing.” We’d like to thank the hundreds of attendees who crowded into the room for this event!
The workshop kicked off with Nezih Cereb, CEO of Histogenetics, who spoke about using long-read PacBio sequencing for typing HLA class I and II genes, which are important for applications such as matching organ transplants to recipients. The company has been performing industrial-scale SMRT Sequencing since it first acquired its PacBio RS II instrument, but recently increased capacity further by adding the Sequel System. Histogenetics types thousands of HLA samples each day with these instruments, and Cereb noted that SMRT Sequencing is essential for its ability to phase mutations in the HLA alleles. This layer of information cannot be accessed with short-read or Sanger technologies but is critical for understanding an individual’s immune function. Cereb told attendees that the Sequel System has performed so well that his company acquired three more of these sequencers to boost HLA typing throughput and allow new investigations into other complex regions, such as KIR. He concluded by saying that sequencing the full HLA genes is now the gold standard for typing samples.
Next up was Margaret Roy from Calico Life Sciences presenting results of a de novo genome sequence for the naked mole rat. The rodent has a remarkably long life span and resistance to cancer, both of which make it an appealing model to the Calico team. There were two existing assemblies for it, but both had been done with short-read sequencing and were highly fragmented. Roy and her team prepared SMRT Sequencing libraries with fragments of at least 25 kb and 45 kb and conducted sequencing on both the PacBio RS II and Sequel Systems. While the assembly is not yet complete, Roy told attendees that its metrics look good: the 2.5 Gb genome is represented in just 493 contigs, with the largest contig covering 71 Mb. The team is working to add scaffolding data from BioNano Genomics and will integrate additional data sets in the near future to achieve a high-quality final assembly for annotation. Roy said that the Sequel has been a welcome addition for the project, because lab members can load a tenth of the library onto a SMRT Cell and get five times the amount of data they would have with the PacBio RS II system. Once the project is complete, Roy said, she anticipates publishing the genome and releasing it publicly.
The final workshop speaker was our CSO, Jonas Korlach, who described the Sequel System’s current performance and the improvements in the works. He showed a map of PacBio sequencer installations, noting that there are now about as many Sequel Systems in labs as PacBio RS II systems. He also reviewed some exciting applications of SMRT Sequencing, including shotgun metagenomics, human de novo assemblies, Iso-Seq analysis, and more. Looking ahead, Korlach said users can expect Sequel System throughput to double this year and again next year, followed by a new SMRT Cell with eight times as many zero-mode waveguides by the end of 2018. Together, these steps should yield roughly a 30-fold increase in throughput, making it possible to complete a de novo human assembly for about $1,000. For structural variation analysis alone, the cost could be as little as $200 per person.
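The roughly 30-fold figure is the product of the three steps Korlach outlined, assuming (as a simplification) that eight times as many zero-mode waveguides translates into about eight times the data per SMRT Cell:

```latex
\underbrace{2}_{\text{this year}} \times \underbrace{2}_{\text{next year}} \times \underbrace{8}_{\text{new SMRT Cell}} = 32 \approx 30\text{-fold}
```

The quoted 30-fold is thus a slightly conservative rounding of 32.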
In other conference talks during the day, Emma Teeling from University College Dublin made a compelling case for her unique study of bats. These organisms have not been well represented by the genomics community, but she expressed hope that it would be possible in the not-too-distant future to achieve chromosome-level assemblies for bats using long-read sequencing and other advanced technologies. Separately, Mark DePristo from Google’s Verily Life Sciences unit presented results of a deep learning tool trained to spot variants from images of sequence reads. DeepVariant, which won an award for accurate SNP calling in the PrecisionFDA competition, has been used to call variants in PacBio data with excellent results; DePristo noted that it’s one of the few diploid variant callers available.
In one of the last talks of the day, Mike Schatz from Johns Hopkins University and Cold Spring Harbor Laboratory shared results of sequencing, assembling, and analyzing personalized, phased diploid genomes with Illumina, 10x, and PacBio data. The PacBio and 10x assemblies were the most contiguous, but Schatz pointed out that the 10x assembly had many unknown bases, whereas the PacBio assembly was made up of complete contigs. Those platforms also led to more structural variant calls than the short-read data, but the 10x approach was not able to detect the range of variants that SMRT Sequencing could, missing long insertions and other events. Schatz reported a large and unexpected number of translocations identified with PacBio data, noting that follow-up studies confirmed they were real. He also said that SMRT Sequencing data has the best concordance, outperforming both Illumina and 10x results. His talk really got the audience excited about the power of using personalized diploid genomes to mine for structural variation and understand its effects on regulation.
It’s hard to believe there’s only one day left. We’re already wearing down but eager to see what else AGBT has in store for us!
We’re thrilled to be at the AGBT conference this week, taking place this year in Hollywood, Fla. On the first full day of the meeting, everyone’s mandatory wristbands look shiny and new (we suspect by the end of the week they’ll be as wilted as us). And we’ve even been getting that work/life balance down thanks to some beach volleyball with our friends from BioNano Genomics and Swift Biosciences.
At the opening session on Monday, Eimear Kenny from the Icahn School of Medicine at Mount Sinai showed why it’s essential to fully understand natural genetic diversity in a fascinating talk about a deep analysis of hospital patients from across New York City. She offered one example of a variant that’s incredibly rare in most populations, but is found in about 2% of people of Puerto Rican descent and is likely pathogenic. We were delighted to hear the presentation, which fits nicely with the efforts of many PacBio users to generate population-specific reference genomes to help characterize the full breadth of natural genetic variation.
Tuesday’s talks included an infectious disease theme, with several speakers supporting the idea that the global incidence of viral outbreaks is rising. These talks made it clear that the optimal response to these health threats involves having complete genome assemblies, including accessory genomes, where genes associated with antimicrobial resistance are often found.
The infectious disease theme continued with a talk from the Broad Institute’s Daniel Neafsey, who presented results of an ongoing effort to produce a high-quality genome assembly for Aedes aegypti, the mosquito responsible for transmitting Zika and other viruses. As part of the Aedes Genome Working Group, Neafsey’s team is working to replace a 10-year-old Sanger assembly of this mosquito with a PacBio-based assembly. The project is not yet complete, but already represents a big step forward despite the organism’s high heterozygosity and highly repetitive content: the new assembly used FALCON-Unzip to reduce the number of contigs by at least 10-fold and boost the contig N50 to nearly 2 Mb. Most importantly, gene content is far more complete in the new assembly. Neafsey offered the example of a sex determination gene, Nix, which was absent from the 2007 assembly but was found in the new assembly and could be essential for CRISPR-based efforts to control mosquito populations. Neafsey also showed the results of several scaffolding technologies — including Dovetail Genomics, 10x Genomics, and BioNano Genomics — and noted that the final result should include data from all three approaches. In the next few weeks, the team will integrate all this information, remove homologous contigs, and complete annotation. Neafsey noted that he hopes this work inspires other research communities to improve older draft assemblies they’ve been working with for other organisms. If you’re attending AGBT, you can see more assembly details at poster #1105.
Speaking of major genome improvements, Jason Underwood from the University of Washington and PacBio spoke about long-read genome assemblies of primates, as well as a new approach to understanding transcripts. The standard pipeline for building better primate assemblies in his lab involves PacBio sequencing, FALCON assembly, and scaffolding with BioNano Genomics mapping. Structural variants are then called with the SMRT-SV protocol. This has resulted in drastic improvements, such as a 560-fold more complete and contiguous gorilla assembly. Underwood also spoke about projects designed to understand human-specific variation that can be identified with these improved resources. Using the Iso-Seq method, the team is sequencing full-length cDNAs; in one recent study, they used the Sequel System to generate 118,000 full-length reads from a single SMRT Cell. They also developed Iso-Cross, an approach in which transcripts from two closely related organisms (such as human and chimpanzee) are compared against both genomes; transcripts that map better to the genome of the organism they came from are more likely to have functional and specific roles. One example Underwood showed was a 1.9 kb human-specific deletion that removes an exon found in our close primate relatives. He told attendees that their investigation has turned up 200 human-specific variants that seem likely to have functional importance.
We invite all AGBT attendees to visit us in suite 317, where our fun Lego station lets everyone build their own plastic doppelganger!
We’re heading cross-country to the Advances in Genome Biology and Technology (AGBT) Meeting starting Monday in sunny Hollywood, Florida. There will be several great opportunities to learn about how scientists are using SMRT Sequencing and the Sequel System throughout the meeting, and we hope you have time to enjoy at least a few.
We’ll be hosting a one-hour workshop on Wednesday, February 15th, at 3:30 pm in the Grand Ballroom. Speakers will include Calico’s Margaret Roy, sharing her experience using the Sequel System for de novo sequencing of the naked mole-rat genome; Nezih Cereb of Histogenetics, discussing high-throughput HLA Class I whole-gene and HLA Class II long-range typing using targeted sequencing; and our own CSO Jonas Korlach, presenting the road map for Sequel performance improvements. The workshop won’t be live streamed, but we will be recording talks to share afterward and we will be blogging from the event. We’ll also be providing coffee and dessert in case you need an afternoon energy boost!
We’re looking forward to several program presentations that will include SMRT Sequencing results. On Tuesday, February 14th, don’t miss talks in the concurrent general biology session from Jason Underwood and Daniel Neafsey. The next night, Mark DePristo and Mike Schatz will speak in the informatics session. In addition, there will be several posters demonstrating SMRT Sequencing technology in various applications. Check out the complete list of our AGBT 2017 presentations and activities.
As usual, we’re proud to be sponsoring AGBT and helping to facilitate this event. Don’t forget to stop by our hospitality suite (#317) to build your own minifigure doppelganger and pose them with our Sequel System in the ‘lab’. We hope to see you there!
If you didn’t get to the Plant and Animal Genome meeting this year, you missed a great workshop featuring SMRT Sequencing users and the fascinating projects they’re working on across plant, animal, agricultural, and conservation sciences and human health. Here are quick summaries of each talk, with full video recordings available for more detail.
Our event kicked off with PacBio CSO Jonas Korlach welcoming attendees and delivering an update on the genomics community’s impressive advances with SMRT Sequencing. There are now more than 2,000 publications citing the PacBio long-read technology, with new ones appearing at a rate of about 30 per week. He also spoke about improvements to the platform, including better assembly tools such as FALCON and FALCON-Unzip as well as the recently released Sequel System chemistry that delivers 5-8 Gb of data per SMRT Cell and significantly reduces DNA input requirements. These improvements make it possible to run a broader range of projects on the Sequel System.
Representing the plant side of the conference, the University of Arizona’s Rod Wing spoke about using SMRT Sequencing to produce high-quality genome assemblies for several varieties of rice. He’s undertaken this project to help develop higher-yielding, hardier strains of rice to feed the rapidly growing global population. In the work he presented, his team sequenced two parents of a common hybrid strain, generating the highest-quality publicly available assemblies of Indica rice ever produced. His data illustrated how long-read PacBio sequencing allows for excellent contiguity in assemblies, with one strain represented in just 19 contigs and a strain featuring eight complete chromosomes, including centromeres. Wing also included data from a third genome being sequenced with the Sequel System.
Other speakers focused on animals or insects. Rebecca Johnson from the Australian Museum Research Institute reported on the de novo genome assembly of a koala, a genome about 3.6 Gb in size. The work was undertaken due to conservation concerns for these marsupials, which have many biologically interesting features such as a gestation period of just 35 days. With SMRT Sequencing, her team produced what Johnson called the best marsupial assembly to date; analysis showed that only 5% of BUSCO genes were not represented. The assembly allowed her team to study lactation-related genes that are important for koala development, as well as immune elements (for example, koalas have been found to harbor novel antimicrobials that show effectiveness against drug-resistant bacteria). Johnson’s genome work continues, and she told attendees that she fully expects to achieve a chromosome-level assembly for the animal.
Richard Kuo from the Roslin Institute spoke about using the Iso-Seq method to study brain and embryo tissues from chicken. He said this approach addresses the limitations of other gene expression methods that miss long non-coding RNAs (lncRNAs), the full universe of isoforms, and more. By producing full-length transcripts from the transcription start site to the transcription end site without any assembly required, SMRT Sequencing is ideal for characterizing the transcriptome. With the chicken project, Kuo evaluated the importance of protocols such as normalization and 5’ cap selection, both of which provided richer data sets. Kuo told attendees that using the Iso-Seq method allows scientists to immediately leapfrog to the best available annotations, producing more information on the transcriptome than ever.
Rockefeller University’s Erich Jarvis presented an update on his work with bird genomes for the B10K project. He offered a comparison of assembly techniques for hummingbird, which has been analyzed with everything from short-read sequencers to genome mapping tools. The PacBio-powered assemblies consistently ranked as the highest quality, with the fewest contigs and best accuracy. He also included a look at four genes associated with vocal learning, which were complete in the PacBio assembly, showing the importance of incorporating long reads into the assembly.
Representing the insect front, Ben Matthews from Rockefeller University reported on a genome assembly project for Aedes aegypti, the common vector for Zika, dengue, and yellow fever. Noting that mosquitoes are believed to be the deadliest animals in the world, he said that a clear and complete understanding of their genomes will be essential to thwarting public health threats. The original assembly of the Aedes aegypti genome is nine years old and hasn’t been improved much, so Matthews and his colleagues turned to SMRT Sequencing for a new version of the same strain. The effort yielded a much better assembly, boosting the contig N50 from 83 kb to about 2 Mb. Further analysis showed that 7,500 transcripts map to the new assembly but not to the old, indicating a significant amount of new gene content. Matthews anticipates that this new assembly will replace the old one for the entire mosquito research community, and serve as an important resource for understanding resistance to repellents and designing guide RNAs for CRISPR genome modification to constrain population growth.
Many thanks to all of our workshop speakers for a great event!