In this PAG 2018 presentation, Marty Badgett of PacBio shares updates on PacBio products and performance. He highlights high-quality genome assemblies for Arabidopsis, rice, and maize; the SMRTbell Express Template Prep Kit; SMRT Analysis updates; and the Iso-Seq method for RNA sequencing.
At DuPont Pioneer, DNA sequencing is paramount for R&D to reveal the genetic basis for traits of interest in commercial crops such as maize, soybean, sorghum, sunflower, alfalfa, canola, wheat, rice, and others. They cannot afford to wait the years it has historically taken for high-quality reference genomes to be produced. Nor can they rely on a single reference to represent the genetic diversity in their germplasm.
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive…
The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms have resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate the current state of the art for de novo assembly using the domestic goat (Capra hircus) based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length…
Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for quinoa, which was produced using single-molecule real-time sequencing in combination with optical, chromosome-contact and genetic maps. We also report the sequencing of two diploids from the ancestral gene pools of quinoa, which enables the identification of sub-genomes in quinoa, and reduced-coverage genome sequences for 22 other samples of the allotetraploid goosefoot complex. The genome sequence facilitated the identification of…
At AGBT 2017, Margaret Roy from Calico Life Sciences discussed a de novo genome sequencing effort for the naked mole rat. This animal has a remarkably long life span and resistance to cancer, both of which make it interesting for studies of life extension. The team is using SMRT Sequencing for a more complete, contiguous assembly than the two existing short-read-based assemblies. Included: data from the Sequel System.
To test the impact of high-quality genome assemblies on biological research, we applied PacBio long-read sequencing in conjunction with the new, diploid-aware FALCON-Unzip assembler to a number of bird species. These included: the zebra finch, for which a consortium-generated, Sanger-based reference exists, to determine how the FALCON-Unzip assembly would compare to the current best references available; Anna’s hummingbird genome, which had been assembled with short-read sequencing methods as part of the Avian Phylogenomics phase I initiative; and two critically endangered bird species (kakapo and ‘alala) of high importance for conservation efforts, whose genomes had not previously been sequenced and assembled.
At AGBT 2017, Mike Schatz from Johns Hopkins University and Cold Spring Harbor Laboratory presented data from sequencing, assembling, and analyzing personalized, phased diploid genomes with Illumina, 10x Genomics, or PacBio SMRT Sequencing. Compared to the short-read-based methods, PacBio data assembled in large, complete contigs and contained the broadest range of structural variants with the best resolution. Plus: unexpected translocation findings with SMRT Sequencing, validated in follow-up studies.
At PAG 2017, Rockefeller University’s Erich Jarvis offered an in-depth comparison of methods for generating highly contiguous genome assemblies, using hummingbird as the basis to evaluate a number of sequencing and scaffolding technologies. Analyses include gene content, error rate, chromosome metrics, and more. Plus: a long-read look at four genes associated with vocal learning.
At PAG 2017, Rod Wing presented five new, high-quality rice genome assemblies developed with SMRT Sequencing, one of which has eight complete chromosomes, centromeres included. He also offered an early look at data generated with the Sequel System for a new assembly underway. This work is done with the goal of developing rice varieties that will be better suited to feeding a rapidly growing global population.
In this PAG 2017 presentation, Ben Matthews describes a new genome assembly for Aedes aegypti, the mosquito responsible for spreading Zika virus, yellow fever, and other infectious diseases. By using PacBio long-read sequencing, scientists produced an assembly that is much more complete and contiguous than a previous assembly; 7,500 transcripts map to the new contigs but not to the old assembly. The genome is important for designing guide RNAs for CRISPR, understanding resistance to mosquito repellants, and much more.
Many applications of high throughput sequencing rely on the availability of an accurate reference genome. Errors in the reference genome assembly increase the number of false-positives in downstream analyses. Recently, we have shown that over 33% of the current pig reference genome, Sscrofa10.2, is either misassembled or otherwise unreliable for genomic analyses. Additionally, ~10% of the bases in the assembly are Ns in gaps of an arbitrary size. Thousands of highly fragmented contigs remain unplaced and many genes are known to be missing from the assembly. Here we present a new assembly of the pig genome, Sscrofa11, assembled using 65X…
Aedes aegypti is a tropical and subtropical mosquito vector for Zika, yellow fever, dengue fever, chikungunya, and other diseases. The outbreak in the Americas of Zika, which can cause microcephaly in the fetuses of infected women, adds urgency to the need for a high-quality reference genome in order to better understand the organism’s biology and its role in transmitting human disease. We describe the first diploid assembly of an insect genome, using SMRT sequencing and the open-source assembler FALCON-Unzip. This assembly has high contiguity (contig N50 1.3 Mb), is more complete than previous assemblies (Length 1.45 Gb with 87% BUSCO…
Scientists in Brazil paired PacBio long-read sequencing with Dovetail Genomics chromatin proximity ligation to generate a highly contiguous genome assembly for the cashew tree. With this resource, they are on their way to improving breeding programs to protect the plant from disease and boost yield.
At the University of California, Davis, Dario Cantu is applying long-read PacBio sequencing to the heterozygous genome of the Cabernet Sauvignon grape. Now, his team has access to whole genome data that could help guard against the effects of climate change and disease.