Outside of the simplest cases (haploid, bacteria, or inbreds), genomic information is not carried in a single reference per individual, but rather has higher ploidy (n=>2) for almost all organisms. The existence of two or more highly related sequences within an individual makes it extremely difficult to build high quality, highly contiguous genome assemblies from short DNA fragments. Based on the earlier work on a polyploidy aware assembler, FALCON ( https://github.com/PacificBiosciences/FALCON) , we developed new algorithms and software (“FALCON-unzip”) for de novo haplotype reconstructions from SMRT Sequencing data. We generate two datasets for developing the algorithms and the prototype software:…
While genome assembly projects have been successful in many haploid and inbred species, the assembly of non-inbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short-…
In this presentation at PAG 2020, Bart Nijland of Genetwister Technologies explains how his team set out to make a haplotype-aware assembly of the highly complex tetraploid Rosa x hybrida L. genome in order to capture its full range of genetic variation. HiFi reads generated from PacBio’s Sequel II System have made it possible to parse out critical information from many of the plant’s parental genes.
The release of the PacBio Sequel II System in 2019 brought dramatic throughput improvements and protocols for producing a new data type, highly accurate long reads or HiFi reads. PacBio is the only sequencing technology to offer highly accurate long reads (HiFi reads) that provide Sanger-quality accuracy (>99%) with the read lengths needed for assembly of complex genomes. The long length and high accuracy of HiFi reads makes them the ideal starting point for many applications, and one area of major interest is genome assembly. HiFi assembly is faster, cheaper, more accurate, and easier to phase than standard long-read assembly.…
The utility of new highly accurate long reads, or HiFi reads, was first demonstrated for calling all variant types in human genomes. It has since been shown that HiFi reads can be used to generate contiguous, complete, and accurate human genomes, even in repeat structures such as centromeres and telomeres. In this virtual workshop scientists from PacBio as well as Tina Graves-Lindsay from the McDonnell Genome Institute at Washington University share the many improvements we’ve made to HiFi sequencing in the past year, tools that take advantage of HiFi data for variant detection and assembly, and examples in numerous genomics…
Like many other crops, the cultivated peanut (Arachis hypogaea L.) is of hybrid origin and has a polyploid genome that contains essentially complete sets of chromosomes from two ancestral species. Here we report the genome sequence of peanut and show that after its polyploid origin, the genome has evolved through mobile-element activity, deletions and by the flow of genetic information between corresponding ancestral chromosomes (that is, homeologous recombination). Uniformity of patterns of homeologous recombination at the ends of chromosomes favors a single origin for cultivated peanut and its wild counterpart A. monticola. However, through much of the genome, homeologous recombination…
High oil and protein content make tetraploid peanut a leading oil and food legume. Here we report a high-quality peanut genome sequence, comprising 2.54?Gb with 20 pseudomolecules and 83,709 protein-coding gene models. We characterize gene functional groups implicated in seed size evolution, seed oil content, disease resistance and symbiotic nitrogen fixation. The peanut B subgenome has more genes and general expression dominance, temporally associated with long-terminal-repeat expansion in the A subgenome that also raises questions about the A-genome progenitor. The polyploid genome provided insights into the evolution of Arachis hypogaea and other legume chromosomes. Resequencing of 52 accessions suggests that…
Construction of chromosome-level assembly is a vital step in achieving the goal of a ‘Platinum’ genome, but it remains a major challenge to assemble and anchor sequences to chromosomes in autopolyploid or highly heterozygous genomes. High-throughput chromosome conformation capture (Hi-C) technology serves as a robust tool to dramatically advance chromosome scaffolding; however, existing approaches are mostly designed for diploid genomes and often with the aim of reconstructing a haploid representation, thereby having limited power to reconstruct chromosomes for autopolyploid genomes. We developed a novel algorithm (ALLHiC) that is capable of building allele-aware, chromosomal-scale assembly for autopolyploid genomes using Hi-C paired-end…
Improving traits in wheat has historically been challenging due to its large and polyploid genome, limited genetic diversity and in-field phenotyping constraints. However, within recent years many of these barriers have been lowered. The availability of a chromosome-level assembly of the wheat genome now facilitates a step-change in wheat genetics and provides a common platform for resources, including variation data, gene expression data and genetic markers. The development of sequenced mutant populations and gene-editing techniques now enables the rapid assessment of gene function in wheat directly. The ability to alter gene function in a targeted manner will unmask the effects…