Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers in large genome complexities, such as long, highly repetitive, low-complexity regions and duplication events, and differentiating between transcript isoforms that are difficult to resolve with short-read technologies. We present solutions available for both reference genome improvement (>100 MB) and transcriptome research to best leverage long reads that have exceeded 20 Kb in length. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. Highlights from our genome assembly projects using the latest P5-C3 chemistry on model organisms will be shared. Assembly contig N50 have exceeded 6 Mb and we observed longest contig exceeding 12.5 Mb with an average base quality of QV50. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq Application will be presented.
Arabica coffee, revered for its taste and aroma, has a complex genome. It is an allotetraploid (2n=4x=44) with a genome size of approximately 1.3 Gb, derived from the recent (< 0.6 Mya) hybridization of two diploid progenitors (2n=2x=22), C. canephora (710 Mb) and C. eugenioides (670 Mb). Both parental species diverged recently (< 4.2Mya) and their genomes are highly homologous. To facilitate assembly, a dihaploid plant was chosen for sequencing. Initial genome assembly attempts with short read data produced an assembly covering 1,031 Mb of the C. arabica genome with a contig L50 of 9kb. By implementation of long read PacBio at greater than 50x coverage and cutting-edge PacBio software, a de novo PacBio-only genome assembly was constructed that covers 1,042 Mb of the genome with an L50 of 267 kb. The two assemblies were assessed and compared to determine gene content, chimeric regions, and the ability to separate the parental genomes. A genetic map that contains 600 SSRs is being used for anchoring the contigs and improve the sub-genome differentiation together with the search of sub-genome specific SNPs. PacBio transcriptome sequencing is currently being added to finalize gene annotation of the polished assembly. The finished genome assembly will be used to guide re-sequencing assemblies of parental genomes (C. canephora and C. eugenioides) as well as a template for GBS analysis and whole genome re-sequencing of a set of C. arabica accessions representative of the species diversity. The obtained data will provide powerful genomic tools to enable more efficient coffee breeding strategies for this crop, which is highly susceptible to climate change and is the main source of income for millions of small farmers in producing countries.
Building a platinum human genome assembly from single haplotype human genomes generated from long molecule sequencing
The human reference sequence has provided a foundation for studies of genome structure, human variation, evolutionary biology, and disease. At the time the reference was originally completed there were some loci recalcitrant to closure; however, the degree to which structural variation and diversity affected our ability to produce a representative genome sequence at these loci was still unknown. Many of these regions in the genome are associated with large, repetitive sequences and exhibit complex allelic diversity such producing a single, haploid representation is not possible. To overcome this challenge, we have sequenced DNA from two hydatidiform moles (CHM1 and CHM13), which are essentially haploid. CHM13 was sequenced with the latest PacBio technology (P6-C5) to 52X genome coverage and assembled using Daligner and Falcon v0.2 (GCA_000983455.1, CHM13_1.1). Compared to the first mole (CHM1) PacBio assembly (GCA_001007805.1, 54X) contig N50 of 4.5Mb, the contig N50 of CHM13_1.1 is almost 13Mb, and there is a 13-fold reduction in the number of contigs. This demonstrates the improved contiguity of sequence generated with the new chemistry. We annotated 50,188 RefSeq transcripts of which only 0.63% were split transcripts, and the repetitive and segmental duplication content was within the expected range. These data all indicate an extremely high quality assembly. Additionally, we sequenced CHM13 DNA using Illumina SBS technology to 60X coverage, aligned these reads to the GRCh37, GRCh38, and CHM13_1.1 assemblies and performed variant calling using the SpeedSeq pipeline. The number of single nucleotide variants (SNV) and indels was comparable between GRCh37 and GRCh38. Regions that showed increased SNV density in GRCh38 compared to GRCh37 could be attributed to the addition of centromeric alpha satellite sequence to the reference assembly. Alternatively, regions of decreased SNV density in GRCh38 were concentrated in regions that were improved from BAC based sequencing of CHM1 such as 1p12 and 1q21 containing the SRGAP2 gene family. The alignment of PacBio reads to GRCh37 and GRCh38 assemblies allowed us to resolve complex loci such as the MHC region where the best alignment was to the DBB (A2-B57-DR7) haplotype. Finally, we will discuss how combining the two high quality mole assemblies can be used for benchmarking and novel bioinformatics tool development.
Reference genome assemblies provide important context in genetics by standardizing the order of genes and providing a universal set of coordinates for individual nucleotides. Often due to the high complexity of genic regions and higher copy number of genes involved in immune function, immunity-related genes are often misassembled in current reference assemblies. This problem is particularly ubiquitous in the reference genomes of non-model organisms as they often do not receive the years of curation necessary to resolve annotation and assembly errors. In this study, we reassemble a reference genome of the goat (Capra hircus) using modern PacBio technology in tandem with BioNano Genomics Irys optical maps and Lachesis clustering in order to provide a high quality reference assembly without the need for extensive filtering. Initial PacBio assemblies using P5C4 chemistry achieved contig N50’s of 4 Megabases and a BUSCO completion score of 84.0%, which is comparable to several finished model organism reference assemblies. We used BioNano Genomics’ Irys platform to generate 336 scaffolds from this data with a scaffold N50 of 24 megabases and total genome coverage of 98%. Lachesis interaction maps were used with a clustering algorithm to associate Irys scaffolds into the expected 30 chromosome physical maps. Comparisons of the initial hybrid scaffolds generated from the long read contigs and optical map information to a previously generated RH map revealed that the entirety of the Goat autosome 20 physical map was contained within one scaffold. Additionally, the BioNano scaffolding resolved several difficult regions that contained genes related to innate immunity which were problem regions in previous reference genome assemblies.
In this presentation, Justin Blethrow provides an overview of recent and upcoming developments across PacBio’s SMRT Sequencing product portfolio, and their implications for PacBio’s major applications. In presenting the product…
In this PacBio User Group Meeting presentation, Nic Wheeler of University of Wisconsin-Madison, speaks about RNA sequencing for filarial nematodes associated with understudied tropical diseases. His team used Iso-Seq analysis…
Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513?Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated bromelain inhibitors. Four candidate genes for self-incompatibility were linked in F153, but were not functional in self-compatible CB5. Our findings support the coexistence of sexual recombination and a one-step operation in the domestication of clonally propagated crops. This work guides the exploration of sexual and asexual domestication trajectories in other clonally propagated crops.
Updated assembly resource of Phytophthora ramorum Pr102 isolate incorporating long reads from PacBio sequencing.
The NA1 clonal lineage of Phytophthora ramorum is responsible for Sudden Oak Death, an epidemic that has devastated California’s coastal forest ecosystems. An NA1 isolate Pr102 derived from coast live oak in California was previously sequenced and reported with 65 Mb assembly containing 12 Mb gaps in 2576 scaffolds. Here we report an improved 70 Mb genome in 1512 scaffolds with 6752 bp gaps after incorporating PacBio P5-C3 longreads. This assembly contains 19494 gene models (average gene length 2515 bp) compared to 16134 genes (average gene length of 1673 bp) in the previous version. We predicted 29 new RXLRs and 76 new paralogs of a total 392 RXLRs from this assembly. We predicted 35 CRNs compared to 19 in earlier version with six paralogs. Our lncRNAs prediction identified 255 candidates. This new resource will be invaluable for future evolution studies on the invasive plant pathogen.
Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.
Ilyonectria mors-panacis is a serious disease hampering the production of Panax notoginseng, an important Chinese medicinal herb, widely used for its anti-inflammatory, anti-fatigue, hepato-protective, and coronary heart disease prevention effects. Here, we report the first Illumina-Pacbio hybrid sequenced draft genome assembly of I. mors-panacis strain G3B and its annotation. The availability of this genome sequence not only represents an important tool toward understanding the genetics behind the infection mechanism of I. mors-panacis strain G3B but also will help illuminate the complexities of the taxonomy of this species.
Full-length mRNA sequencing and gene expression profiling reveal broad involvement of natural antisense transcript gene pairs in pepper development and response to stresses.
Pepper is an important vegetable with great economic value and unique biological features. In the past few years, significant development has been made towards understanding the huge complex pepper genome; however, pepper functional genomics has not been well studied. To better understand the pepper gene structure and pepper gene regulation, we conducted full-length mRNA sequencing by PacBio sequencing and obtained 57862 high-quality full-length mRNA sequences derived from 18362 previously annotated and 5769 newly detected genes. New gene models were built that combined the full-length mRNA sequences and corrected approximately 500 fragmented gene models from previous annotations. Based on the full-length mRNA, we identified 4114 and 5880 pepper genes forming natural antisense transcript (NAT) genes in-cis and in-trans, respectively. Most of these genes accumulate small RNAs in their overlapping regions. By analyzing these NAT gene expression patterns in our transcriptome data, we identified many NAT pairs responsive to a variety of biological processes in pepper. Pepper formate dehydrogenase 1 (FDH1), which is required for R-gene-mediated disease resistance, may be regulated by nat-siRNAs and participate in a positive feedback loop in salicylic acid biosynthesis during resistance responses. Several cis-NAT pairs and subgroups of trans-NAT genes were responsive to pepper pericarp and placenta development, which may play roles in capsanthin and capsaicin biosynthesis. Using a comparative genomics approach, the evolutionary mechanisms of cis-NATs were investigated, and we found that an increase in intergenic sequences accounted for the loss of most cis-NATs, while transposon insertion contributed to the formation of most new cis-NATs. This article is protected by copyright. All rights reserved.This article is protected by copyright. All rights reserved.
De novo genome assembly of the endangered Acer yangbiense, a plant species with extremely small populations endemic to Yunnan Province, China.
Acer yangbiense is a newly described critically endangered endemic maple tree confined to Yangbi County in Yunnan Province in Southwest China. It was included in a programme for rescuing the most threatened species in China, focusing on “plant species with extremely small populations (PSESP)”.We generated 64, 94, and 110 Gb of raw DNA sequences and obtained a chromosome-level genome assembly of A. yangbiense through a combination of Pacific Biosciences Single-molecule Real-time, Illumina HiSeq X, and Hi-C mapping, respectively. The final genome assembly is ~666 Mb, with 13 chromosomes covering ~97% of the genome and scaffold N50 sizes of 45 Mb. Further, BUSCO analysis recovered 95.5% complete BUSCO genes. The total number of repetitive elements account for 68.0% of the A. yangbiense genome. Genome annotation generated 28,320 protein-coding genes, assisted by a combination of prediction and transcriptome sequencing. In addition, a nearly 1:1 orthology ratio of dot plots of longer syntenic blocks revealed a similar evolutionary history between A. yangbiense and grape, indicating that the genome has not undergone a whole-genome duplication event after the core eudicot common hexaploidization.Here, we report a high-quality de novo genome assembly of A. yangbiense, the first genome for the genus Acer and the family Aceraceae. This will provide fundamental conservation genomics resources, as well as representing a new high-quality reference genome for the economically important Acer lineage and the wider order of Sapindales. © The Author(s) 2019. Published by Oxford University Press.
Yellowhorn (Xanthoceras sorbifolium) is a species of the Sapindaceae family native to China and is an oil tree that can withstand cold and drought conditions. A pseudomolecule-level genome assembly for this species will not only contribute to understanding the evolution of its genes and chromosomes but also bring yellowhorn breeding into the genomic era.Here, we generated 15 pseudomolecules of yellowhorn chromosomes, on which 97.04% of scaffolds were anchored, using the combined Illumina HiSeq, Pacific Biosciences Sequel, and Hi-C technologies. The length of the final yellowhorn genome assembly was 504.2 Mb with a contig N50 size of 1.04 Mb and a scaffold N50 size of 32.17 Mb. Genome annotation revealed that 68.67% of the yellowhorn genome was composed of repetitive elements. Gene modelling predicted 24,672 protein-coding genes. By comparing orthologous genes, the divergence time of yellowhorn and its close sister species longan (Dimocarpus longan) was estimated at ~33.07 million years ago. Gene cluster and chromosome synteny analysis demonstrated that the yellowhorn genome shared a conserved genome structure with its ancestor in some chromosomes.This genome assembly represents a high-quality reference genome for yellowhorn. Integrated genome annotations provide a valuable dataset for genetic and molecular research in this species. We did not detect whole-genome duplication in the genome. The yellowhorn genome carries syntenic blocks from ancient chromosomes. These data sources will enable this genome to serve as an initial platform for breeding better yellowhorn cultivars. © The Author(s) 2019. Published by Oxford University Press.
Pecan (Carya illinoinensis) and Chinese hickory (C. cathayensis) are important commercially cultivated nut trees in the genus Carya (Juglandaceae), with high nutritional value and substantial health benefits.We obtained >187.22 and 178.87 gigabases of sequence, and ~288× and 248× genome coverage, to a pecan cultivar (“Pawnee”) and a domesticated Chinese hickory landrace (ZAFU-1), respectively. The total assembly size is 651.31 megabases (Mb) for pecan and 706.43 Mb for Chinese hickory. Two genome duplication events before the divergence from walnut were found in these species. Gene family analysis highlighted key genes in biotic and abiotic tolerance, oil, polyphenols, essential amino acids, and B vitamins. Further analyses of reduced-coverage genome sequences of 16 Carya and 2 Juglans species provide additional phylogenetic perspective on crop wild relatives.Cooperative characterization of these valuable resources provides a window to their evolutionary development and a valuable foundation for future crop improvement. © The Author(s) 2019. Published by Oxford University Press.
More than 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral “forests,” which provide unique niches and 3-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy, continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several anthozoan genomes are currently available, the majority of these are hexacorals. Here, we present a de novo assembly of an azooxanthellate shallow-water octocoral, Renilla muelleri.We generated a hybrid de novo assembly using MaSuRCA v.3.2.6. The final assembly included 4,825 scaffolds and a haploid genome size of 172 megabases (Mb). A BUSCO assessment found 88% of metazoan orthologs present in the genome. An Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the Uniprot database. Although the R. muelleri genome may be smaller (172 Mb minimum size) than other publicly available coral genomes (256-448 Mb), the R. muelleri genome is similar to other coral genomes in terms of the number of complete metazoan BUSCOs and predicted gene models.The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity. © The Author(s) 2019. Published by Oxford University Press.