With Single Molecule, Real-Time (SMRT) Sequencing and the Sequel Systems, you can affordably assemble reference-quality microbial genomes that are >99.999% (Q50) accurate.
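The pairing of ">99.999%" with "Q50" above follows from the standard Phred scale, where Q = -10 · log10(error probability). The sketch below shows that conversion in both directions; the function names are hypothetical and chosen only for illustration, and are not part of any PacBio tool.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (0 <= accuracy < 1) to a Phred quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality value back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.999% accuracy corresponds to roughly one error per 100,000 bases.
    print(round(accuracy_to_phred(0.99999), 3))
    print(phred_to_accuracy(50))
```

On this scale, each additional 10 Q points corresponds to a tenfold drop in the expected error rate, so Q40 is about one error per 10,000 bases and Q50 about one per 100,000.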
Brett Hannigan, Computational Biology Project Leader at DNAnexus, demonstrates a fast, accurate, and cost-efficient solution for diploid-aware de novo genome assembly utilizing FALCON on the DNAnexus platform.
This tutorial provides an overview of the Hierarchical Genome Assembly Process (HGAP4) de novo assembly analysis application. HGAP4 generates accurate de novo assemblies using only PacBio data. HGAP4 is suitable for assembling a wide range of genome sizes and complexity. HGAP4 now includes some support for diploid-aware assembly. This tutorial covers features of SMRT Link v5.0.0.
The goal of this session is to help users complete their PacBio genome assembly and generate the best resource for their research. Kingan begins with a brief review of the diploid assembly process used by FALCON and FALCON-Unzip, highlighting the enhanced phasing of the Unzip module, and concluding with recommendations for genome polishing. Next, she explores how heterozygosity can influence the assembly process and how read coverage depth along the assembly can reveal important characteristics of assembly structure. Kingan then recommends approaches, including specific tools, that can be used to quality filter and curate the assembly, including annotation-, coverage-, and…
PacBio SMRT Sequencing has the unique ability to directly detect base modifications in addition to the nucleotide sequence of DNA. Because eukaryotes use base modifications to regulate gene expression, the absence or presence of epigenetic events relative to the location of genes is critical to elucidate the function of the modification. Therefore, an integrated approach that combines multiple omic-scale assays is necessary to study complex organisms. Here, we present an integrated analysis of three sequencing experiments: 1) DNA sequencing, 2) base-modification detection, and 3) Iso-Seq analysis, in Neurospora crassa, a filamentous fungus that has been used to make many landmark…
Single Molecule Real-Time (SMRT) DNA sequencing provides a wealth of kinetic information beyond the extraction of the primary DNA sequence, and this kinetic information can provide for the direct detection of modified bases present in genomic DNA. This method has been demonstrated for base modification detection in prokaryotes at base and strand resolutions. In eukaryotes, the common base modifications known to exist are the cytosine variants including methyl, hydroxymethyl, formyl and carboxyl forms. Each of these modifications exhibits different signatures in SMRT kinetic data, allowing for unprecedented possibilities to differentiate between them in direct sequencing data. We present early results…
PacBio sequencing holds promise for addressing large-genome complexities, such as long, highly repetitive, low-complexity regions and duplication events that are difficult to resolve with short-read technologies. Several strategies, with varying outcomes, are available for de novo sequencing and assembly of larger genomes. Using a diploid fungal genome, estimated to be ~80 Mb in size, as the basis dataset for comparison, we highlight assembly options when using only PacBio sequencing or a combined strategy leveraging data sets from multiple sequencing technologies. Data generated by SMRT Sequencing were assembled using different large-genome assemblers, and comparisons of the results will be…
SFAF 2014 Presentation Slides: James Gurtowski of Cold Spring Harbor Laboratory (CSHL) shared assembly results for a variety of eukaryotic genomes, including yeast, Arabidopsis, and rice.
PacBio bioinformatician Elizabeth Tseng reviews bioinformatics strategies that use PacBio long-read sequencing data for isoform sequencing, which yields full-length transcript sequences without assembly.
Outside of the simplest cases (haploid, bacteria, or inbreds), genomic information is not carried in a single reference per individual, but rather has higher ploidy (n ≥ 2) for almost all organisms. The existence of two or more highly related sequences within an individual makes it extremely difficult to build high-quality, highly contiguous genome assemblies from short DNA fragments. Based on earlier work on a polyploidy-aware assembler, FALCON (https://github.com/PacificBiosciences/FALCON), we developed new algorithms and software (“FALCON-unzip”) for de novo haplotype reconstruction from SMRT Sequencing data. We generate two datasets for developing the algorithms and the prototype software:…
While genome assembly projects have been successful in many haploid and inbred species, the assembly of non-inbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short-…
De novo assembly is a large part of JGI’s analysis portfolio. Repetitive DNA sequences are abundant in a wide range of organisms we sequence and pose a significant technical challenge for assembly. We are interested in long read technologies capable of spanning genomic repeats to produce better assemblies. We currently have three RS II and two Sequel PacBio machines. RS II machines are primarily used for fungal and microbial genome assembly as well as synthetic biology validation. Between microbes and fungi we produce hundreds of PacBio libraries a year and for throughput reasons the vast majority of these are >10…
The Ganoderma genus has clear biotechnological potential, owing to the large number of biologically active molecules that could be explored. However, available information regarding the biotechnological importance of species within Ganoderma, other than G. lucidum, is quite limited. Genomic studies of little-known species can contribute to knowledge of those species, as well as to the search for metabolic pathways and the identification of genes that code for proteins of potential biotechnological relevance. Therefore, the objective of the present study was to obtain the G. australe genome through the use of new sequencing technologies. Genomic DNA from G. australe was…
Arthrinium phaeospermum (Corda) M.B. Ellis is a globally distributed pathogenic fungus with a wide host range; its hosts include not only plants, but also humans and animals. This study aimed to develop genomic resources for A. phaeospermum to provide solid data and a theoretical basis for further studies of its pathogenesis, transcriptomics, proteomics, metabolomics and RNA genomics. The genome was obtained from the mycelia of the strain AP-Z13 using a combination of analyses with the high-throughput Illumina HiSeq 4000 system and PacBio RSII LongRead sequencing platform. Functional annotation was performed by BLASTing protein sequences against those in different publicly available…