June 1, 2021  |  

SMRT Sequencing solutions for investigative studies to understand evolutionary processes.

Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers in understanding the molecular mechanisms of evolution and gaining insight into adaptive strategies. With read lengths exceeding 10 kb, we are able to sequence high-quality, closed microbial genomes with associated plasmids, and investigate large genome complexities, such as long, highly repetitive, low-complexity regions and multiple tandem-duplication events. Improved genome quality, observed at 99.9999% (QV60) consensus accuracy, and significant reduction of gap regions in reference genomes (up to and beyond 50%) allow researchers to better understand coding sequences with high confidence, investigate potential regulatory mechanisms in noncoding regions, and make inferences about evolutionary strategies that are otherwise missed because of the coverage biases associated with short-read sequencing technologies. Additional benefits afforded by SMRT Sequencing include the simultaneous capability to detect epigenomic modifications and to obtain full-length cDNA transcripts, which removes the need for assembly. Direct sequencing of DNA in real time has resulted in the identification of numerous base modifications and motifs, which genome-wide profiles have linked to specific methyltransferase activities. Our new offering, the Iso-Seq Application, allows for accurate differentiation between transcript isoforms that are difficult to resolve with short-read technologies. PacBio reads easily span transcripts such that both the 5’/3’ primers for cDNA library generation and the poly-A tail are observed. As such, exon configuration and intron retention events can be analyzed without ambiguity. This technological advance is useful for characterizing transcript diversity and improving gene structure annotations in reference genomes. We review solutions available with SMRT Sequencing, from targeted sequencing efforts to obtaining reference genomes (>100 Mb).
This includes strategies for identifying microsatellites and conducting phylogenetic comparisons with targeted gene families. We highlight how to best leverage our long reads that have exceeded 20 kb in length for research investigations, as well as currently available bioinformatics strategies for analysis. Benefits for these applications are further realized with consistent use of size selection of input sample using the BluePippin™ device from Sage Science as demonstrated in our genome improvement projects. Using the latest P5-C3 chemistry on model organisms, these efforts have yielded an observed contig N50 of ~6 Mb, with the longest contig exceeding 12.5 Mb and an average base quality of QV50.
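For readers less familiar with the metrics quoted above, the following is a minimal Python sketch of how a Phred-scaled quality value (QV) maps to consensus accuracy and how contig N50 is commonly computed. The function names and the toy contig lengths are illustrative assumptions and are not part of any PacBio tool.

```python
def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to expected accuracy.

    Phred definition: QV = -10 * log10(error_rate), so
    accuracy = 1 - 10 ** (-QV / 10).
    """
    return 1.0 - 10 ** (-qv / 10.0)


def contig_n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2.0
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig lengths in Mb (hypothetical, for illustration only).
toy_lengths_mb = [12, 6, 4, 2, 1]
toy_n50 = contig_n50(toy_lengths_mb)
```

Under this definition, QV60 corresponds to one expected error per million bases (99.9999% accuracy) and QV50 to one per hundred thousand.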


June 1, 2021  |  

SMRT Sequencing solutions for plant genomes and transcriptomes

Single Molecule, Real-Time (SMRT) Sequencing provides efficient, streamlined solutions to address new frontiers in plant genomes and transcriptomes. Inherent challenges presented by highly repetitive, low-complexity regions and duplication events are directly addressed with multi-kilobase read lengths exceeding 8.5 kb on average, with many exceeding 20 kb. Differentiating between transcript isoforms that are difficult to resolve with short-read technologies is also now possible. We present solutions available for both reference genome and transcriptome research that best leverage long reads in several plant projects including algae, Arabidopsis, rice, and spinach using only the PacBio platform. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. We will share highlights from our genome projects using the latest P5-C3 chemistry to generate high-quality reference genomes with the highest contiguity, contig N50 exceeding 1 Mb, and average base quality of QV50. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq protocol will be presented for full transcriptome characterization and targeted surveys of genes with complex structures. PacBio provides the most comprehensive assembly with annotation when combining offerings for both genome and transcriptome research efforts. For more focused investigation, PacBio also offers researchers opportunities to easily investigate and survey genes with complex structures.


June 1, 2021  |  

Structural variant detection with low-coverage PacBio sequencing

Despite amazing progress over the past quarter century in the technology to detect genetic variants, intermediate-sized structural variants (50 bp to 50 kb) have remained difficult to identify. Such variants are too small to detect with array comparative genomic hybridization, but too large to reliably discover with short-read DNA sequencing. Recent de novo assemblies of human genomes have demonstrated the power of PacBio Single Molecule, Real-Time (SMRT) Sequencing to fill this technology gap and sensitively identify structural variants in the human genome. While de novo assembly is the ideal method to identify variants in a genome, it requires high depth of coverage. A structural variant discovery approach that utilizes lower coverage would facilitate evaluation of large patient and population cohorts. Here we introduce such an approach and apply it to 10-fold coverage of several human genomes generated on the PacBio Sequel System. To identify structural variants in low-fold coverage whole genome sequencing data, we apply a reference-based, re-sequencing workflow. First, reads are mapped to the human reference genome with a local aligner. The local alignments often end at structural variant loci. To connect co-linear local alignments across structural variants, we apply a novel algorithm that merges alignments into “chains” and refines the alignment edges. Then, the chained alignments are scanned for windows with an excess of insertions or deletions to identify candidate structural variant loci. Finally, the read support at each putative variant locus is evaluated to produce a variant call. Single nucleotide information is incorporated to phase and evaluate the zygosity of each structural variant. In 10-fold coverage human genome sequence, we identify the vast majority of the structural variants found by de novo assembly, thus demonstrating the power of low-fold coverage SMRT Sequencing to affordably and effectively detect structural variants.
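The window-scanning step described above can be illustrated with a small Python sketch. It is not the authors' algorithm: the abstract does not specify window sizes, thresholds, or data structures, so the function name `candidate_sv_windows`, the event tuple layout, and the defaults `window=1000` and `min_indel_bp=50` are all assumptions made for illustration. The sketch takes the gaps found in one chained alignment and flags reference windows whose inserted or deleted bases reach the threshold; read mapping, chaining, read-support evaluation, and phasing are left out.

```python
from collections import defaultdict


def candidate_sv_windows(indel_events, window=1000, min_indel_bp=50):
    """Flag reference windows with an excess of inserted or deleted bases.

    indel_events: iterable of (chrom, ref_pos, signed_length) tuples, one per
        gap in a chained alignment; positive = insertion, negative = deletion.
    window: window size in bp (hypothetical default).
    min_indel_bp: bases required to flag a window (hypothetical default,
        chosen to match the 50 bp lower bound quoted for structural variants).
    Returns a sorted list of (chrom, window_start, kind, total_bp).
    """
    totals = defaultdict(lambda: {"INS": 0, "DEL": 0})
    for chrom, pos, length in indel_events:
        start = (pos // window) * window  # bin the event into its window
        kind = "INS" if length > 0 else "DEL"
        totals[(chrom, start)][kind] += abs(length)

    calls = []
    for (chrom, start), counts in totals.items():
        for kind, total in counts.items():
            if total >= min_indel_bp:
                calls.append((chrom, start, kind, total))
    return sorted(calls)


# Toy events (hypothetical coordinates): two nearby insertions that add up
# to 75 bp, one 120 bp deletion, and one small 2 bp insertion.
events = [
    ("chr1", 10_250, 30),
    ("chr1", 10_400, 45),
    ("chr1", 52_010, -120),
    ("chr1", 80_500, 2),
]
calls = candidate_sv_windows(events)
```

Summing gap lengths per window, rather than testing each gap alone, lets a variant that the aligner split into several smaller gaps still register as a single candidate locus.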


June 1, 2021  |  

Detecting pathogenic structural variants with low-coverage PacBio sequencing.

Though a role for structural variants in human disease has long been recognized, it has remained difficult to identify intermediate-sized variants (50 bp to 5 kb), which are too small to detect with array comparative genomic hybridization, but too large to reliably discover with short-read DNA sequencing. Recent studies have demonstrated that PacBio Single Molecule, Real-Time (SMRT) sequencing fills this technology gap. SMRT sequencing detects tens of thousands of structural variants in the human genome, approximately five times the sensitivity of short-read DNA sequencing.


June 1, 2021  |  

Best practices for diploid assembly of complex genomes using PacBio: A case study of Cascade Hops

A high-quality reference genome is an essential resource for plant and animal breeding and for functional and evolutionary studies. The common hop (Humulus lupulus, Cannabaceae) is an economically important crop plant used to flavor and preserve beer. Its genome is large (flow cytometry-based estimates of diploid length >5.4 Gb) and highly repetitive, and individual plants display high levels of heterozygosity, which makes assembly of an accurate and contiguous reference genome challenging with conventional short-read methods. We present a contig assembly of Cascade Hops using PacBio long reads and the diploid genome assembler FALCON-Unzip. The assembly has dramatically improved contiguity and completeness over earlier short-read assemblies. The genome is primarily assembled as haplotypes due to the outbred nature of the organism. We explore patterns of haplotype divergence across the assembly and present strategies to deduplicate haplotypes prior to scaffolding.


June 1, 2021  |  

Best practices for whole genome sequencing using the Sequel System

Plant and animal whole genome sequencing has proven to be challenging, particularly due to genome size, high density of repetitive elements and heterozygosity. The Sequel System delivers long reads, high consensus accuracy and uniform coverage, enabling more complete, accurate, and contiguous assemblies of these large complex genomes. The latest Sequel chemistry increases yield up to 8 Gb per SMRT Cell for long insert libraries >20 kb and up to 10 Gb per SMRT Cell for libraries >40 kb. In addition, the recently released SMRTbell Express Template Prep Kit reduces the time (~3 hours) and DNA input (~3 µg), making the workflow easy to use for multi-SMRT Cell projects. Here, we recommend the best practices for whole genome sequencing and de novo assembly of complex plant and animal genomes. Guidelines for constructing large-insert SMRTbell libraries (>30 kb) to generate optimal read lengths and yields using the latest Sequel chemistry are presented. We also describe ways to maximize library yield per preparation from as little as 3 µg of sheared genomic DNA. The combination of these advances makes plant and animal whole genome sequencing a practical application of the Sequel System.
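As a rough planning aid, the per-cell yields quoted above can be turned into an estimate of how many SMRT Cells a project needs. The Python sketch below is only back-of-the-envelope arithmetic: the function name, the 1 Gb genome size, and the 50-fold target coverage are hypothetical, and because the quoted yields are "up to" figures, the result is best read as a lower bound.

```python
import math


def smrt_cells_needed(genome_size_gb: float,
                      target_coverage: float,
                      yield_per_cell_gb: float) -> int:
    """Estimate the number of SMRT Cells required for a target coverage.

    Total bases needed = genome size * coverage; divide by the yield per
    cell and round up, since a fraction of a cell cannot be run.
    """
    if yield_per_cell_gb <= 0:
        raise ValueError("yield per cell must be positive")
    total_gb = genome_size_gb * target_coverage
    return math.ceil(total_gb / yield_per_cell_gb)


# Hypothetical 1 Gb genome at 50-fold coverage, using the two yields quoted
# in the abstract (8 Gb and 10 Gb per SMRT Cell).
cells_at_8gb = smrt_cells_needed(1.0, 50, 8.0)
cells_at_10gb = smrt_cells_needed(1.0, 50, 10.0)
```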


April 21, 2020  |  

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020  |  

The bracteatus pineapple genome and domestication of clonally propagated crops.

Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513 Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated bromelain inhibitors. Four candidate genes for self-incompatibility were linked in F153, but were not functional in self-compatible CB5. Our findings support the coexistence of sexual recombination and a one-step operation in the domestication of clonally propagated crops. This work guides the exploration of sexual and asexual domestication trajectories in other clonally propagated crops.


April 21, 2020  |  

Complete genome sequence of Paenisporosarcina antarctica CGMCC 1.6503T, a marine psychrophilic bacterium isolated from Antarctica

A marine psychrophilic bacterium, _Paenisporosarcina antarctica_ CGMCC 1.6503T (= JCM 14646T), was isolated off King George Island, Antarctica (62°13′31″ S 58°57′08″ W). In this study, we report the complete genome sequence of _Paenisporosarcina antarctica_, which comprises 3,972,524 bp with a mean G + C content of 37.0%. Gene function and metabolic pathway analyses showed that strain CGMCC 1.6503T encodes a series of genes related to cold adaptation, including genes encoding fatty acid desaturases, dioxygenases, antifreeze proteins and cold shock proteins, and possesses several two-component regulatory systems, which could assist this strain in responding to cold stress, oxygen stress and osmotic stress in Antarctica. The complete genome sequence of _P. antarctica_ may provide further insights into the genetic mechanism of cold adaptation for Antarctic marine bacteria.

