June 1, 2021  |  

SMRT Sequencing solutions for plant genomes and transcriptomes

Single Molecule, Real-Time (SMRT) Sequencing provides efficient, streamlined solutions to address new frontiers in plant genomes and transcriptomes. Inherent challenges presented by highly repetitive, low-complexity regions and duplication events are directly addressed with multi-kilobase read lengths exceeding 8.5 kb on average, with many exceeding 20 kb. Differentiating between transcript isoforms that are difficult to resolve with short-read technologies is also now possible. We present solutions available for both reference genome and transcriptome research that best leverage long reads in several plant projects including algae, Arabidopsis, rice, and spinach using only the PacBio platform. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. We will share highlights from our genome projects using the latest P5-C3 chemistry to generate high-quality reference genomes with the highest contiguity, contig N50 exceeding 1 Mb, and average base quality of QV50. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq protocol will be presented for full transcriptome characterization and targeted surveys of genes with complex structures. PacBio provides the most comprehensive assembly with annotation when combining offerings for both genome and transcriptome research efforts. For more focused investigation, PacBio also offers researchers opportunities to easily investigate and survey genes with complex structures.


June 1, 2021  |  

Old school/new school genome sequencing: One step backward — a quantum leap forward.

As the costs for genome sequencing have decreased, the number of “genome” sequences has increased at a rapid pace. Unfortunately, the quality and completeness of these so-called “genome” sequences have suffered enormously. We prefer to call such genome assemblies “gene assembly space” (GAS). We believe it is important to distinguish GAS assemblies from reference genome assemblies (RGAs), as all subsequent research that depends on accurate genome assemblies can be highly compromised if the only assembly available is a GAS assembly.


June 1, 2021  |  

De novo assembly of a complex panicoid grass genome using ultra-long PacBio reads with P6C4 chemistry

Drought is responsible for much of the global losses in crop yields, and understanding how plants naturally cope with drought stress is essential for breeding and engineering crops for the changing climate. Resurrection plants desiccate to complete dryness during times of drought, then “come back to life” once water is available, making them an excellent model for studying drought tolerance. Understanding the molecular networks governing how resurrection plants handle desiccation will provide targets for crop engineering. Oropetium thomaeum (Oro) is a resurrection plant that also has the smallest known grass genome at 250 Mb compared to Brachypodium distachyon (300 Mb) and rice (350 Mb). Plant genomes, especially grasses, have complex repeat structures such as telomeres, centromeres, and ribosomal gene cassettes, and high heterozygosity, which make them difficult to assemble using short-read next generation sequencing technologies. We used ultra-long PacBio reads with the new P6C4 chemistry and the latest 15 kb Blue Pippin size-selection protocol to generate 20 kb insert libraries, which yielded an average read length of 12 kb, providing ~72X coverage overall and 10X coverage with reads over 20 kb. The HGAP assembly covers 98% of the genome with a contig N50 of 2.4 Mb, which makes it one of the highest quality and most complete plant genomes assembled to date. Oro has a compact genome structure compared to other grasses with only 16% repeat sequences but has very good collinearity with other grasses. Understanding the genomic mechanisms of extreme desiccation tolerance in resurrection plants like Oro will provide insights for engineering and intelligent breeding of improved food, fuel, and fiber crops.
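
For readers unfamiliar with the contiguity and coverage figures quoted in this abstract, the following is a minimal sketch of how contig N50 and fold coverage are conventionally computed. It assumes the standard definitions (N50 is the length of the contig at which the running total of contig lengths, sorted longest first, first reaches half of the assembly size; fold coverage is sequenced bases divided by genome size). The function names and the toy contig lengths are hypothetical illustrations, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Standard definition: sort contigs longest first and return the length
    of the contig at which the running total first reaches half of the
    summed assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def fold_coverage(total_bases, genome_size):
    """Return fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Toy contig lengths (hypothetical, not from the Oropetium assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Back-of-envelope check using only figures quoted in the abstract:
# ~72X coverage of a 250 Mb genome implies roughly 18 Gb of sequence.
implied_bases = 72 * 250_000_000
implied_coverage = fold_coverage(implied_bases, 250_000_000)
```

Note that N50 is sensitive to how many small contigs are included in the total, so values are only comparable between assemblies filtered in the same way.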


June 1, 2021  |  

Long read sequencing technology to solve complex genomic regions assembly in plants

Numerous whole genome sequencing projects, completed or ongoing, have highlighted the fact that obtaining a high quality genome sequence is necessary to address comparative genomics questions such as structural variations among genotypes and gain or loss of specific function. Despite the spectacular progress that has been made in sequencing technologies, obtaining accurate and reliable data is still challenging, at the whole genome scale but also when targeting specific genomic regions. These issues are even more noticeable for complex plant genomes. Most plant genomes are known to be particularly challenging due to their size, high density of repetitive elements and various levels of ploidy. To overcome these issues, we have developed a strategy to reduce genome complexity by using large insert BAC libraries combined with next generation sequencing technologies. We have compared two different technologies (Roche-454 and Pacific Biosciences PacBio RS II) to sequence pools of BAC clones in order to obtain the best quality sequence. We targeted nine BAC clones from different species (maize, wheat, strawberry, barley, sugarcane and sunflower) known to be complex in terms of sequence assembly. We sequenced the pools of the nine BAC clones with both technologies. We have compared the assembly results and highlighted differences due to the sequencing technologies used. We demonstrated that the long reads obtained with the PacBio RS II technology enable a better and more reliable assembly, notably by preventing errors due to duplicated or repetitive sequences in the same region.


June 1, 2021  |  

Applying Sequel to Genomic Datasets

De novo assembly is a large part of JGI’s analysis portfolio. Repetitive DNA sequences are abundant in a wide range of organisms we sequence and pose a significant technical challenge for assembly. We are interested in long read technologies capable of spanning genomic repeats to produce better assemblies. We currently have three RS II and two Sequel PacBio machines. RS II machines are primarily used for fungal and microbial genome assembly as well as synthetic biology validation. Between microbes and fungi we produce hundreds of PacBio libraries a year and for throughput reasons the vast majority of these are >10 kb AMPure libraries. Throughput for RS II is about 1 Gb per SMRT Cell. This is ideal for microbial sized genomes but can be costly and labor intensive for larger projects which require multiple cells. JGI was an early access site for Sequel and began testing with real samples in January 2016. During that time we’ve had the opportunity to sequence microbes, fungi, metagenomes, and plants. Here we present our experience over the last 18 months using the Sequel platform and provide comparisons with RS II results.


June 1, 2021  |  

Characterizing the pan-genome of maize with PacBio SMRT Sequencing

Maize is an amazingly diverse crop. A study in 2005 [1] demonstrated that half of the genome sequence and one-third of the gene content between two inbred lines of maize were not shared. This diversity, which is more than two orders of magnitude larger than the diversity found between humans and chimpanzees, highlights the inability of a single reference genome to represent the full pan-genome of maize and all its variants. Here we present and review several efforts to characterize the complete diversity within maize using the highly accurate long reads of PacBio Single Molecule, Real-Time (SMRT) Sequencing. These methods provide a framework for a pan-genomic approach that can be applied to studies of a wide variety of important crop species.


June 1, 2021  |  

Structural variant detection in crops using low-fold coverage long-read sequencing

Genomics studies have shown that the insertions, deletions, duplications, translocations, inversions, and tandem repeat expansions in the structural variant (SV) size range (>50 bp) contribute to the evolution of traits and often have significant associations with agronomically important phenotypes. However, most SVs are too small to detect with array comparative genomic hybridization and too large to reliably discover with short-read DNA sequencing. While de novo assembly is the most comprehensive way to identify variants in a genome, recent studies in human genomes show that PacBio SMRT Sequencing sensitively detects structural variants at low coverage. Here we present SV characterization in the major crop species Oryza sativa subsp. indica (rice) with low-fold coverage of long reads. In addition, we provide recommendations for sequencing and analysis for the application of this workflow to other important agricultural species.


April 21, 2020  |  

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020  |  

The bracteatus pineapple genome and domestication of clonally propagated crops.

Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513 Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated bromelain inhibitors. Four candidate genes for self-incompatibility were linked in F153, but were not functional in self-compatible CB5. Our findings support the coexistence of sexual recombination and a one-step operation in the domestication of clonally propagated crops. This work guides the exploration of sexual and asexual domestication trajectories in other clonally propagated crops.


April 21, 2020  |  

Chlorella vulgaris genome assembly and annotation reveals the molecular basis for metabolic acclimation to high light conditions.

Chlorella vulgaris is a fast-growing fresh-water microalga cultivated at the industrial scale for applications ranging from food to biofuel production. To advance our understanding of its biology and to establish genetics tools for biotechnological manipulation, we sequenced the nuclear and organelle genomes of Chlorella vulgaris 211/11P by combining next generation sequencing and optical mapping of isolated DNA molecules. This hybrid approach allowed assembly of the nuclear genome into 14 pseudo-molecules with an N50 of 2.8 Mb and 98.9% of scaffolded genome. The integration of RNA-seq data obtained at two different irradiances of growth (high light, HL, versus low light, LL) enabled the identification of 10,724 nuclear genes, coding for 11,082 transcripts. Moreover, 121 and 48 genes were found in the chloroplast and mitochondrial genomes, respectively. Functional annotation and expression analysis of nuclear, chloroplast and mitochondrial genome sequences revealed peculiar features of Chlorella vulgaris. Evidence of horizontal gene transfers from chloroplast to mitochondrial genome was observed. Furthermore, comparative transcriptomic analyses of LL vs. HL provide insights into the molecular basis for metabolic rearrangement in HL vs. LL conditions leading to enhanced de novo fatty acid biosynthesis and triacylglycerol accumulation. The occurrence of a cytosolic fatty acid biosynthetic pathway can be predicted and its upregulation upon HL exposure is observed, consistent with increased lipid amount under HL. These data provide a rich genetic resource for future genome editing studies, and potential targets for biotechnological manipulation of Chlorella vulgaris or other microalgae species to improve biomass and lipid productivity.


April 21, 2020  |  

Evolution of a 72-kb cointegrant, conjugative multiresistance plasmid from early community-associated methicillin-resistant Staphylococcus aureus isolates.

Horizontal transfer of plasmids encoding antimicrobial-resistance and virulence determinants has been instrumental in Staphylococcus aureus evolution, including the emergence of community-associated methicillin-resistant S. aureus (CA-MRSA). In the early 1990s the first CA-MRSA isolated in Western Australia (WA), WA-5, encoded cadmium, tetracycline and penicillin-resistance genes on plasmid pWBG753 (~30 kb). WA-5 and pWBG753 appeared only briefly in WA; however, fusidic-acid-resistance plasmids related to pWBG753 were also present in the first European CA-MRSA at the time. Here we characterized a 72-kb conjugative plasmid pWBG731 present in multiresistant WA-5-like clones from the same period. pWBG731 was a cointegrant formed from pWBG753 and a pWBG749-family conjugative plasmid. pWBG731 carried mupirocin, trimethoprim, cadmium and penicillin-resistance genes. The stepwise evolution of pWBG731 likely occurred through the combined actions of IS257, IS257-dependent miniature inverted-repeat transposable elements (MITEs) and the BinL resolution system of the β-lactamase transposon Tn552. An evolutionary intermediate, the ~42-kb non-conjugative plasmid pWBG715, possessed the same resistance genes as pWBG731 but retained an integrated copy of the small tetracycline-resistance plasmid pT181. IS257 likely facilitated replacement of pT181 with conjugation genes on pWBG731, thus enabling autonomous transfer. Like conjugative plasmid pWBG749, pWBG731 also mobilized non-conjugative plasmids carrying oriT mimics. It seems likely that pWBG731 represents the product of multiple recombination events between the WA-5 pWBG753 plasmid and other mobile genetic elements present in indigenous CA-MSSA. The molecular evolution of pWBG731 saliently illustrates how diverse mobile genetic elements can together facilitate rapid accrual and horizontal dissemination of multiresistance in S. aureus CA-MRSA. Copyright © 2019 American Society for Microbiology.


April 21, 2020  |  

Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline

Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. There are numerous methods for each class of elements with unknown relative performance metrics. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.

List of abbreviations: TE, Transposable Elements; LTR, Long Terminal Repeat; LINE, Long Interspersed Nuclear Element; SINE, Short Interspersed Nuclear Element; MITE, Miniature Inverted Transposable Element; TIR, Terminal Inverted Repeat; TSD, Target Site Duplication; TP, True Positives; FP, False Positives; TN, True Negative; FN, False Negatives; GRF, Generic Repeat Finder; EDTA, Extensive de-novo TE Annotator.
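
The TP, FP, TN and FN counts named in the abbreviation list are the usual inputs to benchmark metrics when an annotation is compared against a curated reference. The sketch below uses the standard textbook definitions of sensitivity, specificity, precision, accuracy and F1. The abstract does not state which metrics the study reports or what unit is counted, so both are assumptions here, and the function name and example counts are hypothetical.

```python
def benchmark_metrics(tp, fp, tn, fn):
    """Compute standard confusion-matrix metrics from raw counts.

    tp, fp, tn, fn are true/false positive/negative counts. What is
    counted (base pairs, elements, ...) is left to the caller and is an
    assumption of this sketch.
    """
    def ratio(numerator, denominator):
        # Return None rather than raising when a metric is undefined.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),   # share of true TEs recovered
        "specificity": ratio(tn, tn + fp),   # share of non-TE correctly excluded
        "precision": ratio(tp, tp + fp),     # share of calls that are correct
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts for illustration only.
example = benchmark_metrics(tp=90, fp=10, tn=80, fn=20)
```

Reporting several metrics together matters for TE annotation, because a program can reach high sensitivity simply by over-calling repeats, which shows up as low specificity and precision.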

