At DuPont Pioneer, DNA sequencing is paramount for R&D to reveal the genetic basis for traits of interest in commercial crops such as maize, soybean, sorghum, sunflower, alfalfa, canola, wheat, rice, and others. They cannot afford to wait the years it has historically taken for high-quality reference genomes to be produced. Nor can they rely on a single reference to represent the genetic diversity in their germplasm.
Single Molecule, Real-Time (SMRT) Sequencing provides efficient, streamlined solutions to address new frontiers in plant genomes and transcriptomes. Inherent challenges presented by highly repetitive, low-complexity regions and duplication events are directly addressed with multi-kilobase read lengths exceeding 8.5 kb on average, with many exceeding 20 kb. Differentiating between transcript isoforms that are difficult to resolve with short-read technologies is also now possible. We present solutions available for both reference genome and transcriptome research that best leverage long reads in several plant projects including algae, Arabidopsis, rice, and spinach using only the PacBio platform. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. We will share highlights from our genome projects using the latest P5-C3 chemistry to generate high-quality reference genomes with the highest contiguity, contig N50 exceeding 1 Mb, and average base quality of QV50. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq protocol will be presented for full transcriptome characterization and targeted surveys of genes with complex structures. PacBio provides the most comprehensive assembly with annotation when combining offerings for both genome and transcriptome research efforts. For more focused investigation, PacBio also offers researchers opportunities to easily investigate and survey genes with complex structures.
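For readers less familiar with the contiguity metric quoted above, the following is a minimal sketch of how a contig N50 is commonly computed: the contig length at which the running total of contigs, sorted from longest to shortest, first reaches half of the assembly size. The function name and the toy contig lengths are illustrative assumptions, not data from the projects described.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical toy assembly of five contigs (4 Mb in total).
toy_contigs = [1_500_000, 1_200_000, 800_000, 300_000, 200_000]
print(contig_n50(toy_contigs))  # 1200000
```

In this toy case the two longest contigs already cover more than half of the 4 Mb total, so the N50 is the length of the second contig.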
Numerous whole-genome sequencing projects, completed or ongoing, have highlighted that a high-quality genome sequence is necessary to address comparative genomics questions such as structural variation among genotypes and gain or loss of specific functions. Despite the spectacular progress made in sequencing technologies, obtaining accurate and reliable data remains challenging, both at the whole-genome scale and when targeting specific genomic regions. These issues are even more noticeable for complex plant genomes. Most plant genomes are known to be particularly challenging due to their size, high density of repetitive elements and various levels of ploidy. To overcome these issues, we developed a strategy to reduce genome complexity by using large-insert BAC libraries combined with next-generation sequencing technologies. We compared two technologies (Roche-454 and Pacific Biosciences PacBio RS II) for sequencing pools of BAC clones in order to obtain the best-quality sequence. We targeted nine BAC clones from different species (maize, wheat, strawberry, barley, sugarcane and sunflower) known to be complex in terms of sequence assembly. We sequenced the pools of the nine BAC clones with both technologies, compared the assembly results, and highlighted differences due to the sequencing technology used. We demonstrated that the long reads obtained with the PacBio RS II technology yield a better and more reliable assembly, notably by preventing errors due to duplicated or repetitive sequences in the same region.
Reconstruction of the spinach coding genome using full-length transcriptome without a reference genome
For highly complex and large genomes, producing a well-annotated genome may be computationally challenging and costly, yet the study of alternative splicing events and gene annotations usually relies on the existence of a genome. Long-read sequencing technology provides new opportunities to sequence full-length cDNAs, avoiding the computational challenges that short-read transcript assembly brings. The use of Single Molecule, Real-Time sequencing from PacBio to sequence transcriptomes (the Iso-Seq method), which produces de novo, high-quality, full-length transcripts, has revealed an astonishing amount of alternative splicing in eukaryotic species. With the Iso-Seq method, it is now possible to reconstruct the transcribed regions of the genome using just the transcripts themselves. We present Cogent, a tool for finding gene families and reconstructing the coding genome in the absence of a high-quality reference genome. Cogent uses k-mer similarities to first partition the transcripts into different gene families. Then, for each gene family, the transcripts are used to build a splice graph. Cogent identifies bubbles resulting from sequencing errors, minor variants, and exon skipping events, and attempts to resolve each splice graph down to the minimal set of reconstructed contigs. We apply Cogent to the Iso-Seq data for spinach, Spinacia oleracea, for which there is also a PacBio-based draft genome to validate the reconstruction. The Iso-Seq dataset consists of 68,263 full-length, Quiver-polished transcript sequences ranging from 528 bp to 6 kbp long (mean: 2.1 kbp). Using the genome mapping as ground truth, we found that 95% (8045/8446) of the Cogent gene families corresponded to a single genomic locus. Families that contained multiple loci were often homologous genes that would be categorized as belonging to the same gene family. Coding genome reconstruction was then performed individually for each gene family.
A total of 86% (7283/8446) of the gene families were resolved to a single contig by Cogent and were validated to also be a single contig in the genome. In 59 cases, Cogent reconstructed a single contig, but the contig corresponded to two or more loci in the genome, suggesting possible scaffolding opportunities. In 24 cases, the transcripts had no hits to the genome, though Pfam and BLAST searches of the transcripts showed that they were indeed coding, suggesting that the genome is missing certain coding portions. Given the high quality of the spinach genome, we were not surprised to find that Cogent only marginally improved the genome space. However, the ability of Cogent to accurately identify gene families and reconstruct the coding genome in a de novo fashion shows that it will be extremely powerful when applied to datasets for which there is no reference genome or only a low-quality one.
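The first step described in this abstract, partitioning transcripts into gene families by k-mer similarity, can be illustrated with a small sketch. This is not Cogent's actual implementation: the Jaccard similarity measure, the union-find clustering, the threshold, the function names and the toy sequences are all assumptions chosen only to show the general idea.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partition_by_kmer_similarity(transcripts, k=4, min_similarity=0.5):
    """Group transcripts whose k-mer sets are similar (hypothetical sketch).

    transcripts: dict mapping transcript name -> sequence.
    Returns a sorted list of families, each a sorted list of names.
    """
    names = list(transcripts)
    profiles = {n: kmers(transcripts[n], k) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Merge every pair of transcripts that passes the similarity threshold.
    for a, b in combinations(names, 2):
        if jaccard(profiles[a], profiles[b]) >= min_similarity:
            parent[find(a)] = find(b)

    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in families.values())


# Toy input: t2 is a truncated version of t1; t3 is unrelated.
toy = {
    "t1": "AAAACCCCGGGGTTTT",
    "t2": "AAAACCCCGGGG",
    "t3": "TGCATGCATGCA",
}
print(partition_by_kmer_similarity(toy))  # [['t1', 't2'], ['t3']]
```

The all-pairs comparison is quadratic in the number of transcripts, so a dataset of the size reported here (68,263 transcripts) would need a more scalable similarity search; the sketch only conveys the partitioning concept.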
Maize is an amazingly diverse crop. A study in 2005¹ demonstrated that half of the genome sequence and one-third of the gene content between two inbred lines of maize were not shared. This diversity, which is more than two orders of magnitude larger than the diversity found between humans and chimpanzees, highlights the inability of a single reference genome to represent the full pan-genome of maize and all its variants. Here we present and review several efforts to characterize the complete diversity within maize using the highly accurate long reads of PacBio Single Molecule, Real-Time (SMRT) Sequencing. These methods provide a framework for a pan-genomic approach that can be applied to studies of a wide variety of important crop species.
Structural variants (genomic differences ≥50 base pairs) contribute to the evolution of traits and disease. Most structural variants (SVs) are too small to detect with array comparative genomic hybridization and too large to reliably discover with short-read DNA sequencing.
By 2050, there will be 9 billion people on the planet. What will they eat? This is the question that led Rod Wing, Director of the Arizona Genomics Institute, into…
PAG PacBio Workshop: Introducing 5 new high-quality PacBio genome assemblies for rice to help solve the 10-billion people question
At PAG 2017, Rod Wing presented five new, high-quality rice genome assemblies developed with SMRT Sequencing, including one that has eight complete chromosomes including centromeres. He also offered an early…
In a poster presented at AGBT 2017, Fritz Sedlazeck from Johns Hopkins University describes the comparison of genome assemblies produced using long-read PacBio sequencing and short-read sequencing with 10x Genomics…
Genes are the future of coffee. Not nitro cold brewing or beans pooped out by civets, but genes. And coffee’s gene-fueled future just drew nearer, now that scientists have sequenced…
Brassica napus (AACC, 2n = 38) is an important oilseed crop grown worldwide. However, little is known about the population evolution of this species, the genomic difference between its major genetic groups, such as European and Asian rapeseed, and the impacts of historical large-scale introgression events on this young tetraploid. In this study, we reported the de novo assembly of the genome sequences of an Asian rapeseed (B. napus), Ningyou 7, and its four progenitors and compared these genomes with other available genomic data from diverse European and Asian cultivars. Our results showed that Asian rapeseed originally derived from European rapeseed but subsequently significantly diverged, with rapid genome differentiation after hybridization and intensive local selective breeding. The first historical introgression of B. rapa dramatically broadened the allelic pool but decreased the deleterious variations of Asian rapeseed. The second historical introgression of the double-low traits of European rapeseed (canola) has reshaped Asian rapeseed into two groups (double-low and double-high), accompanied by an increase in genetic load in the double-low group. This study demonstrates distinctive genomic footprints and deleterious SNP (single nucleotide polymorphism) variants for local adaptation by recent intra- and interspecies introgression events and provides novel insights for understanding the rapid genome evolution of a young allopolyploid crop. © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
The complete chloroplast genome sequence of watercress (Nasturtium officinale R. Br.): Genome organization, adaptive evolution and phylogenetic relationships in Cardamineae.
Watercress (Nasturtium officinale R. Br.), an aquatic leafy vegetable of the Brassicaceae family, is known as a nutritional powerhouse. Here, we de novo sequenced and assembled the complete chloroplast (cp) genome of watercress based on combined PacBio and Illumina data. The cp genome is 155,106 bp in length, exhibiting a typical quadripartite structure including a pair of inverted repeats (IRA and IRB) of 26,505 bp separated by a large single copy (LSC) region of 84,265 bp and a small single copy (SSC) region of 17,831 bp. The genome contained 113 unique genes, including 79 protein-coding genes, 30 tRNAs and 4 rRNAs, with 20 duplicated in the IRs. Compared with the prior cp genome of watercress deposited in GenBank, 21 single nucleotide polymorphisms (SNPs) and 27 indels were identified, mainly located in noncoding sequences. A total of 49 repeat structures and 71 simple sequence repeats (SSRs) were detected. Codon usage showed a bias for A/T-ending codons in the cp genome of watercress. Moreover, 45 RNA editing sites were predicted in 16 genes, all for C-to-U transitions. A comparative plastome study with Cardamineae species revealed a conserved gene order and high similarity of protein-coding sequences. Analysis of the Ka/Ks ratios of Cardamineae suggested positive selection exerted on the ycf2 gene in watercress, which might reflect specific adaptations of watercress to its particular living environment. Phylogenetic analyses based on complete cp genomes and common protein-coding genes from 56 species showed that the genus Nasturtium was a sister to Cardamine in the Cardamineae tribe. Our study provides valuable resources for future evolution, population genetics and molecular biology studies of watercress. Copyright © 2019 Elsevier B.V. All rights reserved.
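The region sizes reported in this abstract can be checked against the stated total: in a quadripartite plastome the full length is the LSC plus the SSC plus two copies of the inverted repeat. The short sketch below does that arithmetic, and also confirms that the gene categories sum to the reported 113 unique genes. The function name is an illustrative assumption; the numbers are taken from the abstract.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two inverted-repeat copies (bp)."""
    return lsc + ssc + 2 * ir


# Region sizes as reported for the watercress chloroplast genome.
total_bp = quadripartite_length(lsc=84_265, ssc=17_831, ir=26_505)
print(total_bp)  # 155106

# Unique gene count: protein-coding genes + tRNAs + rRNAs.
unique_genes = 79 + 30 + 4
print(unique_genes)  # 113
```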
Gene targeting by the TAL effector PthXo2 reveals cryptic resistance gene for bacterial blight of rice.
Bacterial blight of rice is caused by the γ-proteobacterium Xanthomonas oryzae pv. oryzae, which utilizes a group of type III TAL (transcription activator-like) effectors to induce host gene expression and condition host susceptibility. Five SWEET genes are functionally redundant to support bacterial disease, but only two were experimentally proven targets of natural TAL effectors. Here, we report the identification of the sucrose transporter gene OsSWEET13 as the disease-susceptibility gene for PthXo2 and the existence of cryptic recessive resistance to PthXo2-dependent X. oryzae pv. oryzae due to promoter variations of OsSWEET13 in japonica rice. PthXo2-containing strains induce OsSWEET13 in indica rice IR24 due to the presence of an unpredicted and undescribed effector binding site not present in the alleles in japonica rice Nipponbare and Kitaake. The specificity of effector-associated gene induction and disease susceptibility is attributable to a single nucleotide polymorphism (SNP), which is also found in a polymorphic allele of OsSWEET13 known as the recessive resistance gene xa25 from the rice cultivar Minghui 63. The mutation of OsSWEET13 with CRISPR/Cas9 technology further corroborates the requirement of OsSWEET13 expression for the state of PthXo2-dependent disease susceptibility to X. oryzae pv. oryzae. Gene profiling of a collection of 104 strains revealed OsSWEET13 induction by 42 isolates of X. oryzae pv. oryzae. Heterologous expression of OsSWEET13 in Nicotiana benthamiana leaf cells elevates sucrose concentrations in the apoplasm. The results corroborate a model whereby X. oryzae pv. oryzae enhances the release of sucrose from host cells in order to exploit the host resources. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
Is there foul play in the leaf pocket? The metagenome of floating fern Azolla reveals endophytes that do not fix N2 but may denitrify.
Dinitrogen fixation by Nostoc azollae residing in specialized leaf pockets supports prolific growth of the floating fern Azolla filiculoides. To evaluate contributions by further microorganisms, the A. filiculoides microbiome and nitrogen metabolism in bacteria persistently associated with Azolla ferns were characterized. A metagenomic approach was taken, complemented by detection of N2O released and nitrogen isotope determinations of fern biomass. Ribosomal RNA genes in sequenced DNA of natural ferns, their enriched leaf pockets and water filtrate from the surrounding ditch established that bacteria of A. filiculoides differed entirely from surrounding water and revealed species of the order Rhizobiales. Analyses of seven cultivated Azolla species confirmed persistent association with Rhizobiales. Two distinct nearly full-length Rhizobiales genomes were identified in leaf-pocket-enriched samples from ditch-grown A. filiculoides. Their annotation revealed genes for denitrification but not N2-fixation. 15N2 incorporation was active in ferns with N. azollae but not in ferns without. N2O was not detectably released from surface-sterilized ferns with the Rhizobiales. N2-fixing N. azollae, we conclude, dominated the microbiome of Azolla ferns. The persistent but less abundant heterotrophic Rhizobiales bacteria possibly contributed to lowering O2 levels in leaf pockets but did not release detectable amounts of the strong greenhouse gas N2O. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
A near complete snapshot of the Zea mays seedling transcriptome revealed from ultra-deep sequencing.
RNA-sequencing (RNA-seq) enables in-depth exploration of transcriptomes, but typical sequencing depth often limits its comprehensiveness. In this study, we generated nearly 3 billion RNA-Seq reads, totaling 341 Gb of sequence, from a Zea mays seedling sample. At this depth, a near complete snapshot of the transcriptome was observed consisting of over 90% of the annotated transcripts, including lowly expressed transcription factors. A novel hybrid strategy combining de novo and reference-based assemblies yielded a transcriptome consisting of 126,708 transcripts with 88% of expressed known genes assembled to full-length. We improved current annotations by adding 4,842 previously unannotated transcript variants and many new features, including 212 maize transcripts, 201 genes, 10 genes with undocumented potential roles in seedlings as well as maize lineage specific gene fusion events. We demonstrated the power of deep sequencing for large transcriptome studies by generating a high quality transcriptome, which provides a rich resource for the research community.