The availability of plant reference genomes has ushered in a new era of crop genomics. More than 100 plant genomes have been sequenced since 2000, 63% of which are crop species. These genome sequences provide insight into architecture, evolution and novel aspects of crop genomes such as the retention of key agronomic traits after whole genome duplication events. Some crops have very large, polyploid, repeat-rich genomes, which require innovative strategies for sequencing, assembly and analysis. Even low quality reference genomes have the potential to improve crop germplasm through genome-wide molecular markers, which decrease expensive phenotyping and breeding cycles. The next stage of plant genomics will require draft genome refinement, building resources for crop wild relatives, resequencing broad diversity panels, and plant ENCODE projects to better understand the complexities of these highly diverse genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues. CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.
The power of Single Molecule Real-Time sequencing technology in the de novo assembly of a eukaryotic genome.
Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of these genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced 100-fold longer contigs and a 100-fold smaller amount of gaps than the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.
AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences.
Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant-pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs and the overall TALE diversity in Xanthomonas spp. are not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes by the use of PacBio sequencing. We present ‘AnnoTALE’, a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities.
Condition-dependent co-regulation of genomic clusters of virulence factors in the grapevine trunk pathogen Neofusicoccum parvum.
The ascomycete Neofusicoccum parvum, one of the causal agents of Botryosphaeria dieback, is a destructive wood-infecting fungus and a serious threat to grape production worldwide. The capability to colonize woody tissue, combined with the secretion of phytotoxic compounds, is thought to underlie its pathogenicity and virulence. Here, we describe the repertoire of virulence factors and their transcriptional dynamics as the fungus feeds on different substrates and colonizes the woody stem. We assembled and annotated a highly contiguous genome using single-molecule real-time DNA sequencing. Transcriptome profiling by RNA sequencing determined the genome-wide patterns of expression of virulence factors both in vitro (potato dextrose agar or medium amended with grape wood as substrate) and in planta. Pairwise statistical testing of differential expression, followed by co-expression network analysis, revealed that physically clustered genes coding for putative virulence functions were induced depending on the substrate or stage of plant infection. Co-expressed gene clusters were significantly enriched not only in genes associated with secondary metabolism, but also in those associated with cell wall degradation, suggesting that dynamic co-regulation of transcriptional networks contributes to multiple aspects of N. parvum virulence. In most of the co-expressed clusters, all genes shared at least one common motif in their promoter region, indicative of co-regulation by the same transcription factor. Co-expression analysis also identified chromatin regulators whose expression correlated with that of inducible clusters of virulence factors, suggesting a complex, multi-layered regulation of the virulence repertoire of N. parvum. © 2016 BSPP AND JOHN WILEY & SONS LTD.
How Single Molecule Real-Time Sequencing and haplotype phasing have enabled reference-grade diploid genome assembly of wine grapes.
Domesticated grapevines (Vitis vinifera) have relatively small genomes of about 500 Mb (Lodhi and Reisch, 1995; Jaillon et al., 2007; Velasco et al., 2007), which is similar to other small-genome species like rice (430 Mb; Goff et al., 2002), medicago (500 Mb; Tang et al., 2014), and poplar (465 Mb; Tuskan et al., 2006). Despite their small genome size, sequencing and assembling grapevine genomes is difficult because of high levels of heterozygosity. The high heterozygosity in domesticated grapes may be due, in part, to their domestication from an obligately outcrossing, dioecious wild progenitor. Domesticated grapes can be selfed, in theory, because their mating system transitioned to hermaphroditic, self-fertile flowers during domestication. In practice, however, selfed progeny tend to be non-viable, presumably due to a high deleterious recessive load and resulting inbreeding depression. As a consequence of these fitness effects, most grape cultivars are crosses between distantly related parents (Strefeler et al., 1992; Ohmi et al., 1993; Bowers and Meredith, 1997; Sefc et al., 1998; Lopes et al., 1999; Di Gaspero et al., 2005; Tapia et al., 2007; Ibáñez et al., 2009; Cipriani et al., 2010; Myles et al., 2011; Lacombe et al., 2013).
Construction of a reference genetic map of Raphanus sativus based on genotyping by whole-genome resequencing.
This manuscript provides a genetic map of Raphanus sativus that has been used as a reference genetic map for an ongoing genome sequencing project. The map was constructed based on genotyping by whole-genome resequencing of the mapping parents and an F2 population. Raphanus sativus is an annual vegetable crop species of the Brassicaceae family and is one of the key plants in the seed industry, especially in East Asia. Assessment of the R. sativus genome provides fundamental resources for crop improvement as well as the study of crop genome structure and evolution. With the goal of anchoring genome sequence assemblies of R. sativus cv. WK10039, whose genome has been sequenced, onto the chromosomes, we developed a reference genetic map based on genotyping of two parents (maternal WK10039 and paternal WK10024) and 93 individuals of the F2 mapping population by whole-genome resequencing. To develop high-confidence genetic markers, ~83 Gb of parental line data and ~591 Gb of mapping population data were generated as Illumina 100 bp paired-end reads. Highly stringent sequence analysis of the reads mapped to the 344 Mb of genome sequence scaffolds identified a total of 16,282 SNPs and 150 PCR-based markers. Using a subset of the markers, a high-density genetic map was constructed from the analysis of 2,637 markers spanning 1,538 cM with 1,000 unique framework loci. The genetic markers integrated 295 Mb of genome sequences to the cytogenetically defined chromosome arms. Comparative analysis of the chromosome-anchored sequences with Arabidopsis thaliana and Brassica rapa revealed that the R. sativus genome has evident triplicated sub-genome blocks and that the structure of its gene space is highly similar to that of B. rapa. The genetic map developed in this study will serve as a fundamental genomic resource for the study of R. sativus.
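The map statistics quoted above can be turned into two summary figures: the average map length per marker and the share of the scaffold sequence that was anchored. The following is a minimal sketch using only the numbers stated in the abstract; the variable and function names are illustrative, and the spacing is a simple ratio of map length to locus count, not a per-linkage-group interval calculation, which the abstract does not give enough detail to reproduce.

```python
# Figures quoted in the abstract; names are illustrative, not from the paper.
MAP_LENGTH_CM = 1538      # total genetic map length (cM)
TOTAL_MARKERS = 2637      # markers placed on the map
FRAMEWORK_LOCI = 1000     # unique framework loci
SCAFFOLD_MB = 344         # genome sequence scaffolds used for read mapping (Mb)
ANCHORED_MB = 295         # sequence integrated onto chromosome arms (Mb)


def mean_spacing_cm(map_length_cm: float, n_loci: int) -> float:
    """Average map length per locus (assumption: plain ratio, ignoring group ends)."""
    return map_length_cm / n_loci


def anchored_percent(anchored_mb: float, total_mb: float) -> float:
    """Percentage of the scaffold sequence anchored by the genetic map."""
    return 100.0 * anchored_mb / total_mb


spacing_all = mean_spacing_cm(MAP_LENGTH_CM, TOTAL_MARKERS)
spacing_framework = mean_spacing_cm(MAP_LENGTH_CM, FRAMEWORK_LOCI)
anchored = anchored_percent(ANCHORED_MB, SCAFFOLD_MB)

print(f"cM per marker:         {spacing_all:.3f}")
print(f"cM per framework locus: {spacing_framework:.3f}")
print(f"anchored sequence:      {anchored:.1f}%")
```

Under these assumptions the map averages a little under 0.6 cM per marker, and roughly 86% of the 344 Mb of scaffolds is anchored.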
Seaweeds are essential for marine ecosystems and have immense economic value. Here we present a comprehensive analysis of the draft genome of Saccharina japonica, one of the most economically important seaweeds. The 537-Mb assembled genomic sequence covered 98.5% of the estimated genome, and 18,733 protein-coding genes are predicted and annotated. Gene families related to cell wall synthesis, halogen concentration, development and defence systems were expanded. Functional diversification of the mannuronan C-5-epimerase and haloperoxidase gene families provides insight into the evolutionary adaptation of polysaccharide biosynthesis and iodine antioxidation. Additional sequencing of seven cultivars and nine wild individuals reveals that the genetic diversity within wild populations is greater than among cultivars. All of the cultivars are descendants of a wild S. japonica accession showing limited admixture with S. longissima. This study represents an important advance toward improving yields and economic traits in Saccharina and provides an invaluable resource for plant genome studies.
Genomes of ‘Candidatus Liberibacter solanacearum’ Haplotype A from New Zealand and the United States Suggest Significant Genome Plasticity in the Species.
‘Candidatus Liberibacter solanacearum’ contains two solanaceous crop-infecting haplotypes, A and B. Two haplotype A draft genomes were assembled and compared with ZC1 (haplotype B), revealing inversion and relocation genomic rearrangements, numerous single-nucleotide polymorphisms, and differences in phage-related regions. Differences in prophage location and sequence were seen both within and between haplotype comparisons. OrthoMCL and BLAST analyses identified 46 putative coding sequences present in haplotype A that were not present in haplotype B. Thirty-eight of these loci were not found in sequences from other Liberibacter spp. Quantitative polymerase chain reaction (qPCR) assays designed to amplify sequences from 15 of these loci were screened against a panel of ‘Ca. L. solanacearum’-positive samples to investigate genetic diversity. Seven of the assays demonstrated within-haplotype diversity; five failed to amplify loci in at least one haplotype A sample while three assays produced amplicons from some haplotype B samples. Eight of the loci assays showed consistent A-B differentiation. Differences in genome arrangements, prophage, and qPCR results suggesting locus diversity within the haplotypes provide more evidence for genetic complexity in this emerging bacterial species.
Here we report the draft genome sequence of perennial ryegrass (Lolium perenne), an economically important forage and turf grass species that is widely cultivated in temperate regions worldwide. It is classified along with wheat, barley, oats and Brachypodium distachyon in the Pooideae sub-family of the grass family (Poaceae). Transcriptome data were used to identify 28,455 gene models, and we utilized macro-co-linearity between perennial ryegrass and barley, and synteny within the grass family, to establish a synteny-based linear gene order. The gametophytic self-incompatibility mechanism enables the pistil of a plant to reject self-pollen and therefore promote out-crossing. We have used the sequence assembly to characterize transcriptional changes in the stigma during pollination with both compatible and incompatible pollen. Characterization of the pollen transcriptome identified homologs to pollen allergens from a range of species, many of which were expressed at very high levels in mature pollen grains, and are potentially involved in the self-incompatibility mechanism. The genome sequence provides a valuable resource for future breeding efforts based on genomic prediction, and will accelerate the development of new varieties for more productive grasslands. © 2015 The Authors. The Plant Journal © 2015 John Wiley & Sons Ltd.
Single molecule sequencing of THCA synthase reveals copy number variation in modern drug-type Cannabis sativa L.
Cannabinoid expression is an important genetically determined feature of cannabis that presents clinical and legal implications for patients seeking cannabinoid-specific therapies like cannabidiol (CBD). Cannabinoid, terpenoid, and flavonoid marker-assisted selection can accelerate breeding efforts by offering genetic tools to select for desired traits at an early stage in growth. To this end, multiple models for chemotype inheritance have been described, suggesting a complex picture for chemical phenotype determination. Here we explore the potential role of copy number variation of THCA synthase using phased single molecule sequencing and demonstrate that copy number and sequence variation of this gene are common, suggesting a more nuanced view of chemotype prediction.
Kosakonia sacchari sp. nov. is a new species within the new genus Kosakonia, which was previously included in the genus Enterobacter. K. sacchari is a nitrogen-fixing bacterium named for its association with sugarcane (Saccharum officinarum L.). K. sacchari bacteria are Gram-negative, aerobic, non-spore-forming, motile rods. Strain SP1(T) (=CGMCC1.12102(T)=LMG 26783(T)) is the type strain of K. sacchari sp. nov. and is able to colonize and fix N2 in association with sugarcane plants, thus promoting plant growth. Here we summarize the features of strain SP1(T) and describe its complete genome sequence. The genome contains a single chromosome and no plasmids, 4,902,024 nucleotides with 53.7% GC content, 4,460 protein-coding genes and 105 RNA genes including 22 rRNA genes, 82 tRNA genes, and 1 ncRNA gene.
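The gene counts reported for strain SP1(T) can be cross-checked with simple arithmetic. The sketch below, which uses only the figures in the abstract and illustrative variable names, confirms that the RNA gene categories sum to the stated total and derives an approximate protein-coding gene density. The density is a derived illustration, not a value reported by the authors.

```python
# Figures quoted in the abstract; names are illustrative.
GENOME_BP = 4_902_024
PROTEIN_CODING_GENES = 4_460
RNA_GENES_REPORTED = 105
RNA_GENES_BY_TYPE = {"rRNA": 22, "tRNA": 82, "ncRNA": 1}

# The per-type RNA gene counts should add up to the reported total.
rna_sum = sum(RNA_GENES_BY_TYPE.values())

# Derived quantity: protein-coding genes per kilobase of genome.
genes_per_kb = PROTEIN_CODING_GENES / (GENOME_BP / 1000)

print(f"RNA genes by type sum to {rna_sum} (reported: {RNA_GENES_REPORTED})")
print(f"protein-coding genes per kb: {genes_per_kb:.2f}")
```

The categories are consistent with the reported total of 105 RNA genes, and the density works out to a little under one protein-coding gene per kilobase.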
De novo assembly and characterization of the complete chloroplast genome of radish (Raphanus sativus L.).
Radish (Raphanus sativus L.) is an edible root vegetable crop that is cultivated worldwide and whose genome has been sequenced. Here we report the complete nucleotide sequence of the radish cultivar WK10039 chloroplast (cp) genome, along with a de novo assembly strategy using whole genome shotgun sequence reads obtained by next generation sequencing. The radish cp genome is 153,368 bp in length and has a typical quadripartite structure, composed of a pair of inverted repeat regions (26,217 bp each), a large single copy region (83,170 bp), and a small single copy region (17,764 bp). The radish cp genome contains 87 predicted protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Sequence analysis revealed the presence of 91 simple sequence repeats (SSRs) in the radish cp genome. Phylogenetic analysis of 62 protein-coding gene sequences from the 17 cp genomes of the Brassicaceae family suggested that the radish cp genome is most closely related to the cp genomes of Brassica rapa and Brassica napus. Comparisons with the B. rapa and B. napus cp genomes revealed highly divergent intergenic sequences and introns that can potentially be developed as diagnostic cp markers. Synonymous and nonsynonymous substitutions of cp genes suggested that nucleotide substitutions have occurred at similar rates in most genes. The complete sequence of the radish cp genome will serve as a valuable resource for the development of new molecular markers and the study of the phylogenetic relationships of Raphanus species in the Brassicaceae family. Copyright © 2014 Elsevier B.V. All rights reserved.
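In a quadripartite chloroplast genome, the total length is the large single copy region plus the small single copy region plus two copies of the inverted repeat. The following minimal sketch checks the region lengths reported above against the stated total; the function and variable names are illustrative and not taken from the paper.

```python
# Region lengths (bp) as reported in the abstract; names are illustrative.
LSC_BP = 83_170           # large single copy region
SSC_BP = 17_764           # small single copy region
IR_BP = 26_217            # one inverted repeat (the genome carries two)
REPORTED_TOTAL_BP = 153_368


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total genome length: LSC + SSC + two inverted repeat copies."""
    return lsc + ssc + 2 * ir


total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
ir_share = 100.0 * (2 * IR_BP) / total_bp  # percent of genome in the IR pair

print(f"computed total: {total_bp} bp (reported: {REPORTED_TOTAL_BP} bp)")
print(f"inverted repeats make up {ir_share:.1f}% of the genome")
```

The computed total matches the reported 153,368 bp, so the four region lengths are internally consistent.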
Pseudomonas syringae CC1557: a highly virulent strain with an unusually small type III effector repertoire that includes a novel effector.
Both type III effector proteins and nonribosomal peptide toxins play important roles for Pseudomonas syringae pathogenicity in host plants, but whether and how these pathways interact to promote infection remains unclear. Genomic evidence from one clade of P. syringae suggests a tradeoff between the total number of type III effector proteins and presence of syringomycin, syringopeptin, and syringolin A toxins. Here, we report the complete genome sequence from P. syringae CC1557, which contains the lowest number of known type III effectors to date and has also acquired genes similar to sequences encoding syringomycin pathways from other strains. We demonstrate that this strain is pathogenic on Nicotiana benthamiana and that both the type III secretion system and a new type III effector, hopBJ1, contribute to pathogenicity. We further demonstrate that activity of HopBJ1 is dependent on residues structurally similar to the catalytic site of Escherichia coli CNF1 toxin. Taken together, our results provide additional support for a negative correlation between type III effector repertoires and the potential to produce syringomycin-like toxins while also highlighting how genomic synteny and bioinformatics can be used to identify and characterize novel virulence proteins.
Rhizobium leguminosarum bv. trifolii SRDI565 (syn. N8-J) is an aerobic, motile, Gram-negative, non-spore-forming rod. SRDI565 was isolated from a nodule recovered from the roots of the annual clover Trifolium subterraneum subsp. subterraneum grown in the greenhouse and inoculated with soil collected from New South Wales, Australia. SRDI565 has a broad host range for nodulation within the clover genus; however, N2-fixation is sub-optimal with some Trifolium species and ineffective with others. Here we describe the features of R. leguminosarum bv. trifolii strain SRDI565, together with genome sequence information and annotation. The 6,905,599 bp high-quality-draft genome is arranged into 7 scaffolds of 7 contigs, contains 6,750 protein-coding genes and 86 RNA-only encoding genes, and is one of 100 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.