April 21, 2020  |  

Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline

Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. There are numerous methods for each class of elements with unknown relative performance metrics. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.List of abbreviationsTETransposable ElementsLTRLong Terminal RepeatLINELong Interspersed Nuclear ElementSINEShort Interspersed Nuclear ElementMITEMiniature Inverted Transposable ElementTIRTerminal Inverted RepeatTSDTarget Site DuplicationTPTrue PositivesFPFalse PositivesTNTrue NegativeFNFalse NegativesGRFGeneric Repeat FinderEDTAExtensive de-novo TE Annotator


April 21, 2020  |  

Complete chloroplast genome sequences of Kaempferia galanga and Kaempferia elegans: Molecular structures and comparative analysis.

Kaempferia galanga and Kaempferia elegans, which belong to the genus Kaempferia family Zingiberaceae, are used as valuable herbal medicine and ornamental plants, respectively. The chloroplast genomes have been used for molecular markers, species identification and phylogenetic studies. In this study, the complete chloroplast genome sequences of K. galanga and K. elegans are reported. Results show that the complete chloroplast genome of K. galanga is 163,811 bp long, having a quadripartite structure with large single copy (LSC) of 88,405 bp and a small single copy (SSC) of 15,812 bp separated by inverted repeats (IRs) of 29,797 bp. Similarly, the complete chloroplast genome of K. elegans is 163,555 bp long, having a quadripartite structure in which IRs of 29,773 bp length separates 88,020 bp of LSC and 15,989 bp of SSC. A total of 111 genes in K. galanga and 113 genes in K. elegans comprised 79 protein-coding genes and 4 ribosomal RNA (rRNA) genes, as well as 28 and 30 transfer RNA (tRNA) genes in K. galanga and K. elegans, respectively. The gene order, GC content and orientation of the two Kaempferia chloroplast genomes exhibited high similarity. The location and distribution of simple sequence repeats (SSRs) and long repeat sequences were determined. Eight highly variable regions between the two Kaempferia species were identified and 643 mutation events, including 536 single-nucleotide polymorphisms (SNPs) and 107 insertion/deletions (indels), were accurately located. Sequence divergences of the whole chloroplast genomes were calculated among related Zingiberaceae species. The phylogenetic analysis based on SNPs among eleven species strongly supported that K. galanga and K. elegans formed a cluster within Zingiberaceae. This study identified the unique characteristics of the entire K. galanga and K. elegans chloroplast genomes that contribute to our understanding of the chloroplast DNA evolution within Zingiberaceae species. It provides valuable information for phylogenetic analysis and species identification within genus Kaempferia.


April 21, 2020  |  

The complexity of alternative splicing and landscape of tissue-specific expression in lotus (Nelumbo nucifera) unveiled by Illumina- and single-molecule real-time-based RNA-sequencing.

Alternative splicing (AS) plays a critical role in regulating different physiological and developmental processes in eukaryotes, by dramatically increasing the diversity of the transcriptome and the proteome. However, the saturation and complexity of AS remain unclear in lotus due to its limitation of rare obtainment of full-length multiple-splice isoforms. In this study, we apply a hybrid assembly strategy by combining single-molecule real-time sequencing and Illumina RNA-seq to get a comprehensive insight into the lotus transcriptomic landscape. We identified 211,802 high-quality full-length non-chimeric reads, with 192,690 non-redundant isoforms, and updated the lotus reference gene model. Moreover, our analysis identified a total of 104,288 AS events from 16,543 genes, with alternative 3′ splice-site being the predominant model, following by intron retention. By exploring tissue datasets, 370 tissue-specific AS events were identified among 12 tissues. Both the tissue-specific genes and isoforms might play important roles in tissue or organ development, and are suitable for ‘ABCE’ model partly in floral tissues. A large number of AS events and isoform variants identified in our study enhance the understanding of transcriptional diversity in lotus, and provide valuable resource for further functional genomic studies. © The Author(s) 2019. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.


April 21, 2020  |  

De Novo Genome Sequence Assembly of Dwarf Coconut (Cocos nucifera L. ‘Catigan Green Dwarf’) Provides Insights into Genomic Variation Between Coconut Types and Related Palm Species.

We report the first whole genome sequence (WGS) assembly and annotation of a dwarf coconut variety, ‘Catigan Green Dwarf’ (CATD). The genome sequence was generated using the PacBio SMRT sequencing platform at 15X coverage of the expected genome size of 2.15 Gbp, which was corrected with assembled 50X Illumina paired-end MiSeq reads of the same genome. The draft genome was improved through Chicago sequencing to generate a scaffold assembly that results in a total genome size of 2.1 Gbp consisting of 7,998 scaffolds with N50 of 570,487 bp. The final assembly covers around 97.6% of the estimated genome size of coconut ‘CATD’ based on homozygous k-mer peak analysis. A total of 34,958 high-confidence gene models were predicted and functionally associated to various economically important traits, such as pest/disease resistance, drought tolerance, coconut oil biosynthesis, and putative transcription factors. The assembled genome was used to infer the evolutionary relationship within the palm family based on genomic variations and synteny of coding gene sequences. Data show that at least three (3) rounds of whole genome duplication occurred and are commonly shared by these members of the Arecaceae family. A total of 7,139 unique SSR markers were designed to be used as a resource in marker-based breeding. In addition, we discovered 58,503 variants in coconut by aligning the Hainan Tall (HAT) WGS reads to the non-repetitive regions of the assembled CATD genome. The gene markers and genome-wide SSR markers established here will facilitate the development of varieties with resilience to climate change, resistance to pests and diseases, and improved oil yield and quality.Copyright © 2019 Lantican et al.


April 21, 2020  |  

A reference genome for pea provides insight into legume genome evolution.

We report the first annotated chromosome-level reference genome assembly for pea, Gregor Mendel’s original genetic model. Phylogenetics and paleogenomics show genomic rearrangements across legumes and suggest a major role for repetitive elements in pea genome evolution. Compared to other sequenced Leguminosae genomes, the pea genome shows intense gene dynamics, most likely associated with genome size expansion when the Fabeae diverged from its sister tribes. During Pisum evolution, translocation and transposition differentially occurred across lineages. This reference sequence will accelerate our understanding of the molecular basis of agronomically important traits and support crop improvement.


April 21, 2020  |  

A global survey of full-length transcriptome of Ginkgo biloba reveals transcript variants involved in flavonoid biosynthesis

Ginkgo biloba, which contains flavonoids as bioactive components, is widely used in traditional Chinese medicine. Increasing the flavonoid production of medicinal plants through genetic engineering generally focuses on the key genes involved in flavonoid biosynthesis. However, the molecular mechanisms underlying such biosynthesis are not yet well understood. To understand these mechanisms, a combination of second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing was applied to G. biloba. Eight tissues were sampled for SMRT sequencing to generate a high-quality, full-length transcriptome database. From 23.36 Gb clean reads, 12,954 alternative polyadenylation events, 12,290 alternative splicing events, 929 fusion transcripts, 2,286 novel transcripts, and 1,270 lncRNAs were predicted by removing redundant reads. Further studies reveal that 7 AS, 5 lncRNA, and 6 fusion gene events were identified in flavonoid biosynthesis. A total of 12 gene modules were revealed to be involved in flavonoid metabolism structural genes and transcription factors by constructing co-expression networks. Weighted gene coexpression network analysis (WGCNA) analysis reveals that some hub genes operate during the biosynthesis by identifying transcription factors (TFs) and structure genes. Seven key hub genes were also identified by analyzing the correlation between gene expression level and flavonoids content. The results highlight the importance of SMRT sequencing of the full-length transcriptome in improving genome annotation and elucidating the gene regulation of flavonoid biosynthesis in G. biloba by providing a comprehensive set of reference transcripts.


April 21, 2020  |  

A draft genome for Spatholobus suberectus.

Spatholobus suberectus Dunn (S. suberectus), which belongs to the Leguminosae, is an important medicinal plant in China. Owing to its long growth cycle and increased use in human medicine, wild resources of S. suberectus have decreased rapidly and may be on the verge of extinction. De novo assembly of the whole S. suberectus genome provides us a critical potential resource towards biosynthesis of the main bioactive components and seed development regulation mechanism of this plant. Utilizing several sequencing technologies such as Illumina HiSeq X Ten, single-molecule real-time sequencing, 10x Genomics, as well as new assembly techniques such as FALCON and chromatin interaction mapping (Hi-C), we assembled a chromosome-scale genome about 798?Mb in size. In total, 748?Mb (93.73%) of the contig sequences were anchored onto nine chromosomes with the longest scaffold being 103.57?Mb. Further annotation analyses predicted 31,634 protein-coding genes, of which 93.9% have been functionally annotated. All data generated in this study is available in public databases.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.