In this CSHL Biology of Genomes 2021 virtual workshop, Michelle Vierra from PacBio discusses examples of how advances in highly accurate long-read (HiFi) sequencing have enabled exciting developments in plant…
Decoding and analysis of organelle genomes of Indian tea (Camellia assamica) for phylogenetic confirmation.
The NCBI database has more than 15 chloroplast (cp) genome sequences available for different Camellia species, but none for C. assamica. No mitochondrial (mt) genome has been reported in the genus Camellia or the family Theaceae. Believing that these organelle genomes could serve as valuable tools for taxonomic and phylogenetic analysis, we assembled and analyzed the cp and mt genomes of C. assamica. We assembled the complete mt genome of C. assamica into a single circular contig of 707,441 bp comprising 66 annotated genes: 35 protein-coding genes, 29 tRNAs and two rRNAs. The first cp genome of C. assamica resulted in a circular contig of 157,353 bp with a typical quadripartite structure. Phylogenetic analysis based on these organelle genomes showed that C. assamica is closely related to C. sinensis and C. leptophylla. It also supports Caryophyllales as Superasterids. Copyright © 2019. Published by Elsevier Inc.
Complete chloroplast genome sequences of Kaempferia galanga and Kaempferia elegans: Molecular structures and comparative analysis.
Kaempferia galanga and Kaempferia elegans, which belong to the genus Kaempferia (family Zingiberaceae), are used as a valuable herbal medicine and as ornamental plants, respectively. Chloroplast genomes have been used for molecular markers, species identification and phylogenetic studies. In this study, the complete chloroplast genome sequences of K. galanga and K. elegans are reported. The complete chloroplast genome of K. galanga is 163,811 bp long and has a quadripartite structure, with a large single-copy (LSC) region of 88,405 bp and a small single-copy (SSC) region of 15,812 bp separated by inverted repeats (IRs) of 29,797 bp. Similarly, the complete chloroplast genome of K. elegans is 163,555 bp long and has a quadripartite structure in which IRs of 29,773 bp separate an LSC region of 88,020 bp and an SSC region of 15,989 bp. The 111 genes in K. galanga and 113 genes in K. elegans comprise 79 protein-coding genes, 4 ribosomal RNA (rRNA) genes, and 28 and 30 transfer RNA (tRNA) genes, respectively. The gene order, GC content and gene orientation of the two Kaempferia chloroplast genomes were highly similar. The location and distribution of simple sequence repeats (SSRs) and long repeat sequences were determined. Eight highly variable regions between the two Kaempferia species were identified, and 643 mutation events, including 536 single-nucleotide polymorphisms (SNPs) and 107 insertions/deletions (indels), were precisely located. Sequence divergence across whole chloroplast genomes was calculated among related Zingiberaceae species. Phylogenetic analysis based on SNPs among eleven species strongly supported K. galanga and K. elegans forming a cluster within Zingiberaceae. This study identified unique characteristics of the K. galanga and K. elegans chloroplast genomes that contribute to our understanding of chloroplast DNA evolution within Zingiberaceae.
It provides valuable information for phylogenetic analysis and species identification within genus Kaempferia.
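The abstract above splits 643 mutation events into 536 SNPs and 107 indels. A minimal sketch of how such events can be tallied from a pairwise alignment, assuming the two plastomes are already aligned with `-` as the gap character and that each contiguous gap run counts as one indel event. The function name `count_mutation_events` is hypothetical, and this is not the authors' pipeline.

```python
def count_mutation_events(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count SNPs and indel events between two pre-aligned sequences.

    Assumed conventions: '-' marks a gap, and a run of consecutive gap
    columns on the same side counts as a single indel event.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    indels = 0
    prev_gap = None  # which sequence carried the gap in the previous column
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # gapped in both sequences: carries no information
        if a == "-" or b == "-":
            gap_side = "a" if a == "-" else "b"
            if gap_side != prev_gap:
                indels += 1  # a new gap run starts here
            prev_gap = gap_side
            continue
        prev_gap = None
        # Only unambiguous nucleotide mismatches count as SNPs.
        if a != b and a in "ACGT" and b in "ACGT":
            snps += 1
    return snps, indels


snps, indels = count_mutation_events("ACGT--ACGTA-A", "ACCTAAACGTAGA")
print(snps, indels)  # 1 2
```

Treating a gap run as one event mirrors how indels are usually counted as mutation events rather than per-base differences; a real comparison would also need to handle ambiguous bases and alignment uncertainty.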
The first complete chloroplast genome of Hedychium coronarium (Zingiberaceae) was reported in this study. The H. coronarium chloroplast genome was 163,949 bp in length and comprised a pair of inverted repeat (IR) regions of 29,780 bp each, a large single-copy (LSC) region of 88,581 bp and a small single-copy (SSC) region of 15,808 bp. It encoded 141 genes, including 87 protein-coding genes (79 PCG species), 46 tRNA genes (28 tRNA species), and eight rRNA genes (four rRNA species). The nucleotide composition was asymmetric (31.68% A, 18.35% C, 17.74% G, 32.23% T), with an overall AT content of 63.92%. Phylogenetic analysis placed H. coronarium in a monophyletic group within the genus Hedychium in the family Zingiberaceae.
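The per-base percentages and AT content quoted above are simple ratios over the assembled sequence. A minimal sketch, assuming the genome is available as a plain string and that ambiguous bases such as `N` are excluded from the denominator; `base_composition` is a hypothetical helper, not a tool named in the source.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return percentages of A, C, G and T plus combined AT and GC content.

    Assumption: non-ACGT characters (e.g. N) are ignored entirely.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    pct = {base: 100.0 * counts[base] / total for base in "ACGT"}
    pct["AT"] = pct["A"] + pct["T"]
    pct["GC"] = pct["G"] + pct["C"]
    return pct


comp = base_composition("AATTGCGCN")
print(comp["AT"], comp["GC"])  # 50.0 50.0
```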
The first complete chloroplast genome of Stahlianthus involucratus (Zingiberaceae) was reported in this study. The S. involucratus chloroplast genome was 163,300 bp in length and consisted of one large single-copy (LSC) region of 87,498 bp, one small single-copy (SSC) region of 15,568 bp, and a pair of inverted repeat (IR) regions of 30,117 bp each. It encoded 141 genes, including 87 protein-coding genes (79 PCG species), 46 tRNA genes (28 tRNA species) and 8 rRNA genes (4 rRNA species). Phylogenetic analysis based on single-nucleotide polymorphisms strongly supported S. involucratus, Curcuma roscoeana and Curcuma longa forming a cluster in group Curcuma II within the family Zingiberaceae.
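SNP-based phylogenies start from a matrix of aligned sites shared across taxa. The source does not say which tree-building method was used, so the sketch below covers only a first step under stated assumptions: uncorrected pairwise p-distances, with gap and ambiguous columns skipped pair by pair. The taxon names in the example are hypothetical placeholders, and a distance method such as neighbor joining or a likelihood method would be needed to obtain an actual tree.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping columns with gaps or N."""
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a != b:
            diffs += 1
    return diffs / compared if compared else float("nan")


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every pair of taxa, keyed by sorted names."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


toy = {"taxon_A": "ACGTACGT", "taxon_B": "ACGTACGA", "taxon_C": "TCGAACGA"}
for pair, dist in distance_matrix(toy).items():
    print(pair, dist)
```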
The first complete chloroplast genome of Amomum villosum (Zingiberaceae) was reported in this study. The A. villosum chloroplast genome was 163,608 bp in length and comprised a pair of inverted repeat (IR) regions of 29,820 bp each, a large single-copy (LSC) region of 88,680 bp, and a small single-copy (SSC) region of 15,288 bp. It encoded 141 genes, including 87 protein-coding genes (79 PCG species), 46 tRNA genes (28 tRNA species), and 8 rRNA genes (4 rRNA species). The overall AT content was 63.92%. Phylogenetic analysis showed that A. villosum was closely related to two species, Amomum kravanh and Amomum compactum, within the genus Amomum in the family Zingiberaceae.
Carthamus tinctorius L., also known as safflower, is an important oil crop planted worldwide. In this study, the complete chloroplast (cp) genome of safflower was sequenced on the PacBio Sequel platform. The cp genome, with a total size of 152,963 bp, consisted of two inverted repeats (25,128 bp) separated by a large single-copy region (84,124 bp) and a small single-copy region (18,583 bp). Annotation revealed that the cp genome contains 112 genes, including 79 protein-coding genes, 29 tRNA genes, and 4 rRNA genes. The cp genome information will be useful for future studies of safflower evolution and molecular breeding.
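In a quadripartite plastome the inverted repeat occurs twice, so the total length should equal LSC + SSC + 2 × IR. A small sketch that cross-checks region sizes copied from abstracts in this collection; `quadripartite_length` and the `REPORTED` table are illustrative names, not part of any published tool.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# (total, LSC, SSC, IR) in bp, copied from the abstracts in this collection.
REPORTED = {
    "Carthamus tinctorius": (152_963, 84_124, 18_583, 25_128),
    "Kaempferia galanga": (163_811, 88_405, 15_812, 29_797),
    "Costus viridis": (168_966, 92_189, 18_445, 29_166),
}

for species, (total, lsc, ssc, ir) in REPORTED.items():
    computed = quadripartite_length(lsc, ssc, ir)
    status = "consistent" if computed == total else f"off by {computed - total} bp"
    print(f"{species}: {computed} bp ({status})")
```

For the three entries listed, the computed totals match the reported genome sizes exactly.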
The complete chloroplast genome sequence of watercress (Nasturtium officinale R. Br.): Genome organization, adaptive evolution and phylogenetic relationships in Cardamineae.
Watercress (Nasturtium officinale R. Br.), an aquatic leafy vegetable of the Brassicaceae family, is known as a nutritional powerhouse. Here, we de novo sequenced and assembled the complete chloroplast (cp) genome of watercress from combined PacBio and Illumina data. The cp genome is 155,106 bp in length and exhibits a typical quadripartite structure, with a pair of inverted repeats (IRA and IRB) of 26,505 bp separated by a large single-copy (LSC) region of 84,265 bp and a small single-copy (SSC) region of 17,831 bp. The genome contains 113 unique genes, including 79 protein-coding genes, 30 tRNAs and 4 rRNAs, with 20 genes duplicated in the IRs. Compared with the earlier watercress cp genome deposited in GenBank, 21 single-nucleotide polymorphisms (SNPs) and 27 indels were identified, mainly in noncoding sequences. A total of 49 repeat structures and 71 simple sequence repeats (SSRs) were detected. Codon usage showed a bias toward A/T-ending codons in the watercress cp genome. Moreover, 45 RNA editing sites were predicted in 16 genes, all C-to-U transitions. A comparative plastome study with Cardamineae species revealed a conserved gene order and high similarity of protein-coding sequences. Analysis of Ka/Ks ratios in Cardamineae suggested positive selection on the ycf2 gene in watercress, which might reflect specific adaptations of watercress to its particular living environment. Phylogenetic analyses based on complete cp genomes and common protein-coding genes from 56 species showed that the genus Nasturtium is sister to Cardamine in the tribe Cardamineae. Our study provides valuable resources for future evolution, population genetics and molecular biology studies of watercress. Copyright © 2019 Elsevier B.V. All rights reserved.
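One simple way to quantify the A/T-ending codon bias mentioned above is the fraction of sense codons whose third position is A or T. A minimal sketch, assuming a single in-frame coding sequence given as a string; `at_ending_fraction` is a hypothetical helper, and the source does not state which codon-usage software or statistic (for example RSCU) the authors used.

```python
def at_ending_fraction(cds: str) -> float:
    """Fraction of sense codons in an in-frame CDS ending in A or T."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    stops = {"TAA", "TAG", "TGA"}
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Keep sense codons made only of unambiguous bases.
    sense = [c for c in codons if c not in stops and set(c) <= set("ACGT")]
    if not sense:
        raise ValueError("no sense codons")
    at_ending = sum(1 for c in sense if c[2] in "AT")
    return at_ending / len(sense)


print(at_ending_fraction("ATGAAAGCTTAA"))  # ATG, AAA, GCT are sense codons; 2 of 3 end in A/T
```

This counts every sense codon; a stricter analysis would usually drop ATG and TGG, which have no synonymous alternatives, and would pool codons across all protein-coding genes before comparing.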
The re-sequencing and re-assembly of complete chloroplast genome of Melastoma dodecandrum (Melastomataceae) from Fujian, China
The plant genus Melastoma of the family Melastomataceae comprises nine species and one variety in China. Melastoma dodecandrum is the only creeping species of this genus. A previous study reported the complete chloroplast genome of M. dodecandrum from Guangzhou, China, but plant populations from different regions may differ. Here, we report the complete chloroplast genome of M. dodecandrum from Fuzhou, China, assembled from PacBio and whole-genome sequencing data. The sequence is a circular molecule of 156,598 bp and contains 129 genes. Phylogenetic analysis indicated that M. dodecandrum is closely related to M. candidum in Melastomataceae. The study aims to provide insights for future studies on differences in molecular evolution among M. dodecandrum populations and on the taxonomy of Melastoma.
Complete mitochondrial genome of a Chinese oil tree yellowhorn, Xanthoceras sorbifolium (Sapindales, Sapindaceae)
Xanthoceras sorbifolium is an important woody oilseed tree in North China. In this study, the complete mitochondrial genome of X. sorbifolium was sequenced using Illumina HiSeq and PacBio sequencing technologies. The mitogenome is 575,633 bp in length, with a GC content of 45.71%. The genome contains 42 protein-coding genes, 4 ribosomal RNA genes, and 24 transfer RNA genes. Phylogenetic analysis based on protein-coding genes showed that X. sorbifolium is closely related to species in the Bombacaceae and Malvaceae families.
Characterization and phylogenetic analysis of the complete chloroplast genome sequence of Costus viridis (Costaceae)
The first complete chloroplast genome of Costus viridis (Costaceae) was reported in the current study. The C. viridis chloroplast genome was 168,966 bp in length and comprised a pair of inverted repeat (IR) regions of 29,166 bp each, a large single-copy (LSC) region of 92,189 bp, and a small single-copy (SSC) region of 18,445 bp. It encoded 133 genes, including 87 protein-coding genes (79 PCG species), 38 tRNA genes (28 tRNA species), and eight rRNA genes (four rRNA species). The overall AT content was 63.75%. Phylogenetic analysis showed that C. viridis was closely related to Costus osae within the genus Costus in the family Costaceae.
Mitochondrial and chloroplast genomes provide insights into the evolutionary origins of quinoa (Chenopodium quinoa Willd.).
Quinoa has recently gained international attention because of its nutritious seeds, prompting the expansion of its cultivation into new areas in which it was not originally selected as a crop. Improving quinoa production in these areas will benefit from the introduction of advantageous traits from free-living relatives that are native to these, or similar, environments. As part of an ongoing effort to characterize the primary and secondary germplasm pools for quinoa, we report the complete mitochondrial and chloroplast genome sequences of quinoa accession PI 614886 and the identification of sequence variants in additional accessions from quinoa and related species. This is the first reported mitochondrial genome assembly in the genus Chenopodium. Inference of phylogenetic relationships among Chenopodium species based on mitochondrial and chloroplast variants supports the hypotheses that 1) the A-genome ancestor was the cytoplasmic donor in the original tetraploidization event, and 2) highland and coastal quinoas were independently domesticated.
Comparative genomic and phylogenetic analyses of Populus section Leuce using complete chloroplast genome sequences
Species of Populus section Leuce are distributed throughout most of the Northern Hemisphere and have important economic and ecological significance. However, because of frequent hybridization within Leuce, the phylogenetic relationships between species have not been clarified. The chloroplast (cp) genome is maternally inherited and has relatively conservative mutation rates, which makes it a powerful tool for building phylogenetic trees. In this study, we used the PacBio Sequel platform to determine that the cp genome of Populus tomentosa is 156,558 bp long, including a large single-copy region (84,717 bp), a small single-copy region (16,555 bp), and a pair of inverted repeat regions (27,643 bp each). The cp genome contains 131 unique genes, including 37 transfer RNAs, 8 ribosomal RNAs, and 86 protein-coding genes. We compared the cp genomes of seven species of section Leuce and identified five cp DNA markers with >1% variable sites. Phylogenetic analyses revealed two evolutionary branches within section Leuce. The species most closely related to P. tomentosa was P. adenopoda, followed by P. alba. These cp genome data will help clarify cp evolution in section Leuce and further elucidate the origin of P. tomentosa.
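Screening regions for markers "with >1% variable sites" amounts to computing, per aligned region, the fraction of columns that show more than one nucleotide. A minimal sketch, assuming each candidate region is already a multiple alignment of equal-length strings and that gap or ambiguous characters are ignored when judging variability; `variable_site_fraction`, `candidate_markers` and the region names are hypothetical.

```python
def variable_site_fraction(alignment: list[str]) -> float:
    """Fraction of alignment columns containing more than one nucleotide.

    Assumption: gaps and ambiguity codes are ignored when deciding whether
    a column is variable, but every column counts toward the length.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    variable = 0
    for column in zip(*alignment):
        bases = {c.upper() for c in column if c.upper() in "ACGT"}
        if len(bases) > 1:
            variable += 1
    return variable / length


def candidate_markers(regions: dict[str, list[str]], threshold: float = 0.01) -> list[str]:
    """Names of regions whose variable-site fraction exceeds the threshold."""
    return [name for name, aln in regions.items()
            if variable_site_fraction(aln) > threshold]


regions = {
    "region_1": ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC"],
    "region_2": ["ACGTACGTAC", "ACGAACGTAC", "ACGTACGTAC"],
}
print(candidate_markers(regions))  # ['region_2']
```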
Chrysanthemum boreale is a perennial plant in the Asteraceae family that is native to eastern Asia and has both ornamental and herbal uses. Here, we determined the complete chloroplast genome sequence for C. boreale using long-read sequencing. The chloroplast genome was 151,012 bp and consisted of a large single copy (LSC) region (82,817 bp), a small single copy (SSC) region (18,281 bp) and two inverted repeats (IRs) (24,957 bp). It was predicted to contain 131 genes, including 87 protein-coding genes, eight rRNAs and 46 tRNAs. Phylogenetic analysis of chloroplast genomes clustered C. boreale with other Chrysanthemum and Asteraceae species.
Actinidia arguta is the most basal species in a phylogenetically and economically important genus in the family Actinidiaceae. To better understand the molecular basis of the Actinidia arguta chloroplast (cp), we sequenced the complete cp genome of A. arguta using Illumina and PacBio RS II sequencing technologies. The A. arguta cp genome was 157,611 bp in length and composed of a pair of 24,232 bp inverted repeats (IRs) separated by a 20,463 bp small single-copy region (SSC) and an 88,684 bp large single-copy region (LSC). Overall, the cp genome contained 113 unique genes. The cp genomes of A. arguta and three other Actinidia species from GenBank were compared. Indel mutation events and high frequencies of base substitution were identified, and the accD and ycf2 genes showed a high degree of variation within Actinidia. Forty-seven simple sequence repeats (SSRs) and 155 repetitive structures were identified, further demonstrating rapid evolution in Actinidia. The cp genome analysis and the identification of variable loci provide vital information for understanding the evolution and function of the chloroplast and for characterizing Actinidia population genetics.
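SSR counts such as the 47 reported above depend on the minimum copy number required for each motif length. A minimal sketch of perfect-SSR detection for motifs of 1 to 6 bp, assuming MISA-style thresholds (10, 5, 4, 3, 3, 3 copies) that the abstract itself does not state. `find_ssrs`, `is_primitive` and `MIN_COPIES` are hypothetical names; compound and imperfect repeats are not handled.

```python
# Minimum tandem copies per motif length (bp). These MISA-style values are
# an assumption; the abstract does not report the settings used.
MIN_COPIES = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str, min_copies: dict[int, int] = MIN_COPIES) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit, needed in sorted(min_copies.items()):
        i = 0
        while i + unit <= len(seq):
            motif = seq[i:i + unit]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                copies = 1
                # Extend while the next block repeats the motif exactly.
                while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                    copies += 1
                if copies >= needed:
                    hits.append((i, motif, copies))
                    i += copies * unit  # skip past the repeat just recorded
                    continue
            i += 1
    return sorted(hits)


example = "GG" + "A" * 10 + "GC" + "AT" * 5 + "G"
print(find_ssrs(example))  # [(2, 'A', 10), (14, 'AT', 5)]
```

Skipping non-primitive motifs keeps a poly-A run from also being reported as an "AA" or "AAAA" repeat, so each SSR is counted once at its shortest period.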