Mitochondrial DNA (mtDNA) is a compact, double-stranded circular genome of 16,569 bp with a cytosine-rich light (L) chain and a guanine-rich heavy (H) chain. mtDNA mutations have been increasingly recognized as important contributors to an array of human diseases such as Parkinson’s disease, Alzheimer’s disease, colorectal cancer and Kearns–Sayre syndrome. mtDNA mutations can affect all of the 1,000 to 10,000 copies of the mitochondrial genome present in a cell (homoplasmic mutation) or only a subset of copies (heteroplasmic mutation). The ratio of normal to mutant mtDNAs within cells is a significant factor in whether mutations will result in disease. It also shapes the clinical presentation, penetrance, and severity of the phenotype. Over time, heteroplasmic mutations can become homoplasmic through differential replication and random assortment. Full characterization of the mitochondrial genome would therefore require detection of not only homoplasmic but also heteroplasmic mutations, as well as complete phasing. Previously, we sequenced human mtDNA on the PacBio RS II System with two partially overlapping amplicons. Here, we present amplification-free, full-length sequencing of linearized mtDNA using the Sequel System. Full-length sequencing allows variant phasing along the entire mitochondrial genome, identification of heteroplasmic variants, and detection of epigenetic modifications that are lost in amplicon-based methods.
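Heteroplasmy is commonly summarized as the fraction of reads that carry the mutant allele at a given position. This fraction carries the same information as the normal-to-mutant ratio discussed above. The following is a minimal Python sketch that assumes per-position allele counts are already available. The function names and the 1% and 99% cutoffs are hypothetical illustrations, not part of the workflow described here.

```python
def heteroplasmy_level(mutant_reads: int, total_reads: int) -> float:
    """Fraction of mtDNA reads carrying the mutant allele at one position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutant_reads <= total_reads:
        raise ValueError("mutant_reads must be between 0 and total_reads")
    return mutant_reads / total_reads


def classify_site(level: float, min_af: float = 0.01, homo_af: float = 0.99) -> str:
    """Label a site by mutant fraction; the default cutoffs are illustrative assumptions."""
    if level >= homo_af:
        return "homoplasmic"  # (nearly) every copy carries the mutation
    if level >= min_af:
        return "heteroplasmic"  # only a subset of copies is mutant
    return "below detection"  # too rare to call at this cutoff


# Example: 30 mutant reads out of 1,000 spanning the site.
level = heteroplasmy_level(30, 1000)
print(level, classify_site(level))
```

In practice, the detection cutoff depends on per-read and consensus accuracy, so the defaults above are placeholders only.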
High-quality de novo genome assembly and intra-individual mitochondrial instability in the critically endangered kakapo
The kakapo (Strigops habroptila) is a large, flightless parrot endemic to New Zealand. It is highly endangered, with only ~150 individuals remaining, and intensive conservation efforts are underway to save this iconic species from extinction. These efforts include genetic studies of genes critical for fertility, adaptation and disease resistance, and of genetic diversity across the remaining population to inform future breeding program decisions. To aid these efforts, we have generated a high-quality de novo genome assembly using PacBio long-read sequencing. Assembled with the new diploid-aware FALCON-Unzip assembler, the 1.06 Gb genome has a contig N50 of 5.6 Mb (largest contig 29.3 Mb). This is more than 350 times more contiguous than a recent short-read assembly of a closely related parrot species, the kea. We highlight the benefits of the higher contiguity and greater completeness of the kakapo genome assembly through examples of fully resolved genes important in wildlife conservation (contrasted with fragmented and incomplete gene resolution in short-read assemblies). In some cases, the assembly even provides sequence for regions orthologous to gaps of missing sequence in the chicken reference genome. We also highlight the complete resolution of the kakapo mitochondrial genome. It includes the full mitochondrial control region, which is missing from the previous dedicated kakapo mitochondrial genome NCBI entry. In this region, we observed marked heterogeneity in the number of tandem repeats among different mtDNA molecules from a single bird tissue, highlighting the enhanced molecular resolution uniquely afforded by long-read, single-molecule PacBio sequencing.
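Contig N50, quoted above as 5.6 Mb, is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length. The following minimal Python sketch computes it under that standard definition. The input is a plain list of contig lengths, and the toy values in the example are made up rather than taken from the kakapo assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the contig at which the running
    total first reaches half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: running total always reaches total")


# Toy assembly: total = 30, and 10 + 8 = 18 is the first running sum >= 15.
print(n50([10, 8, 6, 4, 2]))
```

Because N50 depends only on the largest contigs needed to cover half the assembly, it rewards long-read assemblies that close gaps.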
Targeted sequencing with Sanger or short-read high-throughput methods is standard practice in clinical genetic testing. However, many applications beyond SNP detection have remained somewhat obstructed by technological challenges. With long reads and high consensus accuracy, SMRT Sequencing overcomes many of the technical hurdles faced by Sanger and NGS approaches, opening a broad range of untapped clinical sequencing opportunities. Several features make SMRT Sequencing well suited to clinical-grade targeted sequencing: flexible multiplexing options, a highly adaptable sample preparation method, and two well-developed, newly improved analysis methods that generate highly accurate results. The Circular Consensus Sequencing (CCS) analysis pipeline produces QV30 data from each single intramolecular multi-pass polymerase read. This makes it a reliable solution for detecting minor variant alleles with frequencies as low as 1%. Long Amplicon Analysis (LAA) uses insert-spanning, full-length subreads originating from multiple individual copies of the target to generate highly accurate, phased consensus sequences (>QV50). This offers a unique advantage for imputation-free allele segregation and haplotype phasing. Here we present workflows and results for a range of SMRT Sequencing clinical applications. Specifically, we illustrate how several elements can come together in a variety of experimental designs: the flexible multiplexing options, simple sample preparation methods, and new developments in data analysis tools offered by PacBio in support of Sequel System 5.1. Together they enable applications as diverse as high-throughput HLA typing, mitochondrial DNA sequencing and viral vector integrity profiling of recombinant adeno-associated viral (rAAV) genomes.
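The quality values quoted above are on the Phred scale, where QV = -10 × log10(error probability). QV30 therefore corresponds to an expected per-base error rate of 0.1% (99.9% accuracy), and QV50 to 0.001% (99.999% accuracy). The two helper functions below are a minimal, hypothetical sketch of that conversion. They are not part of the PacBio software.

```python
import math


def qv_to_error(qv: float) -> float:
    """Phred scale: expected per-base error probability = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


def error_to_qv(p_error: float) -> float:
    """Inverse Phred transform: QV = -10 * log10(error probability)."""
    if not 0 < p_error <= 1:
        raise ValueError("p_error must be in (0, 1]")
    return -10 * math.log10(p_error)


for qv in (30, 50):
    print(f"QV{qv}: error rate {qv_to_error(qv):.0e}, accuracy {1 - qv_to_error(qv):.5f}")
```

As a rough illustration, a QV50 consensus of the 16,569 bp human mitochondrial genome would be expected to contain about 0.17 errors (16,569 × 10^-5). Likewise, a QV30 per-read error rate (0.1%) sits an order of magnitude below a 1% minor allele frequency.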
Chlorella vulgaris genome assembly and annotation reveals the molecular basis for metabolic acclimation to high light conditions.
Chlorella vulgaris is a fast-growing freshwater microalga cultivated at industrial scale for applications ranging from food to biofuel production. To advance our understanding of its biology and to establish genetic tools for biotechnological manipulation, we sequenced the nuclear and organelle genomes of Chlorella vulgaris 211/11P. We combined next-generation sequencing with optical mapping of isolated DNA molecules. This hybrid approach allowed us to assemble the nuclear genome into 14 pseudomolecules with an N50 of 2.8 Mb, with 98.9% of the genome scaffolded. Integrating RNA-seq data obtained at two growth irradiances (high light, HL, versus low light, LL) enabled the identification of 10,724 nuclear genes coding for 11,082 transcripts. Moreover, 121 and 48 genes were found in the chloroplast and mitochondrial genomes, respectively. Functional annotation and expression analysis of the nuclear, chloroplast and mitochondrial genome sequences revealed peculiar features of Chlorella vulgaris. Evidence of horizontal gene transfer from the chloroplast to the mitochondrial genome was observed. Furthermore, comparative transcriptomic analyses of HL versus LL conditions provide insights into the molecular basis of the metabolic rearrangement under HL. This rearrangement leads to enhanced de novo fatty acid biosynthesis and triacylglycerol accumulation. The presence of a cytosolic fatty acid biosynthetic pathway is predicted, and its upregulation upon HL exposure is observed, consistent with the increased lipid content under HL. These data provide a rich genetic resource for future genome editing studies. They also provide potential targets for biotechnological manipulation of Chlorella vulgaris or other microalgae species to improve biomass and lipid productivity.
Evidence of extensive intraspecific noncoding reshuffling in a 169-kb mitochondrial genome of a basidiomycetous fungus
Comparative genomics of fungal mitochondrial genomes (mitogenomes) has revealed a remarkable pattern of rearrangement between and within major phyla owing to horizontal gene transfer (HGT) and recombination. The role of recombination was exemplified at a finer evolutionary time scale in basidiomycete fungi, which display a diversity of mitochondrial DNA (mtDNA) inheritance patterns. Here, we assembled mitogenomes of six species from the Hymenochaetales order of basidiomycetes and examined 59 mitogenomes from two genetic lineages of Pyrrhoderma noxium. Gene order is largely colinear, whereas intergenic regions are the major determinants of mitogenome size variation. Substantial sequence divergence was found in shared introns, consistent with the high HGT frequency observed in yeasts. However, we also identified a rare case of an intron retained in five species since speciation. In contrast to the hyperdiversity observed in the nuclear genomes of P. noxium, intraspecific polymorphism in mitochondrial protein-coding sequences is extremely low. Intron-based phylogeny revealed turnover as well as exchange of introns between the two lineages. Strikingly, some strains harbor introns of mosaic origin from both lineages. Analysis of intergenic sequences indicated substantial differences between and within lineages, and an expansion may be ongoing as a result of exchange between distal intergenes. These findings suggest that mtDNA evolution is usually lineage-specific but that chimeric mitotypes are frequently observed. They thus capture possible evolutionary processes shaping mitogenomes in a basidiomycete. The large mitogenome sizes reported in various basidiomycetes appear to result from interspecific reshuffling of intergenes.
Decoding and analysis of organelle genomes of Indian tea (Camellia assamica) for phylogenetic confirmation.
The NCBI database has more than 15 chloroplast (cp) genome sequences available for different Camellia species, but none for C. assamica. There is no report of any mitochondrial (mt) genome in the genus Camellia or the family Theaceae. In the belief that these organelle genomes can serve as valuable tools for taxonomic and phylogenetic analysis, we assembled and analyzed the cp and mt genomes of C. assamica. We assembled the complete mt genome of C. assamica as a single circular contig of 707,441 bp comprising 66 annotated genes: 35 protein-coding genes, 29 tRNAs and two rRNAs. The first cp genome of C. assamica is a circular contig of 157,353 bp with a typical quadripartite structure. Phylogenetic analysis based on these organelle genomes showed that C. assamica is closely related to C. sinensis and C. leptophylla. It also supports the placement of Caryophyllales within the superasterids. Copyright © 2019. Published by Elsevier Inc.
Hepatotoxicity is the most severe adverse effect of anti-tuberculosis therapy. Isoniazid’s metabolite hydrazine is a mitochondrial complex II inhibitor. We hypothesized that mitochondrial DNA variants are risk factors for drug-induced liver injury (DILI) due to isoniazid, rifampicin or pyrazinamide. We obtained peripheral blood from tuberculosis (TB) patients before anti-TB therapy. A total of 38 patients developed DILI due to anti-TB drugs, and 38 patients with TB but without DILI were selected as controls. Next-generation sequencing was used to detect point mutations in the mitochondrial DNA genome. DILI was defined as ALT ≥5 times the upper limit of normal (ULN), or ALT ≥3 times the ULN with total bilirubin ≥2 times the ULN. Among the 38 patients with DILI, the causative drug was isoniazid in eight, rifampicin in 14 and pyrazinamide in 16. Patients with isoniazid-induced liver injury showed three differences: more variants in the complex I NADH subunit 5 and 1 genes, more nonsynonymous mutations in NADH subunit 5, and a higher ratio of nonsynonymous to total substitutions. Rifampicin- and pyrazinamide-induced liver injury showed no association with mitochondrial DNA variants. Variants in the complex I subunit 1 and 5 genes might affect respiratory chain function and predispose patients to isoniazid-induced liver injury. Such predisposition would act upon exposure to hydrazine, a metabolite of isoniazid and a complex II inhibitor.
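The DILI case definition above is a simple threshold rule on laboratory values, and it can be written as a predicate. The following is a minimal Python sketch that assumes the thresholds are inclusive (≥), the usual convention for such criteria. It also assumes each measurement and its ULN are given in the same units. ULNs are laboratory-specific, so the `LabValues` structure and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabValues:
    alt: float       # measured alanine aminotransferase
    alt_uln: float   # laboratory upper limit of normal for ALT (same units)
    tbil: float      # measured total bilirubin
    tbil_uln: float  # laboratory upper limit of normal for total bilirubin (same units)


def meets_dili_definition(lab: LabValues) -> bool:
    """ALT >= 5x ULN, or ALT >= 3x ULN together with total bilirubin >= 2x ULN."""
    alt_ratio = lab.alt / lab.alt_uln
    tbil_ratio = lab.tbil / lab.tbil_uln
    return alt_ratio >= 5 or (alt_ratio >= 3 and tbil_ratio >= 2)


# Hypothetical patient: ALT at 3.25x ULN with bilirubin at about 2.2x ULN.
print(meets_dili_definition(LabValues(alt=130, alt_uln=40, tbil=2.6, tbil_uln=1.2)))
```

Expressing each value as a multiple of its ULN makes the rule independent of the absolute reference ranges used by a particular laboratory.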
Pacbio Sequencing Reveals Identical Organelle Genomes between American Cranberry (Vaccinium macrocarpon Ait.) and a Wild Relative.
Breeding efforts in the American cranberry (Vaccinium macrocarpon Ait.), a North American perennial fruit crop of great importance, have been hampered by the limited genetic and phenotypic variability observed among cultivars and experimental materials. Most of the cultivars commercially used by cranberry growers today were derived from a few wild accessions bred in the 1950s. In many crops, wild germplasm has been used as an important genetic resource to incorporate novel traits and increase the phenotypic diversity of breeding materials. Vaccinium microcarpum (Turcz. ex Rupr.) Schmalh. and V. oxycoccos L. are two closely related species that may be cross-compatible with the American cranberry. They could be useful for improving fruit quality traits such as phytochemical content. Furthermore, given their northern distribution, they could also help develop cold-hardy cultivars. Although these species have previously been analyzed in diversity studies, genomic characterization and comparative studies are still lacking. In this study, we sequenced and assembled the organelle genomes of the cultivated American cranberry and its wild relative, V. microcarpum. PacBio sequencing technology allowed us to assemble both the mitochondrial and plastid genomes at very high coverage, each as a single circular scaffold. A comparative analysis revealed that the mitochondrial genome sequences were identical between the two species and that the plastids differed by only two synonymous single nucleotide polymorphisms (SNPs). Moreover, Illumina resequencing of additional accessions of V. microcarpum and V. oxycoccos revealed high genetic variation in both species. Based on these results, we propose a hypothesis on how the extent and dynamics of the last glaciation in North America could have shaped the distribution and dispersal of V. microcarpum. Finally, we provide important data regarding the polyploid origin of V. oxycoccos.
Candidate Gene Selection for Cytoplasmic Male Sterility in Pepper (Capsicum annuum L.) through Whole Mitochondrial Genome Sequencing.
Cytoplasmic male sterility (CMS), which is controlled by mitochondrial genes, is an important trait for commercial hybrid seed production. So far, the genes controlling this trait in pepper remain unclear. In this study, complete mitochondrial genomes were sequenced and assembled for the CMS line 138A and its maintainer line 138B. The genome size of 138A is 504,210 bp, which is 8,618 bp shorter than that of 138B. Meanwhile, more than 214 and 215 open reading frames (ORFs) longer than 100 amino acids (aa) were identified in 138A and 138B, respectively. The mitochondrial genome structure of 138A was quite different from that of 138B, indicating the existence of recombination and rearrangement events. Based on mitochondrial genome sequence and structural variation, the mitochondria of 138A and FS4401, a CMS line of Korean origin, may have been inherited from a common female ancestor, but their CMS traits originated separately. Candidate genes were selected according to the published characteristics of CMS genes. These characteristics include the presence of SNPs and InDels, location in unique regions, chimeric structure, co-transcription, and transmembrane domains. A total of 35 ORFs were considered potential candidate genes and 14 of these were selected, with orf300a and orf314a as strong candidates. A new marker based on orf300a was developed and co-segregated with the CMS trait.
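The ORF counts above depend on a length cutoff of more than 100 amino acids. The abstract does not state the exact ORF-calling rules used in the study. The following minimal Python sketch therefore assumes a simple rule: an ORF runs from an ATG start codon to the first in-frame TAA, TAG or TGA stop codon of the standard genetic code. It scans only the three forward frames, so a real scan would also cover the reverse complement. The function name `find_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (an assumption here)


def find_orfs(seq: str, min_aa: int = 100) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand ORFs.

    Each ORF starts at an ATG and ends at the first in-frame stop codon;
    nested ATGs inside an open ORF are ignored, so the longest ORF is kept.
    Only ORFs whose protein (excluding the stop) exceeds min_aa residues are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 > min_aa:  # codons before the stop = protein length
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Toy example: ATG + 101 alanine codons + TAA encodes a 102-aa protein.
print(find_orfs("ATG" + "GCT" * 101 + "TAA"))
```

Real mitochondrial ORF surveys often also handle alternative start codons and RNA editing, which this sketch ignores.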
Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available. We sequenced and assembled the genome of P. westermani, which is among the largest of the known pathogen genomes with an estimated size of 1.1 Gb. A 922.8 Mb genome assembly was generated from Illumina and Pacific Biosciences (PacBio) sequence data, covering 84% of the estimated genome size. The genome has a high proportion (45%) of repeat-derived DNA, particularly of the long interspersed element and long terminal repeat subtypes, and the expansion of these elements may explain some of its large size. We predicted 12,852 protein-coding genes, showing a high level of conservation with related trematode species. The majority of proteins (80%) had homologs in the human liver fluke Opisthorchis viverrini, with an average sequence identity of 64.1%. Assembly of the P. westermani mitochondrial genome from long PacBio reads resulted in a single high-quality circularized 20.6 kb contig. The contig harbored a 6.9 kb region of noncoding repetitive DNA composed of three distinct repeat units. Our results suggest that this region is highly polymorphic in P. westermani, possibly even within single worm isolates. The generated assembly represents the first Paragonimus genome sequence and will facilitate future molecular studies of this important but neglected parasite group.
Plant mitochondrial genomes are usually assembled and displayed as circular maps based on the widely-held view across the broad community of life scientists that circular genome-sized molecules are the primary form of plant mitochondrial DNA, despite the understanding by plant mitochondrial researchers that this is an inaccurate and outdated concept. Many plant mitochondrial genomes have one or more pairs of large repeats that can act as sites for inter- or intramolecular recombination, leading to multiple alternative arrangements (isoforms). Most mitochondrial genomes have been assembled using methods unable to capture the complete spectrum of isoforms within a species, leading to an incomplete inference of their structure and recombinational activity. To document and investigate underlying reasons for structural diversity in plant mitochondrial DNA, we used long-read (PacBio) and short-read (Illumina) sequencing data to assemble and compare mitochondrial genomes of domesticated (Lactuca sativa) and wild (L. saligna and L. serriola) lettuce species. We characterized a comprehensive, complex set of isoforms within each species and compared genome structures between species. Physical analysis of L. sativa mtDNA molecules by fluorescence microscopy revealed a variety of linear, branched, and circular structures. The mitochondrial genomes for L. sativa and L. serriola were identical in sequence and arrangement and differed substantially from L. saligna, indicating that the mitochondrial genome structure did not change during domestication. From the isoforms in our data, we infer that recombination occurs at repeats of all sizes at variable frequencies. The differences in genome structure between L. saligna and the two other Lactuca species can be largely explained by rare recombination events that rearranged the structure. 
Our data demonstrate that representations of plant mitochondrial genomes as simple, circular molecules are not accurate descriptions of their true nature and that in reality plant mitochondrial DNA is a complex, dynamic mixture of forms.
Mitochondrial DNA and their nuclear copies in the parasitic wasp Pteromalus puparum: A comparative analysis in Chalcidoidea.
Chalcidoidea (chalcidoid wasps) are an abundant and megadiverse insect group of both ecological and economic importance. Here we report a complete mitochondrial genome from the chalcidoid Pteromalus puparum (Pteromalidae). Eight tandem repeats followed by six reversed repeats were detected in its 3,308 bp control region. This long and complex control region may explain failures to amplify and sequence complete mitochondrial genomes in some chalcidoids. In addition to the 37 typical mitochondrial genes, an extra, identical isoleucine tRNA gene (trnI) was detected at the opposite end of the control region. This recent mitochondrial gene duplication indicates that gene rearrangements in chalcidoids are ongoing. A comparison among available chalcidoid mitochondrial genomes reveals rapid gene order rearrangements overall and high protein substitution rates in most chalcidoid taxa. In addition, we identified 24 nuclear sequences of mitochondrial origin (NUMTs) in P. puparum. These total 9,989 bp, of which 3,617 bp originate from mitochondrial coding regions. NUMT abundance in P. puparum is only one-twelfth of that in its relative, Nasonia vitripennis. Based on phylogenetic analysis, we provide evidence that a faster nuclear degradation rate contributes to the reduced NUMT numbers in P. puparum. Overall, our study shows unusually high rates of mitochondrial evolution and considerable variation in NUMT accumulation in Chalcidoidea. Copyright © 2018. Published by Elsevier B.V.
Mitochondrial genome characterization of Melipona bicolor: Insights from the control region and gene expression data.
The stingless bee Melipona bicolor is the only bee in which true polygyny occurs. Its mitochondrial genome was first sequenced in 2008, but that sequence was incomplete and nothing was known about its transcription. We combined short and long reads of M. bicolor DNA with RNA-seq data to obtain insights into mitochondrial evolution and gene expression in bees. The complete genome is 15,001 bp long and includes a 255 bp control region. This region contains all conserved structures described in honeybees and has the highest AT content reported so far for bees (98.1%), making it compact but functional. Control of gene expression is similar to that of other insects. However, unusual expression patterns may suggest the existence of different isoforms of the mitochondrially encoded 12S rRNA. The results reveal unique and shared features of the mitochondrial genome in terms of sequence evolution and gene expression. They make M. bicolor an interesting model for studying mitochondrial genome evolution. Copyright © 2019 Elsevier B.V. All rights reserved.
Hemiptelea davidii (Hance) Planch is a potentially valuable forest tree for arid sandy environments. Here, the complete mitochondrial genome of H. davidii was assembled using a combination of PacBio Sequel data and Illumina HiSeq data. The mitochondrial genome is 460,941 bp in length and includes 37 protein-coding genes, 19 tRNA genes, and three rRNA genes. The GC content of the whole mitochondrial genome is 44.84%. Phylogenetic analyses indicated that H. davidii is closely related to Cannabis and Morus species.
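GC content, such as the 44.84% reported above, is the percentage of G and C bases in the sequence. The following minimal Python sketch computes it over unambiguous A/C/G/T bases only. Excluding ambiguity codes such as N from the denominator is an assumption of this sketch. Published figures may be computed differently. For such sequences, AT content is simply 100 minus the GC percentage.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100 * gc / acgt


# Ns are ignored: 4 of the 6 unambiguous bases are G or C.
print(round(gc_content("GGCCNNAT"), 2))
```

For a whole mitochondrial genome, the same function would be applied to the assembled sequence read from a FASTA file.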
The complete mitochondrial genome sequence of Schisandra chinensis (Austrobaileyales: Schisandraceae)
Chinese magnolia vine (Schisandra chinensis) is an economically important oriental medicinal plant that belongs to the Schisandraceae family. The complete mitochondrial genome sequence of S. chinensis is 946,141 bp in length. A total of 45 genes were annotated, including 30 protein-coding genes, 12 tRNA genes, and 3 rRNA genes. A phylogenetic tree based on the mitochondrial genome showed that S. chinensis is most closely related to Schisandra sphenanthera of the Schisandraceae family.