Animal mitogenomes are generally thought of as being economic and optimized for rapid replication and transcription. We use long-read sequencing technology to assemble the remarkable mitogenomes of four species of seed beetles. These are the largest circular mitogenomes ever assembled in insects, ranging from 24,496 to 26,613?bp in total length, and are exceptional in that some 40% consists of non-coding DNA. The size expansion is due to two very long intergenic spacers (LIGSs), rich in tandem repeats. The two LIGSs are present in all species but vary greatly in length (114-10,408?bp), show very low sequence similarity, divergent tandem repeat motifs, a very high AT content and concerted length evolution. The LIGSs have been retained for at least some 45 my but must have undergone repeated reductions and expansions, despite strong purifying selection on protein coding mtDNA genes. The LIGSs are located in two intergenic sites where a few recent studies of insects have also reported shorter LIGSs (>200?bp). These sites may represent spaces that tolerate neutral repeat array expansions or, alternatively, the LIGSs may function to allow a more economic translational machinery. Mitochondrial respiration in adult seed beetles is based almost exclusively on fatty acids, which reduces the need for building complex I of the oxidative phosphorylation pathway (NADH dehydrogenase). One possibility is thus that the LIGSs may allow depressed transcription of NAD genes. RNA sequencing showed that LIGSs are partly transcribed and transcriptional profiling suggested that all seven mtDNA NAD genes indeed show low levels of transcription and co-regulation of transcription across sexes and tissues.© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Ammopiptanthus nanus is a rare broad-leaved shrub that is found in the desert and arid regions of Central Asia. This plant species exhibits extremely high tolerance to drought and freezing and has been used in abiotic tolerance research in plants. As a relic of the tertiary period, A. nanus is of great significance to plant biogeographic research in the ancient Mediterranean region. Here, we report a draft genome assembly using the Pacific Biosciences (PacBio) platform and gene annotation for A. nanus.A total of 64.72 Gb of raw PacBio sequel reads were generated from four 20-kb libraries. After filtering, 64.53 Gb of clean reads were obtained, giving 72.59× coverage depth. Assembly using Canu gave an assembly length of 823.74 Mb, with a contig N50 of 2.76 Mb. The final size of the assembled A. nanus genome was close to the 889 Mb estimated by k-mer analysis. The gene annotation completeness was evaluated using Benchmarking Universal Single-Copy Orthologs; 1,327 of the 1,440 conserved genes (92.15%) could be found in the A. nanus assembly. Genome annotation revealed that 74.08% of the A. nanus genome is composed of repetitive elements and 53.44% is composed of long terminal repeat elements. We predicted ?37,188 protein-coding genes, of which 96.53% were functionally annotated.The genomic sequences of A. nanus could be a valuable source for comparative genomic analysis in the legume family and will be useful for understanding the phylogenetic relationships of the Thermopsideae and the evolutionary response of plant species to the Qinghai Tibetan Plateau uplift.
Microsatellite expansion, such as trinucleotide repeat expansion (TRE), is known to cause a number of genetic diseases. Sanger sequencing and next-generation short-read sequencing are unable to interrogate TRE reliably. We developed a novel algorithm called RepeatHMM to estimate repeat counts from long-read sequencing data. Evaluation on simulation data, real amplicon sequencing data on two repeat expansion disorders, and whole-genome sequencing data generated by PacBio and Oxford Nanopore technologies showed superior performance over competing approaches. We concluded that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the “unsequenceable” genomic trinucleotide repeat disorders.