The emergence of third generation sequencing (3GS; long-reads) is making closer the goal of chromosome-size fragments in de novo genome assemblies. This allows the exploration of new and broader questions on genome evolution for a number of non-model organisms. However, long-read technologies result in higher sequencing error rates and therefore impose an elevated cost of sufficient coverage to achieve high enough quality. In this context, hybrid assemblies, combining short-reads and long-reads provide an alternative efficient and cost-effective approach to generate de novo, chromosome-level genome assemblies. The array of available software programs for hybrid genome assembly, sequence correction and manipulation is constantly being expanded and improved. This makes it difficult for non-experts to find efficient, fast and tractable computational solutions for genome assembly, especially in the case of non-model organisms lacking a reference genome or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a non-model cactophilic Drosophila, D. mojavensis. We show that it is possible to achieve excellent contiguity on this non-model organism using the DBG2OLC pipeline.
A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system
Background A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ~36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
The Chinese chestnut (Castanea mollissima) is widely cultivated in China for nut production. This plant also plays an important ecological role in afforestation and ecosystem services. To facilitate and expand the use of C. mollissima for breeding and its genetic improvement, we report here the whole-genome sequence of C. mollissima.We produced a high-quality assembly of the C. mollissima genome using Pacific Biosciences single-molecule sequencing. The final draft genome is ~785.53 Mb long, with a contig N50 size of 944 kb, and we further annotated 36,479 protein-coding genes in the genome. Phylogenetic analysis showed that C. mollissima diverged from Quercus robur, a member of the Fagaceae family, ~13.62 million years ago.The high-quality whole-genome assembly of C. mollissima will be a valuable resource for further genetic improvement and breeding for disease resistance and nut quality. © The Author(s) 2019. Published by Oxford University Press.
The Reference Genome Sequence of Scutellaria baicalensis Provides Insights into the Evolution of Wogonin Biosynthesis.
Scutellaria baicalensis Georgi is important in Chinese traditional medicine where preparations of dried roots, “Huang Qin,” are used for liver and lung complaints and as complementary cancer treatments. We report a high-quality reference genome sequence for S. baicalensis where 93% of the 408.14-Mb genome has been assembled into nine pseudochromosomes with a super-N50 of 33.2 Mb. Comparison of this sequence with those of closely related species in the order Lamiales, Sesamum indicum and Salvia splendens, revealed that a specialized metabolic pathway for the synthesis of 4′-deoxyflavone bioactives evolved in the genus Scutellaria. We found that the gene encoding a specific cinnamate coenzyme A ligase likely obtained its new function following recent mutations, and that four genes encoding enzymes in the 4′-deoxyflavone pathway are present as tandem repeats in the genome of S. baicalensis. Further analyses revealed that gene duplications, segmental duplication, gene amplification, and point mutations coupled to gene neo- and subfunctionalizations were involved in the evolution of 4′-deoxyflavone synthesis in the genus Scutellaria. Our study not only provides significant insight into the evolution of specific flavone biosynthetic pathways in the mint family, Lamiaceae, but also will facilitate the development of tools for enhancing bioactive productivity by metabolic engineering in microbes or by molecular breeding in plants. The reference genome of S. baicalensis is also useful for improving the genome assemblies for other members of the mint family and offers an important foundation for decoding the synthetic pathways of bioactive compounds in medicinal plants.Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.
The Genome of Armadillidium vulgare (Crustacea, Isopoda) Provides Insights into Sex Chromosome Evolution in the Context of Cytoplasmic Sex Determination.
The terrestrial isopod Armadillidium vulgare is an original model to study the evolution of sex determination and symbiosis in animals. Its sex can be determined by ZW sex chromosomes, or by feminizing Wolbachia bacterial endosymbionts. Here, we report the sequence and analysis of the ZW female genome of A. vulgare. A distinguishing feature of the 1.72 gigabase assembly is the abundance of repeats (68% of the genome). We show that the Z and W sex chromosomes are essentially undifferentiated at the molecular level and the W-specific region is extremely small (at most several hundreds of kilobases). Our results suggest that recombination suppression has not spread very far from the sex-determining locus, if at all. This is consistent with A. vulgare possessing evolutionarily young sex chromosomes. We characterized multiple Wolbachia nuclear inserts in the A. vulgare genome, none of which is associated with the W-specific region. We also identified several candidate genes that may be involved in the sex determination or sexual differentiation pathways. The A. vulgare genome serves as a resource for studying the biology and evolution of crustaceans, one of the most speciose and emblematic metazoan groups. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project.Copyright © 2019 Elsevier Ltd. All rights reserved.
Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis).
Mithun (Bos frontalis), also called gayal, is an endangered bovine species, under the tribe bovini with 2n?=?58 XX chromosome complements and reared under the tropical rain forests region of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture is scanty so far. We trust that availability of its whole genome sequence data and assembly will greatly solve this problem and help to generate many information including phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still benefit in understanding genetic variation in mithun populations reared under diverse geographical locations and for building a superior consensus assembly. We, therefore, performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun.We generated ˜300 Gigabyte (Gb) raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high quality de novo assembly of mithun with 96% recovered as per BUSCO analysis. The final genome assembly has a total length of 3.0 Gb, contains 5,015 scaffolds with an N50 value of 1?Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun to cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes presented in mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun specific coding sequences were found compared to those in cattle.Here we presented the first de novo draft genome assembly of Indian mithun having better coverage, less fragmented, better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly unravelled the genomic architecture of mithun to a great extent and will provide a reference genome assembly to research community to elucidate the evolutionary history of mithun across its distinct geographical locations.
Multispecies host-parasite evolution is common, but how parasites evolve after speciating remains poorly understood. Shared evolutionary history and physiology may propel species along similar evolutionary trajectories whereas pursuing different strategies can reduce competition. We test these scenarios in the economically important association between honey bees and ectoparasitic mites by sequencing the genomes of the sister mite species Varroa destructor and Varroa jacobsoni. These genomes were closely related, with 99.7% sequence identity. Among the 9,628 orthologous genes, 4.8% showed signs of positive selection in at least one species. Divergent selective trajectories were discovered in conserved chemosensory gene families (IGR, SNMP), and Halloween genes (CYP) involved in moulting and reproduction. However, there was little overlap in these gene sets and associated GO terms, indicating different selective regimes operating on each of the parasites. Based on our findings, we suggest that species-specific strategies may be needed to combat evolving parasite communities. © The Author(s) 2019.
The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map.Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor >?98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features.The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.