April 21, 2020

De novo genome assembly of the endangered Acer yangbiense, a plant species with extremely small populations endemic to Yunnan Province, China.

Acer yangbiense is a newly described, critically endangered maple tree endemic to Yangbi County in Yunnan Province, Southwest China. It was included in a programme for rescuing the most threatened species in China, focusing on “plant species with extremely small populations (PSESP)”. We generated 64, 94, and 110 Gb of raw DNA sequences and obtained a chromosome-level genome assembly of A. yangbiense through a combination of Pacific Biosciences Single-molecule Real-time, Illumina HiSeq X, and Hi-C mapping, respectively. The final genome assembly is ~666 Mb, with 13 chromosomes covering ~97% of the genome and a scaffold N50 size of 45 Mb. Further, BUSCO analysis recovered 95.5% complete BUSCO genes. Repetitive elements account for 68.0% of the A. yangbiense genome. Genome annotation generated 28,320 protein-coding genes, assisted by a combination of prediction and transcriptome sequencing. In addition, a nearly 1:1 orthology ratio of dot plots of longer syntenic blocks revealed a similar evolutionary history between A. yangbiense and grape, indicating that the genome has not undergone a whole-genome duplication event after the core eudicot common hexaploidization. Here, we report a high-quality de novo genome assembly of A. yangbiense, the first genome for the genus Acer and the family Aceraceae. This will provide fundamental conservation genomics resources, as well as representing a new high-quality reference genome for the economically important Acer lineage and the wider order of Sapindales. © The Author(s) 2019. Published by Oxford University Press.
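Several abstracts in this list report a contig or scaffold N50. As a reading aid, here is a minimal sketch of how that statistic is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from any of the papers.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together make up at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (illustrative only, not from the paper).
toy_scaffolds = [50, 45, 40, 30, 20, 10, 5]
print(n50(toy_scaffolds))
```

A larger N50 means that more of the assembly sits in long sequences, which is why it is quoted alongside total size as a contiguity measure.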


April 21, 2020

Complete Genome Sequence of Halocella sp. Strain SP3-1, an Extremely Halophilic, Glycoside Hydrolase- and Bacteriocin-Producing Bacterium Isolated from a Salt Evaporation Pond.

Halocella sp. strain SP3-1, a cellulose-degrading bacterium, was isolated from a hypersaline evaporation pond in Thailand. Here, we report the first complete genome sequence of strain SP3-1. The genome is 4,035,760 bases long and contains several genes encoding cellulose-, hemicellulose-, and starch-degrading enzymes, as well as bacteriocins.


April 21, 2020

Survey of the Bradysia odoriphaga Transcriptome Using PacBio Single-Molecule Long-Read Sequencing.

The damage caused by Bradysia odoriphaga is the main factor threatening the production of vegetables in the Liliaceae family. However, few genetic studies of B. odoriphaga have been conducted because of a lack of genomic resources. Many long-read sequencing technologies have been developed in the last decade; therefore, in this study, the transcriptome including all development stages of B. odoriphaga was sequenced for the first time by Pacific Biosciences single-molecule long-read sequencing. Here, 39,129 isoforms were generated, and 35,645 were found to have annotation results when checked against sequences available in different databases. Overall, 18,473 isoforms were distributed in 25 Clusters of Orthologous Groups categories, and 11,880 isoforms were categorized into 60 functional groups that belonged to the three main Gene Ontology classifications. Moreover, 30,610 isoforms were assigned into 44 functional categories belonging to six main Kyoto Encyclopedia of Genes and Genomes functional categories. Coding DNA sequence (CDS) prediction showed that 36,419 out of 39,129 isoforms were predicted to have CDS, and 4,319 simple sequence repeats were detected in total. Finally, 266 insecticide resistance and metabolism-related isoforms were identified as candidate genes for further investigation of insecticide resistance and metabolism in B. odoriphaga.


April 21, 2020

A chromosome-scale genome assembly of cucumber (Cucumis sativus L.).

Accurate and complete reference genome assemblies are fundamental for biological research. Cucumber is an important vegetable crop and model system for sex determination and vascular biology. Low-coverage Sanger sequences and high-coverage short Illumina sequences have been used to assemble draft cucumber genomes, but the incompleteness and low quality of these genomes limit their use in comparative genomics and genetic research. A high-quality and complete cucumber genome assembly is therefore essential. We assembled single-molecule real-time (SMRT) long reads to generate an improved cucumber reference genome. This version contains 174 contigs with a total length of 226.2 Mb and an N50 of 8.9 Mb, and provides 29.0 Mb more sequence data than previous versions. Using 10X Genomics and high-throughput chromosome conformation capture (Hi-C) data, 89 contigs (~211.0 Mb) were directly linked into 7 pseudo-chromosome sequences. The newly assembled regions show much higher guanine-cytosine or adenine-thymine content than found previously, which is likely to have been inaccessible to Illumina sequencing. The new assembly contains 1,374 full-length long terminal retrotransposons and 1,078 novel genes including 239 tandemly duplicated genes. For example, we found 4 tandemly duplicated tyrosylprotein sulfotransferases, in contrast to the single copy of the gene found previously and in most other plants. This high-quality genome presents novel features of the cucumber genome and will serve as a valuable resource for genetic research in cucumber and plant comparative genomics. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

The genome assembly and annotation of yellowhorn (Xanthoceras sorbifolium Bunge).

Yellowhorn (Xanthoceras sorbifolium Bunge), a deciduous shrub or small tree native to north China, is of great economic value. Seeds of yellowhorn are rich in oil containing unsaturated long-chain fatty acids that have been used for producing edible oil and nervonic acid capsules. However, the lack of a high-quality genome sequence hampers the understanding of its evolution and gene functions. In this study, a whole genome of yellowhorn was sequenced and assembled by integration of Illumina sequencing, Pacific Biosciences single-molecule real-time sequencing, 10X Genomics linked reads, Bionano optical maps, and Hi-C. The yellowhorn genome assembly was 439.97 Mb, which comprised 15 pseudo-chromosomes covering 95.42% (419.84 Mb) of the assembled genome. The repetitive fractions accounted for 56.39% of the yellowhorn genome. The genome contained 21,059 protein-coding genes. Of them, 18,503 (87.86%) genes were found to be functionally annotated with ≥1 “annotation” term by searching against other databases. Transcriptomic analysis showed that 341, 135, 125, 113, and 100 genes were specifically expressed in hermaphrodite flower, staminate flower, young fruit, leaf, and shoot, respectively. Phylogenetic analysis suggested that yellowhorn and Dimocarpus longan diverged from their most recent common ancestor ~46 million years ago. The availability and subsequent annotation of the yellowhorn genome, as well as the identification of tissue-specific functional genes, provides a valuable reference for plant comparative genomics, evolutionary studies, and molecular design breeding. © The Author(s) 2019. Published by Oxford University Press.
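The percentages quoted in this abstract follow directly from the reported totals. The short sketch below reproduces that arithmetic; the helper name `percent` is an illustrative assumption, and only figures stated in the abstract are used.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Figures taken from the abstract above.
anchored_pct = percent(419.84, 439.97)   # Mb in pseudo-chromosomes / Mb assembled
annotated_pct = percent(18503, 21059)    # annotated genes / protein-coding genes

print(anchored_pct, annotated_pct)
```

Both values agree with the 95.42% and 87.86% given in the text.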


April 21, 2020

Pseudomolecule-level assembly of the Chinese oil tree yellowhorn (Xanthoceras sorbifolium) genome.

Yellowhorn (Xanthoceras sorbifolium) is a species of the Sapindaceae family native to China and is an oil tree that can withstand cold and drought conditions. A pseudomolecule-level genome assembly for this species will not only contribute to understanding the evolution of its genes and chromosomes but also bring yellowhorn breeding into the genomic era. Here, we generated 15 pseudomolecules of yellowhorn chromosomes, on which 97.04% of scaffolds were anchored, using the combined Illumina HiSeq, Pacific Biosciences Sequel, and Hi-C technologies. The length of the final yellowhorn genome assembly was 504.2 Mb with a contig N50 size of 1.04 Mb and a scaffold N50 size of 32.17 Mb. Genome annotation revealed that 68.67% of the yellowhorn genome was composed of repetitive elements. Gene modelling predicted 24,672 protein-coding genes. By comparing orthologous genes, the divergence time of yellowhorn and its close sister species longan (Dimocarpus longan) was estimated at ~33.07 million years ago. Gene cluster and chromosome synteny analysis demonstrated that the yellowhorn genome shared a conserved genome structure with its ancestor in some chromosomes. This genome assembly represents a high-quality reference genome for yellowhorn. Integrated genome annotations provide a valuable dataset for genetic and molecular research in this species. We did not detect whole-genome duplication in the genome. The yellowhorn genome carries syntenic blocks from ancient chromosomes. These data sources will enable this genome to serve as an initial platform for breeding better yellowhorn cultivars. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

The genomes of pecan and Chinese hickory provide insights into Carya evolution and nut nutrition.

Pecan (Carya illinoinensis) and Chinese hickory (C. cathayensis) are important commercially cultivated nut trees in the genus Carya (Juglandaceae), with high nutritional value and substantial health benefits. We obtained >187.22 and 178.87 gigabases of sequence, and ~288× and 248× genome coverage, for a pecan cultivar (“Pawnee”) and a domesticated Chinese hickory landrace (ZAFU-1), respectively. The total assembly size is 651.31 megabases (Mb) for pecan and 706.43 Mb for Chinese hickory. Two genome duplication events before the divergence from walnut were found in these species. Gene family analysis highlighted key genes in biotic and abiotic tolerance, oil, polyphenols, essential amino acids, and B vitamins. Further analyses of reduced-coverage genome sequences of 16 Carya and 2 Juglans species provide additional phylogenetic perspective on crop wild relatives. Cooperative characterization of these valuable resources provides a window to their evolutionary development and a valuable foundation for future crop improvement. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

Chromosome-scale genome assembly of kiwifruit Actinidia eriantha with single-molecule sequencing and chromatin interaction mapping.

Kiwifruit (Actinidia spp.) is a dioecious plant with fruits containing abundant vitamin C and minerals. A handful of kiwifruit species have been domesticated, among which Actinidia eriantha is increasingly favored in breeding owing to its superior commercial traits. Recently, elite cultivars from A. eriantha have been successfully selected and further studies on their biology and breeding potential require genomic information, which is currently unavailable. We assembled a chromosome-scale genome sequence of A. eriantha cultivar White using single-molecule sequencing and chromatin interaction map-based scaffolding. The assembly has a total size of 690.6 megabases and an N50 of 21.7 megabases. Approximately 99% of the assembly was in 29 pseudomolecules corresponding to the 29 kiwifruit chromosomes. Forty-three percent of the A. eriantha genome is repetitive sequence, and the non-repetitive part encodes 42,988 protein-coding genes, of which 39,075 have homologues from other plant species or protein domains. The divergence time between A. eriantha and its close relative Actinidia chinensis is estimated to be 3.3 million years, and after diversification, 1,727 and 1,506 gene families are expanded and contracted in A. eriantha, respectively. We provide a high-quality reference genome for kiwifruit A. eriantha. This chromosome-scale genome assembly is substantially better than 2 published kiwifruit assemblies from A. chinensis in terms of genome contiguity and completeness. The availability of the A. eriantha genome provides a valuable resource for facilitating kiwifruit breeding and studies of kiwifruit biology. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

Cellular Dynamics and Genomic Identity of Centromeres in Cereal Blast Fungus.

Precise kinetochore-microtubule interactions ensure faithful chromosome segregation in eukaryotes. Centromeres, identified as scaffolding sites for kinetochore assembly, are among the most rapidly evolving chromosomal loci in terms of the DNA sequence and length and organization of intrinsic elements. Neither the centromere structure nor the kinetochore dynamics is well studied in plant-pathogenic fungi. Here, we sought to understand the process of chromosome segregation in the rice blast fungus Magnaporthe oryzae. High-resolution imaging of green fluorescent protein (GFP)-tagged inner kinetochore proteins CenpA and CenpC revealed unusual albeit transient declustering of centromeres just before anaphase separation of chromosomes in M. oryzae. Strikingly, the declustered centromeres positioned randomly at the spindle midzone without an apparent metaphase plate per se. Using CenpA chromatin immunoprecipitation followed by deep sequencing, all seven centromeres in M. oryzae were found to be regional, spanning 57-kb to 109-kb transcriptionally poor regions. Highly AT-rich and heavily methylated DNA sequences were the only common defining features of all the centromeres in rice blast. Lack of centromere-specific DNA sequence motifs or repetitive elements suggests an epigenetic specification of centromere function in M. oryzae. PacBio genome assemblies and synteny analyses facilitated comparison of the centromeric/pericentromeric regions in distinct isolates of rice blast and wheat blast and in Magnaporthiopsis poae. Overall, this study revealed unusual centromere dynamics and precisely identified the centromere loci in the top model fungal pathogens that belong to Magnaporthales and cause severe losses in the global production of food crops and turf grasses.

IMPORTANCE Magnaporthe oryzae is an important fungal pathogen that causes a loss of 10% to 30% of the annual rice crop due to the devastating blast disease. In most organisms, kinetochores are clustered together or arranged at the metaphase plate to facilitate synchronized anaphase separation of sister chromatids in mitosis. In this study, we showed that the initially clustered kinetochores separate and position randomly prior to anaphase in M. oryzae. Centromeres in M. oryzae occupy large genomic regions and form on AT-rich DNA without any common sequence motifs. Overall, this study identified atypical kinetochore dynamics and mapped functional centromeres in M. oryzae to define the roles of centromeric and pericentric boundaries in kinetochore assembly on epigenetically specified centromere loci. This study should pave the way for further understanding of the contribution of heterochromatin in genome stability and virulence of the blast fungus and its related species of high economic importance. Copyright © 2019 Yadav et al.


April 21, 2020

Harnessing long-read amplicon sequencing to uncover NRPS and Type I PKS gene sequence diversity in polar desert soils.

The severity of environmental conditions at Earth’s frigid zones presents attractive opportunities for microbial biomining due to their heightened potential as reservoirs for novel secondary metabolites. Arid soil microbiomes within the Antarctic and Arctic circles are remarkably rich in Actinobacteria and Proteobacteria, bacterial phyla known to be prolific producers of natural products. Yet the diversity of secondary metabolite genes within these cold, extreme environments remains largely unknown. Here, we employed amplicon sequencing using PacBio RS II, a third generation long-read platform, to survey over 200 soils spanning twelve east Antarctic and high Arctic sites for natural product-encoding genes, specifically targeting non-ribosomal peptides (NRPS) and Type I polyketides (PKS). NRPS-encoding genes were more widespread across the Antarctic, whereas PKS genes were only recoverable from a handful of sites. Many recovered sequences were deemed novel due to their low amino acid sequence similarity to known protein sequences, particularly throughout the east Antarctic sites. Phylogenetic analysis revealed that a high proportion were most similar to antifungal and biosurfactant-type clusters. Multivariate analysis showed that soil fertility factors of carbon, nitrogen and moisture displayed significant negative relationships with natural product gene richness. Our combined results suggest that secondary metabolite production is likely to be an important physiological component of survival for microorganisms inhabiting arid, nutrient-starved soils. © FEMS 2019.


April 21, 2020

Chromosome-scale assemblies reveal the structural evolution of African cichlid genomes.

African cichlid fishes are well known for their rapid radiations and are a model system for studying evolutionary processes. Here we compare multiple, high-quality, chromosome-scale genome assemblies to elucidate the genetic mechanisms underlying cichlid diversification and study how genome structure evolves in rapidly radiating lineages. We re-anchored our recent assembly of the Nile tilapia (Oreochromis niloticus) genome using a new high-density genetic map. We also developed a new de novo genome assembly of the Lake Malawi cichlid, Metriaclima zebra, using high-coverage Pacific Biosciences sequencing, and anchored contigs to linkage groups (LGs) using 4 different genetic maps. These new anchored assemblies allow the first chromosome-scale comparisons of African cichlid genomes. Large intra-chromosomal structural differences (~2-28 megabase pairs) among species are common, while inter-chromosomal differences are rare (<10 megabase pairs total). Placement of the centromeres within the chromosome-scale assemblies identifies large structural differences that explain many of the karyotype differences among species. Structural differences are also associated with unique patterns of recombination on sex chromosomes. Structural differences on LG9, LG11, and LG20 are associated with reduced recombination, indicative of inversions between the rock- and sand-dwelling clades of Lake Malawi cichlids. M. zebra has a larger number of recent transposable element insertions compared with O. niloticus, suggesting that several transposable element families have a higher rate of insertion in the haplochromine cichlid lineage. This study identifies novel structural variation among East African cichlid genomes and provides a new set of genomic resources to support research on the mechanisms driving cichlid adaptation and speciation. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

Detection of VIM-1-Producing Enterobacter cloacae and Salmonella enterica Serovars Infantis and Goldcoast at a Breeding Pig Farm in Germany in 2017 and Their Molecular Relationship to Former VIM-1-Producing S. Infantis Isolates in German Livestock Production.

In 2011, VIM-1-producing Salmonella enterica serovar Infantis and Escherichia coli were isolated for the first time in four German livestock farms. In 2015/2016, highly related isolates were identified in German pig production. This raised the issue of potential reservoirs for these isolates, the relation of their mobile genetic elements, and potential links between the different affected farms/facilities. In a piglet-producing farm suspected of being linked to some blaVIM-1 findings in Germany, fecal and environmental samples were examined for the presence of carbapenemase-producing Enterobacteriaceae and Salmonella spp. Newly discovered isolates were subjected to Illumina whole-genome sequencing (WGS) and S1 pulsed-field gel electrophoresis (PFGE) hybridization experiments. WGS data of these isolates were compared with those for the previously isolated VIM-1-producing Salmonella Infantis isolates from pigs and poultry. Among 103 samples, one Salmonella Goldcoast isolate, one Salmonella Infantis isolate, and one Enterobacter cloacae isolate carrying the blaVIM-1 gene were detected. Comparative WGS analysis revealed that the blaVIM-1 gene was part of a particular Tn21-like transposable element in all isolates. It was located on IncHI2 (ST1) plasmids of ~290 to 300 kb with a backbone highly similar (98 to 100%) to that of reference pSE15-SA01028. SNP analysis revealed a close relationship of all VIM-1-positive S. Infantis isolates described since 2011. The findings of this study demonstrate that the occurrence of the blaVIM-1 gene in German livestock is restricted neither to a certain bacterial species nor to a certain Salmonella serovar but is linked to a particular Tn21-like transposable element located on transferable pSE15-SA01028-like IncHI2 (ST1) plasmids, being present in all of the investigated isolates from 2011 to 2017.

IMPORTANCE Carbapenems are considered one of few remaining treatment options against multidrug-resistant Gram-negative pathogens in human clinical settings. The occurrence of carbapenemase-producing Enterobacteriaceae in livestock and food is a major public health concern. Particularly the occurrence of VIM-1-producing Salmonella Infantis in livestock farms is worrisome, as this zoonotic pathogen is one of the main causes for human salmonellosis in Europe. Investigations on the epidemiology of those carbapenemase-producing isolates and associated mobile genetic elements through an in-depth molecular characterization are indispensable to understand the transmission of carbapenemase-producing Enterobacteriaceae along the food chain and between different populations to develop strategies to prevent their further spread. Copyright © 2019 Roschanski et al.


April 21, 2020

A chromosomal-scale genome assembly of Tectona grandis reveals the importance of tandem gene duplication and enables discovery of genes in natural product biosynthetic pathways.

Teak, a member of the Lamiaceae family, produces one of the most expensive hardwoods in the world. High demand coupled with deforestation has caused a decrease in natural teak forests, and future supplies will be reliant on teak plantations. Hence, selection of teak tree varieties for clonal propagation with superior growth performance is of great importance, and access to high-quality genetic and genomic resources can accelerate the selection process by identifying genes underlying desired traits. To facilitate teak research and variety improvement, we generated a highly contiguous, chromosomal-scale genome assembly using high-coverage Pacific Biosciences long reads coupled with high-throughput chromatin conformation capture. Of the 18 teak chromosomes, we generated 17 near-complete pseudomolecules with one chromosome present as two chromosome arm scaffolds. Genome annotation yielded 31,168 genes encoding 46,826 gene models, of which 39,930 and 41,155 had Pfam domain and expression evidence, respectively. We identified 14 clusters of tandem-duplicated terpene synthases (TPSs), genes central to the biosynthesis of terpenes, which are involved in plant defense and pollinator attraction. Transcriptome analysis revealed 10 TPSs highly expressed in woody tissues, of which 8 were in tandem, revealing the importance of resolving tandemly duplicated genes and the quality of the assembly and annotation. We also validated the enzymatic activity of four TPSs to demonstrate the function of key TPSs. In summary, this high-quality chromosomal-scale assembly and functional annotation of the teak genome will facilitate the discovery of candidate genes related to traits critical for sustainable production of teak and for anti-insecticidal natural products. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020

Complete chloroplast genome sequences of Kaempferia galanga and Kaempferia elegans: Molecular structures and comparative analysis.

Kaempferia galanga and Kaempferia elegans, which belong to the genus Kaempferia in the family Zingiberaceae, are used as a valuable herbal medicine and an ornamental plant, respectively. Chloroplast genomes have been used for molecular markers, species identification and phylogenetic studies. In this study, the complete chloroplast genome sequences of K. galanga and K. elegans are reported. Results show that the complete chloroplast genome of K. galanga is 163,811 bp long, having a quadripartite structure with a large single copy (LSC) of 88,405 bp and a small single copy (SSC) of 15,812 bp separated by inverted repeats (IRs) of 29,797 bp. Similarly, the complete chloroplast genome of K. elegans is 163,555 bp long, having a quadripartite structure in which IRs of 29,773 bp separate an LSC of 88,020 bp and an SSC of 15,989 bp. A total of 111 genes in K. galanga and 113 genes in K. elegans comprised 79 protein-coding genes and 4 ribosomal RNA (rRNA) genes, as well as 28 and 30 transfer RNA (tRNA) genes in K. galanga and K. elegans, respectively. The gene order, GC content and orientation of the two Kaempferia chloroplast genomes exhibited high similarity. The location and distribution of simple sequence repeats (SSRs) and long repeat sequences were determined. Eight highly variable regions between the two Kaempferia species were identified and 643 mutation events, including 536 single-nucleotide polymorphisms (SNPs) and 107 insertion/deletions (indels), were accurately located. Sequence divergences of the whole chloroplast genomes were calculated among related Zingiberaceae species. The phylogenetic analysis based on SNPs among eleven species strongly supported that K. galanga and K. elegans formed a cluster within Zingiberaceae. This study identified the unique characteristics of the entire K. galanga and K. elegans chloroplast genomes that contribute to our understanding of the chloroplast DNA evolution within Zingiberaceae species. It provides valuable information for phylogenetic analysis and species identification within genus Kaempferia.
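The quadripartite structure described above implies a simple length identity: the total genome length is the LSC plus the SSC plus two copies of the IR. The sketch below checks that identity, and the gene and mutation tallies, against the numbers in the abstract. It assumes that each quoted IR length refers to a single IR copy; the function name `quadripartite_length` is illustrative.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total chloroplast genome length: LSC + SSC + two IR copies (bp)."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported in the abstract.
k_galanga = quadripartite_length(lsc=88_405, ssc=15_812, ir=29_797)
k_elegans = quadripartite_length(lsc=88_020, ssc=15_989, ir=29_773)
print(k_galanga, k_elegans)

# Gene tallies: protein-coding + rRNA + tRNA genes.
print(79 + 4 + 28, 79 + 4 + 30)

# Mutation events: SNPs + indels.
print(536 + 107)
```

Under that assumption the sums match the reported totals of 163,811 bp and 163,555 bp, the 111 and 113 genes, and the 643 mutation events.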


April 21, 2020

A critical comparison of technologies for a plant genome sequencing project.

A high-quality genome sequence of any model organism is an essential starting point for genetic and other studies. Older clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which minimizes their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and associated new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, on larger (e.g., human) genomes. However, plant genomes can be much more repetitive and larger than the human genome, and plant biochemistry often makes obtaining high-quality DNA that is free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates. Here, we compare Illumina short read, Pacific Biosciences long read, 10x Genomics linked reads, Dovetail Hi-C, and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species Solanum verrucosum. We benchmark the assemblies for completeness and accuracy, as well as DNA, compute requirements, and sequencing costs. The field of genome sequencing and assembly is reaching maturity, and the differences we observe between assemblies are surprisingly small. We expect that our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers. © The Author(s) 2019. Published by Oxford University Press.

