April 21, 2020  |  

Genome sequence of Malania oleifera, a tree with great value for nervonic acid production.

Malania oleifera, a member of the Olacaceae family, is an IUCN red listed tree, endemic and restricted to the Karst region of southwest China. This tree’s seed is valued for its high content of precious fatty acids (especially nervonic acid). However, studies on its genetic makeup and fatty acid biogenesis are severely hampered by a lack of molecular and genetic tools. We generated 51 Gb and 135 Gb of raw DNA sequences, using Pacific Biosciences (PacBio) single-molecule real-time and 10× Genomics sequencing, respectively. A final genome assembly, with a scaffold N50 size of 4.65 Mb and a total length of 1.51 Gb, was obtained by primary assembly based on PacBio long reads plus scaffolding with 10× Genomics reads. Identified repeats constituted ~82% of the genome, and 24,064 protein-coding genes were predicted with high support. The genome has low heterozygosity and shows no evidence for recent whole genome duplication. Metabolic pathway genes relating to the accumulation of long-chain fatty acids were identified and studied in detail. Here, we provide the first genome assembly and gene annotation for M. oleifera. The availability of these resources will be of great importance for conservation biology and for the functional genomics of nervonic acid biosynthesis. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020  |  

Genome assembly and annotation of the Trichoplusia ni Tni-FNL insect cell line enabled by long-read technologies.

Trichoplusia ni-derived cell lines are commonly used to enable recombinant protein expression via baculovirus infection to generate materials approved for clinical use and in clinical trials. In order to develop systems biology and genome engineering tools to improve protein expression in this host, we performed de novo genome assembly of the Trichoplusia ni-derived cell line Tni-FNL. By integration of PacBio single-molecule sequencing, Bionano optical mapping, and 10X Genomics linked-reads data, we have produced a draft genome assembly of Tni-FNL. Our assembly contains 280 scaffolds, with an N50 scaffold size of 2.3 Mb and a total length of 359 Mb. Annotation of the Tni-FNL genome resulted in 14,101 predicted genes, and 93.2% of the predicted proteome contained recognizable protein domains. Ortholog searches within the superorder Holometabola provided further evidence of high accuracy and completeness of the Tni-FNL genome assembly. This first draft Tni-FNL genome assembly was enabled by complementary long-read technologies and represents a high-quality, well-annotated genome that provides novel insight into the complexity of this insect cell line and can serve as a reference for future large-scale genome engineering work in this and other similar recombinant protein production hosts.


April 21, 2020  |  

LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly.

Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assembly. This tool utilizes long reads generated from TGS sequencing platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and a much lower memory usage than two existing, state-of-the-art tools. This tool utilized raw reads to fill more gaps than when using error-corrected reads. It is applicable to gaps in the assemblies by different approaches and from large and complex genomes. After performing gap-closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies.
The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
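
Contig N50, the contiguity metric quoted in this abstract, is the length L such that contigs of length L or longer together cover at least half of the assembly. The sketch below is only an illustration of that definition and is not part of LR_Gapcloser; the function name `n50` and the contig lengths are hypothetical, chosen to show how closing a single gap (merging two contigs) can raise the N50.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half
        # of the assembly defines the N50.
        if running >= half:
            return length


# Hypothetical contig lengths in kb; not taken from any assembly above.
before = [400, 300, 200, 100, 50, 50]
# Closing the gap between the two largest contigs merges them into one
# (the filled gap bases are ignored here for simplicity).
after = [700, 200, 100, 50, 50]

print(n50(before), n50(after))
```

Because N50 depends on the whole length distribution, joining a few large contigs can shift it substantially even when the total assembly length barely changes.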


April 21, 2020  |  

Critical length in long-read resequencing

Long-read sequencing has substantial advantages for structural variant discovery and phasing of variants compared to short-read technologies, but the required and optimal read length has not been assessed. In this work, we used long reads simulated from human genomes and evaluated structural variant discovery and variant phasing using current best practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of minimally 20 kb. Haplotyping variants across genes only reaches its optimum from reads of 100 kb. These findings are important for the design of future long-read sequencing projects.


April 21, 2020  |  

De novo assembly of the goldfish (Carassius auratus) genome and the evolution of genes after whole-genome duplication.

For over a thousand years, the common goldfish (Carassius auratus) was raised throughout Asia for food and as an ornamental pet. As a very close relative of the common carp (Cyprinus carpio), goldfish share the recent genome duplication that occurred approximately 14 million years ago in their common ancestor. The combination of centuries of breeding and a wide array of interesting body morphologies provides an exciting opportunity to link genotype to phenotype and to understand the dynamics of genome evolution and speciation. We generated a high-quality draft sequence and gene annotations of a “Wakin” goldfish using 71X PacBio long reads. The two subgenomes in goldfish retained extensive synteny and collinearity between goldfish and zebrafish. However, genes were lost quickly after the carp whole-genome duplication, and the expression of 30% of the retained duplicated genes diverged substantially across seven tissues sampled. Loss of sequence identity and/or exons determined the divergence of the expression levels across all tissues, while loss of conserved noncoding elements determined expression variance between different tissues. This assembly provides an important resource for comparative genomics and understanding the causes of goldfish variants.


April 21, 2020  |  

Effector gene reshuffling involves dispensable mini-chromosomes in the wheat blast fungus.

Newly emerged wheat blast disease is a serious threat to global wheat production. Wheat blast is caused by a distinct, exceptionally diverse lineage of the fungus causing rice blast disease. Through sequencing a recent field isolate, we report a reference genome that includes seven core chromosomes and mini-chromosome sequences that harbor effector genes normally found on ends of core chromosomes in other strains. No mini-chromosomes were observed in an early field strain, and at least two from another isolate each contain different effector genes and core chromosome end sequences. The mini-chromosome is enriched in transposons occurring most frequently at core chromosome ends. Additionally, transposons in mini-chromosomes lack the characteristic signature for inactivation by repeat-induced point (RIP) mutation genome defenses. Our results, collectively, indicate that dispensable mini-chromosomes and core chromosomes undergo divergent evolutionary trajectories, and mini-chromosomes and core chromosome ends are coupled as a mobile, fast-evolving effector compartment in the wheat pathogen genome.


April 21, 2020  |  

A High-Quality Grapevine Downy Mildew Genome Assembly Reveals Rapidly Evolving and Lineage-Specific Putative Host Adaptation Genes.

Downy mildews are obligate biotrophic oomycete pathogens that cause devastating plant diseases on economically important crops. Plasmopara viticola is the causal agent of grapevine downy mildew, a major disease in vineyards worldwide. We sequenced the genome of Pl. viticola with PacBio long reads and obtained a new 92.94 Mb assembly with high contiguity (359 scaffolds for an N50 of 706.5 kb) due to a better resolution of repeat regions. This assembly presented a high level of gene completeness, recovering 1,592 genes encoding secreted proteins involved in plant-pathogen interactions. Plasmopara viticola had a two-speed genome architecture, with secreted protein-encoding genes preferentially located in gene-sparse, repeat-rich regions and evolving rapidly, as indicated by pairwise dN/dS values. We also used short reads to assemble the genome of Plasmopara muralis, a closely related species infecting grape ivy (Parthenocissus tricuspidata). The lineage-specific proteins identified by comparative genomics analysis included a large proportion of RxLR cytoplasmic effectors and, more generally, genes with high dN/dS values. We identified 270 candidate genes under positive selection, including several genes encoding transporters and components of the RNA machinery potentially involved in host specialization. Finally, the Pl. viticola genome assembly generated here will allow the development of robust population genomics approaches for investigating the mechanisms involved in adaptation to biotic and abiotic selective pressures in this species. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


April 21, 2020  |  

The Draft Genome of an Octocoral, Dendronephthya gigantea.

Coral reefs composed of stony corals are threatened by global marine environmental changes. However, soft coral communities of octocorallian species appear more resilient. The genomes of several cnidarian species have been published, including from stony corals, sea anemones, and hydra. To fill the phylogenetic gap for octocoral species of cnidarians, we sequenced the octocoral, Dendronephthya gigantea, a nonsymbiotic soft coral, commonly known as the carnation coral. The D. gigantea genome size is ~276 Mb. A high-quality genome assembly was constructed from PacBio long reads (29.85 Gb with 108× coverage) and Illumina short paired-end reads (35.54 Gb with 128× coverage), resulting in the highest N50 value (1.4 Mb) reported thus far among cnidarian genomes. About 12% of the genome consists of repetitive elements, and the genome contains 28,879 predicted protein-coding genes. This gene set is composed of 94% complete BUSCO ortholog benchmark genes, which is the second highest value among the cnidarians, indicating high quality. Based on molecular phylogenetic analysis, octocoral and hexacoral divergence times were estimated at 544 MYA. There is a clear difference in Hox gene composition between these species: unlike hexacorals, the Antp superclass Evx gene was absent in D. gigantea. Here, we present the first genome assembly of a nonsymbiotic octocoral, D. gigantea, to aid in the comparative genomic analysis of cnidarians, including stony and soft corals, both symbiotic and nonsymbiotic. The D. gigantea genome may also provide clues to mechanisms of differential coping between the soft and stony corals in response to scenarios of global warming. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
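
The coverage figures in this abstract follow the usual approximation of sequencing depth as total sequenced bases divided by genome size. The sketch below is a rough check under that assumption, using the rounded values from the abstract (~276 Mb genome, 29.85 Gb PacBio, 35.54 Gb Illumina); the function name `mean_coverage` is hypothetical, and because the inputs are rounded the results only approximate the reported depths.

```python
def mean_coverage(total_bases, genome_size):
    """Approximate mean sequencing depth: sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


GENOME_SIZE = 276e6    # ~276 Mb, rounded value from the abstract
PACBIO_BASES = 29.85e9   # 29.85 Gb of PacBio long reads
ILLUMINA_BASES = 35.54e9  # 35.54 Gb of Illumina paired-end reads

pacbio_depth = mean_coverage(PACBIO_BASES, GENOME_SIZE)
illumina_depth = mean_coverage(ILLUMINA_BASES, GENOME_SIZE)

print(f"PacBio: {pacbio_depth:.1f}x, Illumina: {illumina_depth:.1f}x")
```

Both values land close to the 108× and 128× depths quoted in the abstract.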


April 21, 2020  |  

The genome of the medicinal plant Andrographis paniculata provides insight into the biosynthesis of the bioactive diterpenoid neoandrographolide.

Andrographis paniculata is a herbaceous dicot plant widely used for its anti-inflammatory and anti-viral properties across its distribution in China, India and other Southeast Asian countries. A. paniculata was used as a crucial therapeutic treatment during the influenza epidemic of 1919 in India, and is still used for the treatment of infectious disease in China. A. paniculata produces large quantities of the anti-inflammatory diterpenoid lactones andrographolide and neoandrographolide, and their analogs, which are touted to be the next generation of natural anti-inflammatory medicines for lung diseases, hepatitis, neurodegenerative disorders, autoimmune disorders and inflammatory skin diseases. Here, we report a chromosome-scale A. paniculata genome sequence of 269 Mb that was assembled by Illumina short reads, PacBio long reads and chromosome conformation capture (Hi-C) data. Gene annotation predicted 25 428 protein-coding genes. In order to decipher the genetic underpinning of diterpenoid biosynthesis, transcriptome data from seedlings elicited with methyl jasmonate were also obtained, which enabled the identification of genes encoding diterpenoid synthases, cytochrome P450 monooxygenases, 2-oxoglutarate-dependent dioxygenases and UDP-dependent glycosyltransferases potentially involved in diterpenoid lactone biosynthesis. We further carried out functional characterization of pairs of class-I and -II diterpene synthases, revealing the ability to produce diversified labdane-related diterpene scaffolds. In addition, a glycosyltransferase able to catalyze O-linked glucosylation of andrograpanin, yielding the major active product neoandrographolide, was also identified. Thus, our results demonstrate the utility of the combined genomic and transcriptomic data set generated here for the investigation of the production of the bioactive diterpenoid lactone constituents of the important medicinal herb A. paniculata. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.


April 21, 2020  |  

Stout camphor tree genome fills gaps in understanding of flowering plant genome evolution.

We present a reference-quality genome assembly and annotation for the stout camphor tree (Cinnamomum kanehirae (Laurales, Lauraceae)), the first sequenced member of the Magnoliidae comprising four orders (Laurales, Magnoliales, Canellales and Piperales) and over 9,000 species. Phylogenomic analysis of 13 representative seed plant genomes indicates that magnoliid and eudicot lineages share more recent common ancestry than monocots. Two whole-genome duplication events were inferred within the magnoliid lineage: one before divergence of Laurales and Magnoliales and the other within the Lauraceae. Small-scale segmental duplications and tandem duplications also contributed to innovation in the evolutionary history of Cinnamomum. For example, expansion of the terpenoid synthase gene subfamilies within the Laurales spawned the diversity of Cinnamomum monoterpenes and sesquiterpenes.


April 21, 2020  |  

Complete genome sequence of Streptomyces spongiicola HNM0071T, a marine sponge-associated actinomycete producing staurosporine and echinomycin

Streptomyces spongiicola HNM0071T is a novel marine sponge-associated actinomycete with potential to produce antitumor agents including staurosporine and echinomycin. Here, we present the complete genome sequence of S. spongiicola HNM0071, which consists of a linear chromosome of 7,180,417 bp, 5669 protein coding genes, 18 rRNA genes, and 66 tRNA genes. Twenty-seven putative secondary metabolite biosynthetic gene clusters were found in the genome. Among them, the staurosporine and echinomycin gene clusters have been described completely. The complete genome information presented here will enable us to investigate the biosynthetic mechanism of two well-known antitumor antibiotics and to discover novel secondary metabolites with potential antitumor activities.


April 21, 2020  |  

Finding Nemo’s Genes: A chromosome-scale reference assembly of the genome of the orange clownfish Amphiprion percula.

The iconic orange clownfish, Amphiprion percula, is a model organism for studying the ecology and evolution of reef fishes, including patterns of population connectivity, sex change, social organization, habitat selection and adaptation to climate change. Notably, the orange clownfish is the only reef fish for which a complete larval dispersal kernel has been established and was the first fish species for which it was demonstrated that antipredator responses of reef fishes could be impaired by ocean acidification. Despite its importance, molecular resources for this species remain scarce and until now it lacked a reference genome assembly. Here, we present a de novo chromosome-scale assembly of the genome of the orange clownfish Amphiprion percula. We utilized single-molecule real-time sequencing technology from Pacific Biosciences to produce an initial polished assembly comprised of 1,414 contigs, with a contig N50 length of 1.86 Mb. Using Hi-C-based chromatin contact maps, 98% of the genome assembly was placed into 24 chromosomes, resulting in a final assembly of 908.8 Mb in length with contig and scaffold N50s of 3.12 and 38.4 Mb, respectively. This makes it one of the most contiguous and complete fish genome assemblies currently available. The genome was annotated with 26,597 protein-coding genes and contains 96% of the core set of conserved actinopterygian orthologs. The availability of this reference genome assembly as a community resource will further strengthen the role of the orange clownfish as a model species for research on the ecology and evolution of reef fishes. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.


April 21, 2020  |  

Genome sequence of Jatropha curcas L., a non-edible biodiesel plant, provides a resource to improve seed-related traits.

Jatropha curcas (physic nut), a non-edible oilseed crop, represents one of the most promising alternative energy sources due to its high seed oil content, rapid growth and adaptability to various environments. We report a ~339 Mbp draft whole genome sequence of J. curcas var. Chai Nat using both the PacBio and Illumina sequencing platforms. We identified and categorized differentially expressed genes related to biosynthesis of lipids and toxic compounds among four stages of seed development. Triacylglycerol (TAG), the major component of seed storage oil, is mainly synthesized by phospholipid:diacylglycerol acyltransferase in Jatropha, and continuous high expression of homologs of oleosin over seed development contributes to accumulation of a high level of oil in kernels by preventing the breakdown of TAG. A physical cluster of genes for diterpenoid biosynthetic enzymes, including casbene synthases highly responsible for a toxic compound, phorbol ester, in seed cake, was syntenically highly conserved between Jatropha and castor bean. Transcriptomic analysis of female and male flowers revealed the up-regulation of a dozen families of transcription factors (TFs) in female flowers. Additionally, we constructed a robust species tree enabling estimation of divergence times among nine Jatropha species and five commercial crops in the Malpighiales order. Our results will help researchers and breeders increase the energy efficiency of this important oil seed crop by improving yield and oil content, and eliminating toxic compounds in seed cake for animal feed. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.


April 21, 2020  |  

Endogenous pararetrovirus sequences are widely present in Citrinae genomes.

Endogenous pararetroviruses (EPRVs) are characterized in several plant genomes and their biological effects have been reported. In this study, hundreds of EPRV segments were identified in six Citrinae genomes. A total of 1034 EPRV segments were identified in the genomes of sweet orange, 2036 in pummelo, 598 in clementine mandarin, 752 in Ichang papeda, 2060 in citron and 245 in atalantia. Genomic analysis indicated that EPRV segments tend to cluster as hot spots in the genomes, particularly on chromosomes 2 and 5. Large numbers of simple repeats and transposable elements were identified in the 2-kb flanking regions of the EPRV segments. Comparative genomic analysis and PCR experiments showed that there are highly conserved EPRV segments and species-specific EPRV segments between the Citrinae genomes. Phylogenetic analysis suggested that the integration events of EPRVs could initiate in a common progenitor of Citrinae species and repeatedly occur during the Citrinae divergence. Copyright © 2018 Elsevier B.V. All rights reserved.


April 21, 2020  |  

Complete Genome Sequence of Lactic Acid Bacterium Pediococcus acidilactici Strain ATCC 8042, an Autolytic Anti-bacterial Peptidoglycan Hydrolase Producer

Pediococcus acidilactici is a probiotic bacterium that is industrially utilized in the food industry and antibiotics development. Here, we determine the complete nucleotide sequence of the genome of Pediococcus acidilactici ATCC 8042. The genome was sequenced by the PacBio RSII to generate a single contig consisting of a circular chromosome sequence. The Illumina MiniSeq sequencing platform and the Sanger sequencing method were additionally utilized to correct errors resulting from the long-read sequencing platform. The sequence consists of 2,009,598 bp with a G + C content of 42.1% and contains 1,865 protein-coding sequences. Based on the sequence information, we could confirm and predict the presence of four peptidoglycan hydrolases by HyPe software. This work, therefore, provides the complete genomic information of P. acidilactici ATCC 8042 with a profitable potential of genome-scale comprehension of anti-pathogenic activity, which can be applied in the nutraceutical and pharmaceutical biotechnology fields.

