July 7, 2019

A draft genome of field pennycress (Thlaspi arvense) provides tools for the domestication of a new winter biofuel crop.

Field pennycress (Thlaspi arvense L.) is being domesticated as a new winter cover crop and biofuel species for the Midwestern United States that can be double-cropped between corn and soybeans. A genome sequence will enable the use of new technologies to make improvements in pennycress. To generate a draft genome, a hybrid sequencing approach was used to generate 47 Gb of DNA sequencing reads from both the Illumina and PacBio platforms. These reads were used to assemble 6,768 genomic scaffolds. The draft genome was annotated using the MAKER pipeline, which identified 27,390 predicted protein-coding genes, with almost all of these predicted peptides having significant sequence similarity to Arabidopsis proteins. A comprehensive analysis of pennycress gene homologues involved in glucosinolate biosynthesis, metabolism, and transport pathways revealed high sequence conservation compared with other Brassicaceae species, and helps validate the assembly of the pennycress gene space in this draft genome. Additional comparative genomic analyses indicate that the knowledge gained from years of basic Brassicaceae research will serve as a powerful tool for identifying gene targets whose manipulation can be predicted to result in improvements for pennycress. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.


July 7, 2019

The genome of Dendrobium officinale illuminates the biology of the important traditional Chinese orchid herb.

Dendrobium officinale Kimura et Migo is a traditional Chinese orchid herb that has both ornamental value and a broad range of therapeutic effects. Here, we report the first de novo assembled 1.35 Gb genome sequences for D. officinale by combining the second-generation Illumina Hiseq 2000 and third-generation PacBio sequencing technologies. We found that orchids have a complete inflorescence gene set and have some specific inflorescence genes. We observed gene expansion in gene families related to fungus symbiosis and drought resistance. We analyzed biosynthesis pathways of medicinal components of D. officinale and found extensive duplication of SPS and SuSy genes, which are related to polysaccharide generation, and that the pathway of D. officinale alkaloid synthesis could be extended to generate 16-epivellosimine. The D. officinale genome assembly demonstrates a new approach to deciphering large complex genomes and, as an important orchid species and a traditional Chinese medicine, the D. officinale genome will facilitate future research on the evolution of orchid plants, as well as the study of medicinal components and potential genetic breeding of the dendrobe. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.


July 7, 2019

Saccharina genomes provide novel insight into kelp biology.

Seaweeds are essential for marine ecosystems and have immense economic value. Here we present a comprehensive analysis of the draft genome of Saccharina japonica, one of the most economically important seaweeds. The 537-Mb assembled genomic sequence covered 98.5% of the estimated genome, and 18,733 protein-coding genes are predicted and annotated. Gene families related to cell wall synthesis, halogen concentration, development and defence systems were expanded. Functional diversification of the mannuronan C-5-epimerase and haloperoxidase gene families provides insight into the evolutionary adaptation of polysaccharide biosynthesis and iodine antioxidation. Additional sequencing of seven cultivars and nine wild individuals reveals that the genetic diversity within wild populations is greater than among cultivars. All of the cultivars are descendants of a wild S. japonica accession showing limited admixture with S. longissima. This study represents an important advance toward improving yields and economic traits in Saccharina and provides an invaluable resource for plant genome studies.


July 7, 2019

It’s more than stamp collecting: how genome sequencing can unify biological research.

The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, while the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to ‘big science’ survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need. Copyright © 2015 Elsevier Ltd. All rights reserved.


July 7, 2019

Diversity and evolution of centromere repeats in the maize genome.

Centromere repeats are found in most eukaryotes and play a critical role in kinetochore formation. Though centromere repeats exhibit considerable diversity both within and among species, little is understood about the mechanisms that drive centromere repeat evolution. Here, we use maize as a model to investigate how a complex history involving polyploidy, fractionation, and recent domestication has impacted the diversity of the maize centromeric repeat CentC. We first validate the existence of long tandem arrays of repeats in maize and other taxa in the genus Zea. Although we find considerable sequence diversity among CentC copies genome-wide, genetic similarity among repeats is highest within these arrays, suggesting that tandem duplications are the primary mechanism for the generation of new copies. Nonetheless, clustering analyses identify similar sequences among distant repeats, and simulations suggest that this pattern may be due to homoplasious mutation. Although the two ancestral subgenomes of maize have contributed nearly equal numbers of centromeres, our analysis shows that the majority of all CentC repeats derive from one of the parental genomes, with an even stronger bias when examining the largest assembled contiguous clusters. Finally, by comparing maize with its wild progenitor teosinte, we find that the abundance of CentC likely decreased after domestication, while the pericentromeric repeat Cent4 has drastically increased.


July 7, 2019

GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments.

Genome assemblies generated with next-generation sequencing (NGS) reads usually contain a number of gaps. Several tools have recently been developed to close the gaps in these assemblies with NGS reads. Although these gap-closing tools efficiently close the gaps, they entail a high rate of misassembly at gap-closing sites. We have found that the assembly error rates caused by these tools are 20-500-fold higher than the rate of errors introduced into contigs by de novo assemblers. We here describe GMcloser, a tool that accurately closes these gaps with a preassembled contig set or a long read set (i.e. error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3-100-fold higher than those of other available tools, with similar efficiency. GMcloser and an accompanying tool (GMvalue) for evaluating the assembly and correcting misassemblies except SNPs and short indels in the assembly are available at https://sourceforge.net/projects/gmcloser/. Contact: shunichi.kosugi@riken.jp. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Gossypium barbadense genome sequence provides insight into the evolution of extra-long staple fiber and specialized metabolites.

Of the two cultivated species of allopolyploid cotton, Gossypium barbadense produces extra-long fibers for the production of superior textiles. We sequenced its genome (AD)2 and performed a comparative analysis. We identified three bursts of retrotransposons from 20 million years ago (Mya) and a genome-wide uneven pseudogenization peak at 11-20 Mya, which likely contributed to genomic divergences. Among the 2,483 genes preferentially expressed in fiber, a cell elongation regulator, PRE1, is strikingly At biased and fiber specific, echoing the A-genome origin of spinnable fiber. The expansion of the PRE members implies a genetic factor that underlies fiber elongation. Mature cotton fiber consists of nearly pure cellulose. G. barbadense and G. hirsutum contain 29 and 30 cellulose synthase (CesA) genes, respectively; whereas most of these genes (>25) are expressed in fiber, genes for secondary cell wall biosynthesis exhibited a delayed and higher degree of up-regulation in G. barbadense compared with G. hirsutum, conferring an extended elongation stage and highly active secondary wall deposition during extra-long fiber development. The rapid diversification of sesquiterpene synthase genes in the gossypol pathway exemplifies the chemical diversity of lineage-specific secondary metabolites. The G. barbadense genome advances our understanding of allopolyploidy, which will help improve cotton fiber quality.


July 7, 2019

The genus Brachypodium as a model for perenniality and polyploidy

The genus Brachypodium contains annual and perennial species with both diploid and polyploid genomes. Like the annual species B. distachyon, some of the perennial and polyploid species have traits compatible with use as a model system (e.g. small genomes, rapid generation time, self-fertile and easy to grow). Thus, there is an opportunity to leverage the resources and knowledge developed for B. distachyon to use other Brachypodium species as models for perenniality and the regulation and evolution of polyploid genomes. There are two factors driving an increased interest in perenniality. First, several perennial grasses are being developed as biomass crops for the sustainable production of biofuel and it would be useful to have a perennial model system to rapidly test biotechnological crop improvement strategies for undesirable impacts on perenniality and winter hardiness. In addition, a deeper understanding of the molecular mechanisms underlying perenniality could be used to design strategies for improving energy crops, for example, by changing resource allocation during growth or by altering the onset of dormancy. The second factor driving increased interest in perenniality is the potential environmental benefits of developing perennial grain crops. B. sylvaticum is a perennial with attributes suitable for use as a perennial model system. A high efficiency transformation system has been developed and a genome sequencing project is underway. Since many important crops, including emerging biomass crops, are polyploid, there is a pressing need to understand the rules governing the evolution and regulation of polyploid genomes. Unfortunately, it is difficult to study polyploid crop genomes because of their size and the difficulty of manipulating those plants in the laboratory. By contrast, B. hybridum has a small polyploid genome and is easy to work with in the laboratory. In addition, analysis of the B. hybridum genome will be greatly aided by the genome sequences of the two extant diploid species (B. distachyon and B. stacei) that apparently gave rise to B. hybridum. Availability of high quality reference genomes for these three species will be a powerful resource for the study of polyploidy.


July 7, 2019

A synteny-based draft genome sequence of the forage grass Lolium perenne.

Here we report the draft genome sequence of perennial ryegrass (Lolium perenne), an economically important forage and turf grass species that is widely cultivated in temperate regions worldwide. It is classified along with wheat, barley, oats and Brachypodium distachyon in the Pooideae sub-family of the grass family (Poaceae). Transcriptome data was used to identify 28 455 gene models, and we utilized macro-co-linearity between perennial ryegrass and barley, and synteny within the grass family, to establish a synteny-based linear gene order. The gametophytic self-incompatibility mechanism enables the pistil of a plant to reject self-pollen and therefore promote out-crossing. We have used the sequence assembly to characterize transcriptional changes in the stigma during pollination with both compatible and incompatible pollen. Characterization of the pollen transcriptome identified homologs to pollen allergens from a range of species, many of which were expressed to very high levels in mature pollen grains, and are potentially involved in the self-incompatibility mechanism. The genome sequence provides a valuable resource for future breeding efforts based on genomic prediction, and will accelerate the development of new varieties for more productive grasslands. © 2015 The Authors. The Plant Journal © 2015 John Wiley & Sons Ltd.


July 7, 2019

Jitterbug: somatic and germline transposon insertion detection at single-nucleotide resolution.

Transposable elements are major players in genome evolution. Transposon insertion polymorphisms can translate into phenotypic differences in plants and animals and are linked to different diseases including human cancer, making their characterization highly relevant to the study of genome evolution and genetic diseases. Here we present Jitterbug, a novel tool that identifies transposable element insertion sites at single-nucleotide resolution based on the paired-end mapping and clipped-read signatures produced by NGS alignments. Jitterbug can be easily integrated into existing NGS analysis pipelines, using the standard BAM format produced by frequently applied alignment tools (e.g. bwa, bowtie2), with no need to realign reads to a set of consensus transposon sequences. Jitterbug is highly sensitive and able to recall transposon insertions with a very high specificity, as demonstrated by benchmarks in the human and Arabidopsis genomes, and validation using long PacBio reads. In addition, Jitterbug estimates the zygosity of transposon insertions with high accuracy and can also identify somatic insertions. We demonstrate that Jitterbug can identify mosaic somatic transposon movement using sequenced tumor-normal sample pairs and allows for estimating the cancer cell fraction of clones containing a somatic TE insertion. We suggest that the independent methods we use to evaluate performance are a step towards creating a gold standard dataset for benchmarking structural variant prediction tools.


July 7, 2019

The Brachypodium distachyon reference genome

Grasses provide the bulk of human calories but improvement in grass yields is hindered by the characteristically large and complex genomes of these species; the genomes of wheat, maize, and sugar cane are 17,000, 2300, and 10,000 Mb, respectively. Brachypodium distachyon has one of the smallest genomes of all grasses at 272 Mb, and a number of key traits that make it a good model grass. Brachypodium was the fourth sequenced grass genome, after rice, Sorghum, and maize, and was the first sequenced in the Pooideae subfamily, a diverse group that includes wheat, barley, oat, and rye. The Brachypodium genome was sequenced using a whole genome shotgun approach with Sanger sequencing and is nearly complete with 99.6% of the sequences anchored to five chromosomes. Sequencing of Brachypodium enabled comparative genomic analysis of grass genomes and shed light on processes involved in chromosome fusions and maintenance of a small genome. The high-quality Brachypodium genome sequence provides a framework for gene expression atlases, resequencing, quantitative trait loci (QTL) mapping, GWAS, and ENCODE datasets. The wealth of Brachypodium genomic resources has cemented its utility as a model organism and will facilitate translational work for improving the grasses that feed the world.


July 7, 2019

Hybrid de novo genome assembly of the Chinese herbal plant danshen (Salvia miltiorrhiza Bunge)

Danshen (Salvia miltiorrhiza Bunge), also known as Chinese red sage, is a member of the Lamiaceae family. It is valued in traditional Chinese medicine, primarily for the treatment of cardiovascular and cerebrovascular diseases. Because of its pharmacological potential, ongoing research aims to identify novel bioactive compounds in danshen, and their biosynthetic pathways. To date, only expressed sequence tag (EST) and RNA-seq data for this herbal plant are available to the public. We therefore propose that the construction of a reference genome for danshen will help elucidate the biosynthetic pathways of important secondary metabolites, thereby advancing the investigation of novel drugs from this plant.


July 7, 2019

Single molecule sequencing of THCA synthase reveals copy number variation in modern drug-type Cannabis sativa L.

Cannabinoid expression is an important genetically determined feature of cannabis that presents clinical and legal implications for patients seeking cannabinoid specific therapies like Cannabidiol (CBD). Cannabinoid, terpenoid, and flavonoid marker assisted selection can accelerate breeding efforts by offering genetic tools to select for desired traits at an early stage in growth. To this end, multiple models for chemotype inheritance have been described suggesting a complex picture for chemical phenotype determination. Here we explore the potential role of copy number variation of THCA Synthase using phased single molecule sequencing and demonstrate that copy number and sequence variation of this gene is common and suggests a more nuanced view of chemotype prediction.


July 7, 2019

Leafy spurge genomics: A model perennial weed to investigate development, stress responses, and invasiveness

Leafy spurge is a wildflower native to Europe that has become an invasive perennial weed in the northern Great Plains of the USA and Canada. Leafy spurge primarily infests range and recreation lands and costs US land managers millions of dollars annually. In its invaded range, leafy spurge can form vast monocultures that significantly impact native flora and fauna, and it has been linked to reduced populations of endangered species such as the prairie fringed orchid. Leafy spurge has remarkable plasticity and can persist under environmental extremes, primarily due to the formation of hundreds of underground adventitious buds that can form on its extensive and deep root system. We have developed genomics-based tools to assist our investigations related to vegetative production from these underground buds, as well as its responses to stress, and the potential mechanisms leading to the invasiveness of leafy spurge. Towards these ends, we have utilized Sanger-based sequencing to develop EST-databases from leafy spurge and cassava (a related species) transcriptomes, and developed ~23,000-element cDNA microarrays representing all of the unigenes identified in these databases. Additionally, numerous cDNA libraries and genomic libraries have been developed including bacterial artificial chromosome libraries useful for identifying and characterizing promoters of differentially expressed genes. Finally, to enhance our ability to identify promoter sequences and transcription factors involved in vegetative production, stress responses, and invasiveness, we have incorporated next generation sequencing approaches to fully sequence the leafy spurge genome. The use of global transcriptome profiles, next generation sequencing, and bioinformatics programs has provided insights into molecular mechanisms and regulatory pathways that make leafy spurge a particularly invasive and difficult weed to control.


July 7, 2019

Compact genome of the Antarctic midge is likely an adaptation to an extreme environment.

The midge, Belgica antarctica, is the only insect endemic to Antarctica, and thus it offers a powerful model for probing responses to extreme temperatures, freeze tolerance, dehydration, osmotic stress, ultraviolet radiation and other forms of environmental stress. Here we present the first genome assembly of an extremophile, the first dipteran in the family Chironomidae, and the first Antarctic eukaryote to be sequenced. At 99 megabases, B. antarctica has the smallest insect genome sequenced thus far. Although it has a similar number of genes as other Diptera, the midge genome has very low repeat density and a reduction in intron length. Environmental extremes appear to constrain genome architecture, not gene content. The few transposable elements present are mainly ancient, inactive retroelements. An abundance of genes associated with development, regulation of metabolism and responses to external stimuli may reflect adaptations for surviving in this harsh environment.

