September 22, 2019  |  

Genomic approaches for studying crop evolution.

Understanding how crop plants evolved from their wild relatives and spread around the world can inform about the origins of agriculture. Here, we review how the rapid development of genomic resources and tools has made it possible to conduct genetic mapping and population genetic studies to unravel the molecular underpinnings of domestication and crop evolution in diverse crop species. We propose three future avenues for the study of crop evolution: establishment of high-quality reference genomes for crops and their wild relatives; genomic characterization of germplasm collections; and the adoption of novel methodologies such as archaeogenetics, epigenomics, and genome editing.


September 22, 2019  |  

Repeated inversions within a pannier intron drive diversification of intraspecific colour patterns of ladybird beetles.

How genetic information is modified to generate phenotypic variation within a species is one of the central questions in evolutionary biology. Here we focus on the striking intraspecific diversity of >200 aposematic elytral (forewing) colour patterns of the multicoloured Asian ladybird beetle, Harmonia axyridis, which is regulated by a tightly linked genetic locus h. Our loss-of-function analyses, genetic association studies, de novo genome assemblies, and gene expression data reveal that the GATA transcription factor gene pannier is the major regulatory gene located at the h locus, and suggest that repeated inversions and cis-regulatory modifications at pannier led to the expansion of colour pattern variation in H. axyridis. Moreover, we show that the colour-patterning function of pannier is conserved in the seven-spotted ladybird beetle, Coccinella septempunctata, suggesting that H. axyridis’ extraordinary intraspecific variation may have arisen from ancient modifications in conserved elytral colour-patterning mechanisms in ladybird beetles.


September 22, 2019  |  

A draft genome assembly of the Chinese sillago (Sillago sinica), the first reference genome for Sillaginidae fishes.

Sillaginidae, also known as smelt-whitings, is a family of benthic coastal marine fishes in the Indo-West Pacific that have high ecological and economic importance. Many Sillaginidae species, including the Chinese sillago (Sillago sinica), have been recently described in China, providing valuable material to analyze genetic diversification of the family Sillaginidae. Here, we constructed a reference genome for the Chinese sillago, with the aim to set up a platform for comparative analysis of all species in this family. Using the single-molecule real-time DNA sequencing platform Pacific Biosciences (PacBio) Sequel, we generated ~27.3 Gb genomic DNA sequences for the Chinese sillago. We reconstructed a genome assembly of 534 Mb using a strategy that takes advantage of complementary strengths of two genome assembly programs, Canu and FALCON. The genome size was consistent with the estimated genome size based on k-mer analysis. The assembled genome consisted of 802 contigs with a contig N50 length of 2.6 Mb. We annotated 22,122 protein-coding genes in the Chinese sillago genome using a de novo method as well as RNA sequencing data and homologies to other teleosts. According to the phylogenetic analysis using protein-coding genes, the Chinese sillago is closely related to Larimichthys crocea and Dicentrarchus labrax and diverged from their ancestor around 69.5-82.6 million years ago. Using long reads generated with PacBio sequencing technology, we have built a draft genome assembly for the Chinese sillago, which is the first reference genome for Sillaginidae species. This genome assembly sets a stage for comparative analysis of the diversification and adaptation of fishes in Sillaginidae.
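For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal sketch of how it is conventionally computed: the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Hypothetical helper for illustration only: contigs are sorted from
    longest to shortest, and the length of the contig that brings the
    cumulative sum to at least half of the total is returned.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


# Toy example (made-up lengths, not the sillago contigs).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

With the toy lengths above the total is 54, and the three longest contigs (10, 9, 8) already sum to 27, so the N50 is 8.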


September 22, 2019  |  

A statistical method for observing personal diploid methylomes and transcriptomes with Single-Molecule Real-Time sequencing.

We address the problem of observing personal diploid methylomes, CpG methylome pairs of homologous chromosomes that are distinguishable with respect to phased heterozygous variants (PHVs), which is challenging due to the scarcity of PHVs in personal genomes. Single molecule real-time (SMRT) sequencing is promising as it outputs long reads with CpG methylation information, but a serious concern is whether reliable PHVs are available in erroneous SMRT reads with an error rate of ~15%. To overcome the issue, we propose a statistical model that reduces the error rate of phasing CpG sites to 1%, thereby calling CpG hypomethylation in each haplotype with >90% precision and sensitivity. Using our statistical model, we examined the GNAS complex locus, known for a combination of maternally, paternally, or biallelically expressed isoforms, and observed allele-specific methylation patterns almost perfectly reflecting their respective allele-specific expression status, demonstrating the merit of elucidating comprehensive personal diploid methylomes and transcriptomes.


September 22, 2019  |  

Parliament2: Fast structural variant calling using optimized combinations of callers

Here we present Parliament2: a structural variant caller which combines multiple best-in-class structural variant callers to create a highly accurate callset. This captures more events than the individual callers achieve independently. Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and gives users the choice to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta. Parliament2 applies an additional parallelization framework to speed certain callers and executes these in parallel, taking advantage of the different resource requirements to complete structural variant calling much faster than running the programs individually. Parliament2 is available as a Docker container, which pre-installs all required dependencies. This allows users to run any caller with easy installation and execution. This Docker container can easily be deployed in cloud or local environments and is available as an app on DNAnexus.
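The "overlap" step of a call-overlap-genotype approach can be illustrated with a small sketch. The code below is not Parliament2's merging logic; it is a simplified stand-in that uses reciprocal interval overlap, a common heuristic, to decide which callers support a given event. The `Call` type, the function names, the 0.5 threshold, and the coordinates are all assumptions made for illustration.

```python
from typing import Iterable, NamedTuple, Set


class Call(NamedTuple):
    """A hypothetical, minimal structural variant record."""
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Fraction of overlap relative to the longer of the two calls.

    Returns 0.0 when the calls are on different chromosomes, have
    different SV types, or do not overlap at all.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def supporting_callers(call: Call, calls: Iterable[Call],
                       min_overlap: float = 0.5) -> Set[str]:
    """Names of callers with a call that reciprocally overlaps `call`."""
    return {other.caller for other in calls
            if reciprocal_overlap(call, other) >= min_overlap}


# Made-up calls: two callers agree on one deletion, a third reports another.
manta = Call("Manta", "chr1", 100, 200, "DEL")
lumpy = Call("Lumpy", "chr1", 150, 250, "DEL")
delly = Call("Delly", "chr1", 1000, 1100, "DEL")
support = supporting_callers(manta, [manta, lumpy, delly])
```

In this toy input the Manta and Lumpy deletions share 50 of their 100 bases, so both support the same event at the assumed 0.5 threshold, while the Delly call does not.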


September 22, 2019  |  

The opium poppy genome and morphinan production.

Morphinan-based painkillers are derived from opium poppy (Papaver somniferum L.). We report a draft of the opium poppy genome, with 2.72 gigabases assembled into 11 chromosomes with contig N50 and scaffold N50 of 1.77 and 204 megabases, respectively. Synteny analysis suggests a whole-genome duplication at ~7.8 million years ago and ancient segmental or whole-genome duplication(s) that occurred before the Papaveraceae-Ranunculaceae divergence 110 million years ago. Syntenic blocks representative of phthalideisoquinoline and morphinan components of a benzylisoquinoline alkaloid cluster of 15 genes provide insight into how this cluster evolved. Paralog analysis identified P450 and oxidoreductase genes that combined to form the STORR gene fusion essential for morphinan biosynthesis in opium poppy. Thus, gene duplication, rearrangement, and fusion events have led to evolution of specialized metabolic products in opium poppy. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


September 22, 2019  |  

Genome sequence of the cauliflower mushroom Sparassis crispa (Hanabiratake) and its association with beneficial usage.

Sparassis crispa (Hanabiratake) is a widely used medicinal mushroom in traditional Chinese medicine because it contains materials with pharmacological activity. Here, we report its 39.0-Mb genome, encoding 13,157 predicted genes, obtained using next-generation sequencing along with RNA-seq mapping data. A phylogenetic analysis by comparison with 25 other fungal genomes revealed that S. crispa diverged from Postia placenta, a brown-rot fungus, 94 million years ago. Several features specific to the genome were found, including the A-mating type locus with the predicted genes for HD1 and HD2 heterodomain transcription factors, the mitochondrial intermediate peptidase (MIP), and the B-mating type locus with seven potential pheromone receptor genes and three potential pheromone precursor genes. To evaluate the benefits of the extract and chemicals from S. crispa, we adopted two approaches: (1) characterization of carbohydrate-active enzyme (CAZyme) genes and β-glucan synthase genes and the clusters of genes for the synthesis of secondary metabolites, such as terpenes, indoles and polyketides, and (2) identification of estrogenic activity in its mycelial extract. Two potential β-glucan synthase genes, ScrFKS1 and ScrFKS2, corresponding to types I and II, respectively, characteristic of Agaricomycetes mushrooms, were newly identified by the search for regions homologous to the reported features of β-glucan synthase genes; both contained the characteristic transmembrane regions and the regions homologous to the catalytic domain of the yeast β-glucan synthase gene FKS1. Rapid estrogenic cell-signaling and DNA microarray-based transcriptome analyses revealed the presence of a new category of chemicals with estrogenic activity, silent estrogens, in the extract. The elucidation of the S. crispa genome and its genes will expand the potential of this organism for medicinal and pharmacological purposes.


September 22, 2019  |  

Assembling the genome of the African wild rice Oryza longistaminata by exploiting synteny in closely related Oryza species.

The African wild rice species Oryza longistaminata has several beneficial traits compared to cultivated rice species, such as resistance to biotic stresses, clonal propagation via rhizomes, and increased biomass production. To facilitate breeding efforts and functional genomics studies, we de-novo assembled a high-quality, haploid-phased genome. Here, we present our assembly, with a total length of 351 Mb, of which 92.2% was anchored onto 12 chromosomes. We detected 34,389 genes and 38.1% of the genome consisted of repetitive content. We validated our assembly by a comparative linkage analysis and by examining well-characterized gene families. This genome assembly will be a useful resource to exploit beneficial alleles found in O. longistaminata. Our results also show that it is possible to generate a high-quality, functionally complete rice genome assembly from moderate SMRT read coverage by exploiting synteny in a closely related Oryza species.


September 22, 2019  |  

Targeted genotyping of variable number tandem repeats with adVNTR.

Whole-genome sequencing is increasingly used to identify Mendelian variants in clinical pipelines. These pipelines focus on single-nucleotide variants (SNVs) and also structural variants, while ignoring more complex repeat sequence variants. Here, we consider the problem of genotyping Variable Number Tandem Repeats (VNTRs), composed of inexact tandem duplications of short (6-100 bp) repeating units. VNTRs span 3% of the human genome, are frequently present in coding regions, and have been implicated in multiple Mendelian disorders. Although existing tools recognize VNTR-carrying sequence, genotyping VNTRs (determining repeat unit count and sequence variation) from whole-genome sequencing reads remains challenging. We describe a method, adVNTR, that uses hidden Markov models to model each VNTR, count repeat units, and detect sequence variation. adVNTR models can be developed for short-read (Illumina) and single-molecule (Pacific Biosciences [PacBio]) whole-genome and whole-exome sequencing, and show good results on multiple simulated and real data sets. © 2018 Bakhtiari et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019  |  

Combining probabilistic alignments with read pair information improves accuracy of split-alignments.

Split-alignments provide base-pair-resolution evidence of genomic rearrangements. In practice, they are found by first computing high-scoring local alignments, parts of which are then combined into a split-alignment. This approach is challenging when aligning a short read to a large and repetitive reference, as it tends to produce many spurious local alignments leading to ambiguities in identifying the correct split-alignment. This problem is further exacerbated by the fact that rearrangements tend to occur in repeat-rich regions. We propose a split-alignment technique that combats the issue of ambiguous alignments by combining information from probabilistic alignment with positional information from paired-end reads. We demonstrate that our method finds accurate split-alignments, and that this translates into improved performance of variant-calling tools that rely on split-alignments. An open-source implementation is freely available at: https://bitbucket.org/splitpairedend/last-split-pe. Supplementary data are available at Bioinformatics online.


September 22, 2019  |  

How complete are “complete” genome assemblies? An avian perspective.

The genomics revolution has led to the sequencing of a large variety of nonmodel organisms, often referred to as “whole” or “complete” genome assemblies. But how complete are these, really? Here, we use birds as an example for nonmodel vertebrates and find that, although suitable in principle for genomic studies, the current standard of short-read assemblies misses a significant proportion of the expected genome size (7% to 42%; mean 20 ± 9%). In particular, regions with strongly deviating nucleotide composition (e.g., guanine-cytosine-[GC]-rich) and regions highly enriched in repetitive DNA (e.g., transposable elements and satellite DNA) are usually underrepresented in assemblies. However, long-read sequencing technologies successfully characterize many of these underrepresented GC-rich or repeat-rich regions in several bird genomes. For instance, only ~2% of the expected total base pairs are missing in the latest chicken reference (galGal5). These assemblies still contain thousands of gaps (i.e., fragmented sequences) because some chromosomal structures (e.g., centromeres) likely contain arrays of repetitive DNA that are too long to bridge with currently available technologies. We discuss how to minimize the number of assembly gaps by combining the latest available technologies with complementary strengths. Finally, we emphasize the importance of knowing the location, size and potential content of assembly gaps when making population genetic inferences about adjacent genomic regions. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.


September 22, 2019  |  

FRI-4 carbapenemase-producing Enterobacter cloacae complex isolated in Tokyo, Japan.

A carbapenem-resistant Enterobacter cloacae complex isolated in Tokyo, Japan, produced a carbapenemase that was detected by a Carba NP test and a modified carbapenem inactivation method, but none of the ‘Big Five’ carbapenemase genes was detected by PCR. This study aimed to identify the carbapenemase. Carbapenemase genes were screened by WGS. Next, we generated a recombinant plasmid in which the carbapenemase gene was inserted. We also extracted the carbapenemase gene-carrying plasmid from the E. cloacae complex. The effects of both plasmids on the antibiotic susceptibility of Escherichia coli were then tested. The carbapenemase gene-carrying plasmid in the E. cloacae complex was completely sequenced. A novel carbapenemase gene, blaFRI-4, encoded an amino acid sequence that was 93.2% identical to French imipenemase (FRI-1). E. coli transformed with blaFRI-4 showed reduced carbapenem susceptibility. A complete sequence of the blaFRI-4-carrying 98,508 bp IncFII/IncR plasmid (pTMTA61661) showed that blaFRI-4 and the surrounding region (18.7 kb) were duplicated. The FRI-4-producing E. cloacae complex was isolated in Japan, whereas all other FRI variants have been found in Europe, suggesting that the spread of FRI carbapenemases is global.


September 22, 2019  |  

Genomic analysis of Picochlorum species reveals how microalgae may adapt to variable environments.

Understanding how microalgae adapt to rapidly changing environments is not only important to science but can help clarify the potential impact of climate change on the biology of primary producers. We sequenced and analyzed the nuclear genome of multiple Picochlorum isolates (Chlorophyta) to elucidate strategies of environmental adaptation. It was previously found that coordinated gene regulation is involved in adaptation to salinity stress, and here we show that gene gain and loss also play key roles in adaptation. We determined the extent of horizontal gene transfer (HGT) from prokaryotes and their role in the origin of novel functions in the Picochlorum clade. HGT is an ongoing and dynamic process in this algal clade with adaptation being driven by transfer, divergence, and loss. One HGT candidate that is differentially expressed under salinity stress is indolepyruvate decarboxylase that is involved in the production of a plant auxin that mediates bacteria-diatom symbiotic interactions. Large differences in levels of heterozygosity were found in diploid haplotypes among Picochlorum isolates. Biallelic divergence was pronounced in P. oklahomensis (salt plains environment) when compared with its closely related sister taxon Picochlorum SENEW3 (brackish water environment), suggesting a role of diverged alleles in response to environmental stress. Our results elucidate how microbial eukaryotes with limited gene inventories expand habitat range from mesophilic to halophilic through allelic diversity, and with minor but important contributions made by HGT. We also explore how the nature and quality of genome data may impact inference of nuclear ploidy.


September 22, 2019  |  

Genomic discovery of the hypsin gene and biosynthetic pathways for terpenoids in Hypsizygus marmoreus.

Hypsizygus marmoreus (Beech mushroom) is a popular ingredient in Asian cuisine. The medicinal effects of its bioactive compounds such as hypsin and hypsiziprenol have been reported, but the genetic basis or biosynthesis of these components is unknown. In this study, we sequenced a reference strain of H. marmoreus (Haemi 51,987-8). We evaluated various assembly strategies, and the combination of Allpaths and PBJelly produced the best assembly. The resulting genome was 42.7 Mbp in length and annotated with 16,627 gene models. A putative gene (Hypma_04324) encoding the antifungal and antiproliferative hypsin protein with 75% sequence identity with the previously known N-terminal sequence was identified. Carbohydrate active enzyme analysis displayed the typical feature of white-rot fungi where auxiliary activity and carbohydrate-binding modules were enriched. The genome annotation revealed four terpene synthase genes responsible for terpenoid biosynthesis. From the gene tree analysis, we identified that terpene synthase genes can be classified into six clades. The four terpene synthase genes of H. marmoreus belonged to four different groups, which implies they may be involved in the synthesis of different structures of terpenes. A terpene synthase gene cluster was well-conserved in Agaricomycetes genomes, which contained known biosynthesis and regulatory genes. Genome sequence analysis of this mushroom led to the discovery of the hypsin gene. Comparative genome analysis revealed the conserved gene cluster for terpenoid biosynthesis in the genome. These discoveries will further our understanding of the biosynthesis of medicinal bioactive molecules in this edible mushroom.


September 22, 2019  |  

The chromosome-level quality genome provides insights into the evolution of the biosynthesis genes for aroma compounds of Osmanthus fragrans.

Sweet osmanthus (Osmanthus fragrans) is a very popular ornamental tree species throughout Southeast Asia and the USA, particularly for its extremely fragrant aroma. We constructed a chromosome-level reference genome of O. fragrans to assist in studies of the evolution, genetic diversity, and molecular mechanism of aroma development. A total of over 118 Gb of polished reads was produced from HiSeq (45.1 Gb) and PacBio Sequel (73.35 Gb), giving 100× depth coverage for long reads. The combination of Illumina short reads, PacBio long reads, and Hi-C data produced the final chromosome quality genome of O. fragrans with a genome size of 727 Mb and a heterozygosity of 1.45%. The genome was annotated using de novo and homology comparison and further refined with transcriptome data. The genome of O. fragrans was predicted to have 45,542 genes, of which 95.68% were functionally annotated. Genome annotation found 49.35% as the repetitive sequences, with long terminal repeats (LTR) being the richest (28.94%). Genome evolution analysis indicated the evidence of whole-genome duplication 15 million years ago, which contributed to the current content of 45,242 genes. Metabolic analysis revealed that linalool, a monoterpene, is the main aroma compound. Based on the genome and transcriptome, we further demonstrated the direct connection between terpene synthases (TPSs) and the rich aromatic molecules in O. fragrans. We identified three new flower-specific TPS genes, of which the expression coincided with the production of linalool. Our results suggest that the high number of TPS genes and the flower tissue- and stage-specific TPS gene expression might drive the strong unique aroma production of O. fragrans.
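The quoted long-read depth can be sanity-checked with simple arithmetic: average depth is approximately the total sequenced bases divided by the genome size. The sketch below uses the two figures stated in the abstract (73.35 Gb of PacBio reads, 727 Mb genome); the function name `mean_depth` is a hypothetical helper, and treating the assembled size as the true genome size is an assumption.

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Approximate average sequencing depth (fold coverage).

    Hypothetical helper: depth = total sequenced bases / genome size.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures taken from the abstract; units converted to base pairs.
pacbio_bases = 73.35e9   # 73.35 Gb of PacBio Sequel reads
genome_size = 727e6      # 727 Mb genome size

long_read_depth = mean_depth(pacbio_bases, genome_size)
print(f"Estimated long-read depth: {long_read_depth:.1f}x")
```

Under these assumptions the estimate comes out at roughly 101×, in line with the approximately 100× long-read depth reported.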

