September 22, 2019

Targeted sequencing by gene synteny, a new strategy for polyploid species: sequencing and physical structure of a complex sugarcane region.

Sugarcane exhibits a complex genome mainly due to its aneuploid nature and high ploidy level, and sequencing of its genome poses a great challenge. Closely related species with well-assembled and annotated genomes can be used to help assemble complex genomes. Here, a stable quantitative trait locus (QTL) related to sugar accumulation in sorghum was successfully transferred to the sugarcane genome. Gene sequences related to this QTL were identified in silico from sugarcane transcriptome data, and molecular markers based on these sequences were developed to select bacterial artificial chromosome (BAC) clones from the sugarcane variety SP80-3280. Sixty-eight BAC clones containing at least two gene sequences associated with the sorghum QTL were sequenced using Pacific Biosciences (PacBio) technology. Twenty BAC sequences were found to be related to the syntenic region, of which nine were sufficient to represent this region. The strategy we propose is called “targeted sequencing by gene synteny,” which is a simpler approach to understanding the genome structure of complex genomic regions associated with traits of interest.


September 22, 2019

Comparative genomics of smut pathogens: Insights from orphans and positively selected genes into host specialization.

Host specialization is a key evolutionary process for the diversification and emergence of new pathogens. However, the molecular determinants of host range are poorly understood. Smut fungi are biotrophic pathogens that have distinct and narrow host ranges based on largely unknown genetic determinants. Hence, we aimed to expand comparative genomics analyses of smut fungi by including more species infecting different hosts and to define orphans and positively selected genes to gain further insights into the genetic basis of host specialization. We analyzed nine lineages of smut fungi isolated from eight crop and non-crop hosts: maize, barley, sugarcane, wheat, oats, Zizania latifolia (Manchurian rice), Echinochloa colona (a wild grass), and Persicaria sp. (a wild dicot plant). We assembled two new genomes: Ustilago hordei (strain Uhor01) isolated from oats and U. tritici (strain CBS 119.19) isolated from wheat. The smut genomes were small, ranging from 18.38 to 24.63 Mb. U. hordei species experienced genome expansions due to the proliferation of transposable elements, and the amount of these elements varied between the two strains. Phylogenetic analysis confirmed that Ustilago is not a monophyletic genus and, furthermore, detected misclassification of the U. tritici specimen. The comparison between smut pathogens of crop and non-crop hosts did not reveal distinct signatures, suggesting that host domestication did not play a dominant role in shaping the evolution of smuts. We found that host specialization in smut fungi likely has a complex genetic basis: different functional categories were enriched in orphans and lineage-specific selected genes. The diversification and gain/loss of effector genes are probably the most important determinants of host specificity.


September 22, 2019

Comparative genomics of the wheat fungal pathogen Pyrenophora tritici-repentis reveals chromosomal variations and genome plasticity.

Pyrenophora tritici-repentis (Ptr) is a necrotrophic fungal pathogen that causes the major wheat disease, tan spot. We set out to provide essential genomics-based resources in order to better understand the pathogenicity mechanisms of this important pathogen. Here, we present eight new Ptr isolate genomes, assembled and annotated, representing races 1, 2 and 5, and a new race. We report a high-quality Ptr reference genome, sequenced by PacBio technology with Illumina paired-end data support and optical mapping. An estimated 98% of the genome coverage was mapped to 10 chromosomal groups, using a two-enzyme hybrid approach. The final reference genome was 40.9 Mb and contained a total of 13,797 annotated genes, supported by transcriptomic and proteogenomics data sets. Whole genome comparative analysis revealed major chromosomal segmental rearrangements and fusions, highlighting intraspecific genome plasticity in this species. Furthermore, the Ptr race classification was not supported at the whole genome level, as phylogenetic analysis did not cluster the ToxA-producing isolates. This expansion of available Ptr genomics resources will directly facilitate research aimed at controlling tan spot disease.


September 22, 2019

Massive lateral transfer of genes encoding plant cell wall-degrading enzymes to the mycoparasitic fungus Trichoderma from its plant-associated hosts.

Unlike most other fungi, molds of the genus Trichoderma (Hypocreales, Ascomycota) are aggressive parasites of other fungi and efficient decomposers of plant biomass. Although nutritional shifts are common among hypocrealean fungi, there are no examples of such broad substrate versatility as that observed in Trichoderma. A phylogenomic analysis of 23 hypocrealean fungi (including nine Trichoderma spp. and the related Escovopsis weberi) revealed that the genus Trichoderma has evolved from an ancestor with limited cellulolytic capability that fed on either fungi or arthropods. The evolutionary analysis of Trichoderma genes encoding plant cell wall-degrading carbohydrate-active enzymes and auxiliary proteins (pcwdCAZome, 122 gene families) based on a gene tree / species tree reconciliation demonstrated that the formation of the genus was accompanied by an unprecedented extent of lateral gene transfer (LGT). Nearly one-half of the genes in Trichoderma pcwdCAZome (41%) were obtained via LGT from plant-associated filamentous fungi belonging to different classes of Ascomycota, while no LGT was observed from other potential donors. In addition to the ability to feed on unrelated fungi (such as Basidiomycota), we also showed that Trichoderma is capable of endoparasitism on a broad range of Ascomycota, including extant LGT donors. This phenomenon was not observed in E. weberi and rarely in other mycoparasitic hypocrealean fungi. Thus, our study suggests that LGT is linked to the ability of Trichoderma to parasitize taxonomically related fungi (up to adelphoparasitism in strict sense). This may have allowed primarily mycotrophic Trichoderma fungi to evolve into decomposers of plant biomass.


September 22, 2019

Genome-wide comparison reveals a probiotic strain Lactococcus lactis WFLU12 isolated from the gastrointestinal tract of olive flounder (Paralichthys olivaceus) harboring genes supporting probiotic action.

Our previous study has shown that dietary supplementation with Lactococcus lactis WFLU12 can enhance the growth of olive flounder and its resistance against streptococcal infection. The objective of the present study was to use comparative genomics tools to investigate genomic characteristics of strain WFLU12 and the presence of genes supporting its probiotic action using sequenced genomes of L. lactis strains. Dispensable and singleton genes of strain WFLU12 were found to be more enriched in genes associated with metabolism (e.g., energy production and conversion, and carbohydrate transport and metabolism) than pooled dispensable and singleton genes in other L. lactis strains, reflecting WFLU12 strain-specific ecosystem origin and its ability to metabolize different energy sources. Strain WFLU12 produced antimicrobial compounds that could inhibit several bacterial fish pathogens. It possessed the nisin gene cluster (nisZBTCIPRKFEG) and genes encoding lysozyme and colicin V. However, only three other strains (CV56, IO-1, and SO) harbor a complete nisin gene cluster. We also found that L. lactis WFLU12 possessed many other important functional genes involved in stress responses to the gastrointestinal tract environment, dietary energy extraction, and metabolism to support the probiotic action of this strain found in our previous study. This strongly indicates that not all L. lactis strains can be used as probiotics. This study highlights comparative genomics approaches as very useful and powerful tools to select probiotic candidates and predict their probiotic effects.


September 22, 2019

Genome-wide identification of simple sequence repeats and development of polymorphic SSR markers for genetic studies in tea plant (Camellia sinensis)

The tea plant (Camellia sinensis (L.) O. Kuntze) is one of the most popular non-alcoholic beverage crops worldwide. The availability of complete genome sequences for the Camellia sinensis var. ‘Shuchazao’ has provided the opportunity to identify all types of simple sequence repeat (SSR) markers by genome-wide scan. In this study, a total of 667,980 SSRs were identified in the ~3.08 Gb genome, with an overall density of 216.88 SSRs/Mb. Dinucleotide repeats were predominant among microsatellites (72.25%), followed by trinucleotide repeats (15.35%), while the remaining SSRs accounted for less than 13%. The motifs AG/CT (49.96%) and AT/TA (40.14%) were the most and second most abundant among all identified SSR motifs, respectively; meanwhile, AAT/ATT (41.29%) and AAAT/ATTT (67.47%) were the most common among trinucleotides and tetranucleotides, respectively. A total of 300 primer pairs were designed to screen six tea cultivars for polymorphisms of SSR markers using the five selected repeat types of microsatellite sequences. The resulting 96 SSR markers that yielded polymorphic and unambiguous bands were further deployed on 47 tea cultivars for genetic diversity assessment, demonstrating high polymorphism of these SSR markers. Remarkably, the dendrogram revealed that the phylogenetic relationships among these tea cultivars are highly consistent with their genetic backgrounds or places of origin. The identified genome-wide SSRs and newly developed SSR markers will provide a powerful means for genetic research in tea plant, including genetic diversity and evolutionary origin analysis, fingerprinting, QTL mapping, and marker-assisted selection for breeding.
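A genome-wide SSR scan of the kind described above can be illustrated with a short regular-expression sketch. This is not the tool or the settings used in the study: the function names `find_ssrs` and `ssr_density` are hypothetical, and the minimum repeat counts (6 for dinucleotides, 5 for trinucleotides, 4 for tetranucleotides) are assumed thresholds chosen only for illustration. The density helper simply reproduces the arithmetic behind the reported figure (667,980 SSRs over roughly 3,080 Mb).

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in seq.

    min_repeats maps motif length -> minimum number of tandem copies.
    The defaults are illustrative assumptions, not the study's settings.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each locus is reported once.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits


def ssr_density(n_ssrs, genome_size_mb):
    """SSRs per megabase."""
    return n_ssrs / genome_size_mb


if __name__ == "__main__":
    toy = "CC" + "AG" * 7 + "CCC" + "AAT" * 5 + "C"
    print(find_ssrs(toy))
    print(round(ssr_density(667980, 3080), 2))
```

The sketch only finds perfect repeats on one strand; dedicated SSR tools additionally handle compound and imperfect repeats and group complementary motifs such as AG/CT.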


September 22, 2019

Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality.

Tea, one of the world’s most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is calculated to ~0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ~30 to 40 and ~90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties. Copyright © 2018 the Author(s). Published by PNAS.


September 22, 2019

Whole genome sequence of an edible and potential medicinal fungus, Cordyceps guangdongensis.

Cordyceps guangdongensis is an edible fungus which was approved as a novel food by the Chinese Ministry of Public Health in 2013. It also has a broad prospect of application in pharmaceutical industries, with many medicinal activities. In this study, the whole genome of C. guangdongensis GD15, a single spore isolate from a wild strain, was sequenced and assembled with Illumina and PacBio sequencing technology. The generated genome is 29.05 Mb in size, comprising nine scaffolds with an average GC content of 57.01%. It is predicted to contain a total of 9150 protein-coding genes. Sequence identification and comparative analysis indicated that the assembled scaffolds contained two complete chromosomes and four single-end chromosomes, showing a high level assembly. Gene annotation revealed a diversity of transposons that could contribute to the genome size and evolution. Besides, approximately 15.57% and 12.01% genes involved in metabolic processes were annotated by KEGG and COG respectively. Genes belonging to CAZymes accounted for 3.15% of the total genes. In addition, 435 transcription factors, involved in various biological processes, were identified. Among the identified transcription factors, the fungal transcription regulatory proteins (18.39%) and fungal-specific transcription factors (19.77%) represented the two largest classes of transcription factors. This genomic resource provided a new insight into better understanding the relevance of phenotypic characters and genetic mechanisms in C. guangdongensis. Copyright © 2018 Zhang et al.


September 22, 2019

Insights into platypus population structure and history from whole-genome sequencing.

The platypus is an egg-laying mammal which, alongside the echidna, occupies a unique place in the mammalian phylogenetic tree. Despite widespread interest in its unusual biology, little is known about its population structure or recent evolutionary history. To provide new insights into the dispersal and demographic history of this iconic species, we sequenced the genomes of 57 platypuses from across the whole species range in eastern mainland Australia and Tasmania. Using a highly improved reference genome, we called over 6.7 M SNPs, providing an informative genetic data set for population analyses. Our results show very strong population structure in the platypus, with our sampling locations corresponding to discrete groupings between which there is no evidence for recent gene flow. Genome-wide data allowed us to establish that 28 of the 57 sampled individuals had at least a third-degree relative among other samples from the same river, often taken at different times. Taking advantage of a sampled family quartet, we estimated the de novo mutation rate in the platypus at 7.0 × 10⁻⁹/bp/generation (95% CI 4.1 × 10⁻⁹ to 1.2 × 10⁻⁸/bp/generation). We estimated effective population sizes of ancestral populations and haplotype sharing between current groupings, and found evidence for bottlenecks and long-term population decline in multiple regions, and early divergence between populations in different regions. This study demonstrates the power of whole-genome sequencing for studying natural populations of an evolutionarily important species.


September 22, 2019

Phenotypic diversification by enhanced genome restructuring after induction of multiple DNA double-strand breaks.

DNA double-strand break (DSB)-mediated genome rearrangements are assumed to provide diverse raw genetic materials enabling accelerated adaptive evolution; however, the consequences of massive simultaneous DSB formation in cells and the resulting phenotypic impact remain unclear. Here, we establish an artificial genome-restructuring technology by conditionally introducing multiple genomic DSBs in vivo using a temperature-dependent endonuclease, TaqI. Application in yeast and Arabidopsis thaliana generates strains with phenotypes, including improved ethanol production from xylose at higher temperature and increased plant biomass, that are stably inherited by offspring after multiple passages. High-throughput genome resequencing revealed that these strains harbor diverse rearrangements, including copy number variations, translocations in retrotransposons, and direct end-joinings at TaqI-cleavage sites. Furthermore, large-scale rearrangements occur frequently in diploid yeasts (28.1%) and tetraploid plants (46.3%), whereas haploid yeasts and diploid plants undergo minimal rearrangement. This genome-restructuring system (TAQing system) will enable rapid genome breeding and aid genome-evolution studies.


September 22, 2019

NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model.

The PacBio sequencing platform offers longer read lengths than second-generation sequencing technologies. It has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. Due to its extremely wide range of application areas, fast sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of subsequent analysis tools. Although there are several available simulators (e.g., PBSIM, SimLoRD and FASTQSim) that target the specific generation of PacBio libraries, the error rate of simulated sequences is not well matched to the quality value of raw PacBio datasets, especially for PacBio’s continuous long reads (CLR). By analyzing the characteristic features of CLR data from PacBio SMRT (single molecule real time) sequencing, we developed a new PacBio sequencing simulator (called NPBSS) for producing CLR reads. The NPBSS simulator first samples the read sequences according to a log-normal read length distribution, and chooses different base quality values with different proportions. Then, NPBSS computes the overall error probability of each base in the read sequence with an empirical model, and calculates the deletion, substitution and insertion probabilities from the overall error probability to generate the PacBio CLR reads. Alignment results demonstrate that NPBSS fits the error rate of the PacBio CLR reads better than PBSIM and FASTQSim. In addition, the assembly results also show that simulated sequences of NPBSS are more like real PacBio CLR data. The NPBSS simulator is convenient to use, with efficient computation and flexible parameter settings. The PacBio CLR reads it generates more closely resemble real PacBio datasets.
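The simulation steps summarized above (log-normal read lengths, per-base quality values drawn in fixed proportions, a per-base error probability split into deletion, substitution and insertion) can be sketched as follows. This is a toy illustration, not the NPBSS implementation: the function name `simulate_read` and every parameter value are hypothetical, and the standard Phred relation is used as a stand-in for the empirical error model, which the abstract does not specify.

```python
import random


def simulate_read(reference, rng, mean_log=8.0, sigma_log=0.5,
                  qv_choices=(5, 10, 15), qv_weights=(0.3, 0.5, 0.2),
                  error_split=(0.2, 0.6, 0.2)):
    """Simulate one noisy long read from `reference`.

    error_split gives the assumed fractions of the per-base error
    probability assigned to (substitution, insertion, deletion).
    All defaults are illustrative assumptions.
    """
    # 1. Read length from a log-normal distribution, capped at the reference.
    length = int(rng.lognormvariate(mean_log, sigma_log))
    length = min(len(reference), max(1, length))
    start = rng.randrange(len(reference) - length + 1)
    template = reference[start:start + length]

    sub_f, ins_f, del_f = error_split
    read, quals = [], []
    for base in template:
        # 2. Quality value drawn with fixed proportions.
        qv = rng.choices(qv_choices, weights=qv_weights)[0]
        # 3. Overall error probability; Phred formula used as a stand-in
        #    for the paper's empirical model.
        p_err = 10 ** (-qv / 10)
        # 4. Split the error probability into the three error types.
        u = rng.random()
        if u < p_err * del_f:
            continue  # deletion: the template base is dropped
        if u < p_err * (del_f + sub_f):
            base = rng.choice([b for b in "ACGT" if b != base])  # substitution
        elif u < p_err:
            read.append(rng.choice("ACGT"))  # insertion before the base
            quals.append(qv)
        read.append(base)
        quals.append(qv)
    return "".join(read), quals


if __name__ == "__main__":
    rng = random.Random(42)
    ref = "".join(rng.choice("ACGT") for _ in range(20000))
    seq, q = simulate_read(ref, rng)
    print(len(seq), sum(q) / len(q))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is useful when simulated reads are used to compare downstream aligners or assemblers.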


September 22, 2019

Genome-wide analysis of the NAC transcription factor family and their expression during the development and ripening of the Fragaria × ananassa fruits.

NAC proteins are a family of transcription factors which have a variety of important regulatory roles in plants. They present a very well conserved group of NAC subdomains in the N-terminal region and a highly variable domain at the C-terminus. Currently, knowledge concerning the NAC family in the strawberry plant remains very limited. In this work, we analyzed the NAC family of Fragaria vesca, and a total of 112 NAC proteins were identified after we curated the annotations from the version 4.0.a1 genome. They were placed into the linkage groups (pseudo-chromosomes), and their physicochemical and genetic features were described. A microarray transcriptomic analysis showed that six of them were expressed during the development and ripening of the Fragaria × ananassa fruit. Their expression patterns were studied in fruit (receptacle and achenes) at different stages of development and in vegetative tissues. The expression level under different hormonal treatments (auxins, ABA) and drought stress was also investigated. In addition, they were clustered with other NAC transcription factors with known functions related to growth and development, senescence, fruit ripening, stress response, and secondary cell wall and vascular development. Our results indicate that these six strawberry NAC proteins could play different important regulatory roles in the process of development and ripening of the fruit, providing the basis for further functional studies and the selection of NAC candidates suitable for biotechnological applications.


September 22, 2019

Inpactor, integrated and parallel analyzer and classifier of LTR retrotransposons and its application for pineapple LTR retrotransposons diversity and dynamics.

One particular class of Transposable Elements (TEs), called Long Terminal Repeat (LTR) retrotransposons, comprises the most abundant mobile elements in plant genomes. Their copy number can vary from several hundred to a few million copies per genome, deeply affecting genome organization and function. The detailed classification of LTR retrotransposons is an essential step to precisely understand their effect at the genome level, but remains challenging in large-sized genomes, requiring the use of optimized bioinformatics tools that can take advantage of supercomputers. Here, we propose a new tool: Inpactor, a parallel and scalable pipeline designed to classify LTR retrotransposons, to identify autonomous and non-autonomous elements, to build RT-based phylogenetic trees and to analyze their insertion times using High Performance Computing (HPC) techniques. Inpactor was tested on the classification and annotation of LTR retrotransposons in pineapple, a recently sequenced genome. The pineapple genome assembly comprises 44% of transposable elements, of which 23% were classified as LTR retrotransposons. Exceptionally, 16.4% of the pineapple genome assembly corresponded to only one lineage of the Gypsy superfamily: Del, suggesting that this particular lineage has undergone a significant increase in its copy numbers. As demonstrated for the pineapple genome, Inpactor provides comprehensive data on the classification and dynamics of LTR retrotransposons, allowing a fine understanding of their contribution to genome structure and evolution. Inpactor is available at https://github.com/simonorozcoarias/Inpactor.


September 22, 2019

A transposable element annotation pipeline and expression analysis reveal potentially active elements in the microalga Tisochrysis lutea.

Transposable elements (TEs) are mobile DNA sequences known as drivers of genome evolution. Their impacts have been widely studied in animals, plants and insects, but little is known about them in microalgae. In a previous study, we compared the genetic polymorphisms between strains of the haptophyte microalga Tisochrysis lutea and suggested the involvement of active autonomous TEs in their genome evolution. To identify potentially autonomous TEs, we designed a pipeline named PiRATE (Pipeline to Retrieve and Annotate Transposable Elements, download: https://doi.org/10.17882/51795), and conducted an accurate TE annotation on a new genome assembly of T. lutea. PiRATE is composed of detection, classification and annotation steps. Its detection step combines multiple, existing analysis packages representing all major approaches for TE detection and its classification step was optimized for microalgal genomes. The efficiency of the detection and classification steps was evaluated with data on the model species Arabidopsis thaliana. PiRATE detected 81% of the TE families of A. thaliana and correctly classified 75% of them. We applied PiRATE to T. lutea genomic data and established that its genome contains 15.89% Class I and 4.95% Class II TEs. In these, 3.79 and 17.05% correspond to potentially autonomous and non-autonomous TEs, respectively. Annotation data was combined with transcriptomic and proteomic data to identify potentially active autonomous TEs. We identified 17 expressed TE families and, among these, a TIR/Mariner and a TIR/hAT family were able to synthesize their transposase. Both these TE families were among the three highest expressed genes in a previous transcriptomic study and are composed of highly similar copies throughout the genome of T. lutea. This sum of evidence reveals that both these TE families could be capable of transposing or triggering the transposition of potential related MITE elements. This manuscript provides an example of a de novo transposable element annotation of a non-model organism characterized by a fragmented genome assembly and belonging to a poorly studied phylum at genomic level. Integration of multi-omics data enabled the discovery of potential mobile TEs and opens the way for new discoveries on the role of these repeated elements in genomic evolution of microalgae.


September 22, 2019

Nucleotide-binding resistance gene signatures in sugar beet, insights from a new reference genome.

Nucleotide-binding (NB-ARC), leucine-rich-repeat genes (NLRs) account for 60.8% of resistance (R) genes molecularly characterized from plants. NLRs exist as large gene families prone to tandem duplication and transposition, with high sequence diversity among crops and their wild relatives. This diversity can be a source of new disease resistance, but difficulty in distinguishing specific sequences from homologous gene family members hinders characterization of resistance for improving crop varieties. Current genome sequencing and assembly technologies, especially those using long-read sequencing, are improving resolution of repeat-rich genomic regions and clarifying locations of duplicated genes, such as NLRs. Using the conserved NB-ARC domain as a model, 231 tentative NB-ARC loci were identified in a highly contiguous genome assembly of sugar beet, revealing diverged and truncated NB-ARC signatures as well as full-length sequences. The NB-ARC-associated proteins contained NLR resistance gene domains, including TIR, CC, and LRR, as well as other integrated domains. Phylogenetic relationships of partial and complete domains were determined, and patterns of physical clustering in the genome were evaluated. Comparison of sugar beet NB-ARC domains to validated R genes from monocots and eudicots suggested extensive B. vulgaris-specific subfamily expansions. The NLR landscape in the rhizomania resistance conferring Rz region of Chromosome 3 was characterized, identifying 26 NLR-like sequences spanning 20 Mb. This work presents the first detailed view of NLR family composition in a member of the Caryophyllales, builds a foundation for additional disease resistance work in B. vulgaris, and demonstrates an additional nucleic-acid-based method for NLR prediction in non-model plant species. This article is protected by copyright. All rights reserved.

