April 21, 2020  |  

Genomic Plasticity Mediated by Transposable Elements in the Plant Pathogenic Fungus Colletotrichum higginsianum.

Phytopathogen genomes are under constant pressure to change, as pathogens are locked in an evolutionary arms race with their hosts, where pathogens evolve effector genes to manipulate their hosts, whereas the hosts evolve immune components to recognize the products of these genes. Colletotrichum higginsianum (Ch), a fungal pathogen with no known sexual morph, infects Brassicaceae plants including Arabidopsis thaliana. Previous studies revealed that Ch differs in its virulence toward various Arabidopsis thaliana ecotypes, indicating the existence of coevolutionary selective pressures. However, between-strain genomic variations in Ch have not been studied. Here, we sequenced and assembled the genome of a Ch strain, resulting in a highly contiguous genome assembly, which was compared with the chromosome-level genome assembly of another strain to identify genomic variations between strains. We found that the two closely related strains vary in terms of large-scale rearrangements, the existence of strain-specific regions, and effector candidate gene sets and that these variations are frequently associated with transposable elements (TEs). Ch has a compartmentalized genome consisting of gene-sparse, TE-dense regions with more effector candidate genes and gene-dense, TE-sparse regions harboring conserved genes. Additionally, analysis of the conservation patterns and syntenic regions of effector candidate genes indicated that the two strains vary in their effector candidate gene sets because of de novo evolution, horizontal gene transfer, or gene loss after divergence. Our results reveal mechanisms for generating genomic diversity in this asexual pathogen, which are important for understanding its adaption to hosts. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


April 21, 2020  |  

A Species-Wide Inventory of NLR Genes and Alleles in Arabidopsis thaliana.

Infectious disease is both a major force of selection in nature and a prime cause of yield loss in agriculture. In plants, disease resistance is often conferred by nucleotide-binding leucine-rich repeat (NLR) proteins, intracellular immune receptors that recognize pathogen proteins and their effects on the host. Consistent with extensive balancing and positive selection, NLRs are encoded by one of the most variable gene families in plants, but the true extent of intraspecific NLR diversity has been unclear. Here, we define a nearly complete species-wide pan-NLRome in Arabidopsis thaliana based on sequence enrichment and long-read sequencing. The pan-NLRome largely saturates with approximately 40 well-chosen wild strains, with half of the pan-NLRome being present in most accessions. We chart NLR architectural diversity, identify new architectures, and quantify selective forces that act on specific NLRs and NLR domains. Our study provides a blueprint for defining pan-NLRomes.Copyright © 2019 The Author(s). Published by Elsevier Inc. All rights reserved.


April 21, 2020  |  

Chromosome-level genome assembly of Triplophysa tibetana, a fish adapted to the harsh high-altitude environment of the Tibetan Plateau.

Triplophysa is an endemic fish genus of the Tibetan Plateau in China. Triplophysa tibetana, which lives at a recorded altitude of ~4,000 m and plays an important role in the highland aquatic ecosystem, serves as an excellent model for investigating high-altitude environmental adaptation. However, evolutionary and conservation studies of T. tibetana have been limited by scarce genomic resources for the genus Triplophysa. In the present study, we applied PacBio sequencing and the Hi-C technique to assemble the T. tibetana genome. A 652-Mb genome with 1,325 contigs with an N50 length of 3.1 Mb was obtained. The 1,137 contigs were further assembled into 25 chromosomes, representing 98.7% and 80.47% of all contigs at the base and sequence number level, respectively. Approximately 260 Mb of sequence, accounting for ~39.8% of the genome, was identified as repetitive elements. DNA transposons (16.3%), long interspersed nuclear elements (12.4%) and long terminal repeats (11.0%) were the most repetitive types. In total, 24,372 protein-coding genes were predicted in the genome, and ~95% of the genes were functionally annotated via a search in public databases. Using whole genome sequence information, we found that T. tibetana diverged from its common ancestor with Danio rerio ~121.4 million years ago. The high-quality genome assembled in this work not only provides a valuable genomic resource for future population and conservation studies of T. tibetana, but it also lays a solid foundation for further investigation into the mechanisms of environmental adaptation of endemic fishes in the Tibetan Plateau. © 2019 John Wiley & Sons Ltd.


April 21, 2020  |  

A global survey of full-length transcriptome of Ginkgo biloba reveals transcript variants involved in flavonoid biosynthesis

Ginkgo biloba, which contains flavonoids as bioactive components, is widely used in traditional Chinese medicine. Increasing the flavonoid production of medicinal plants through genetic engineering generally focuses on the key genes involved in flavonoid biosynthesis. However, the molecular mechanisms underlying such biosynthesis are not yet well understood. To understand these mechanisms, a combination of second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing was applied to G. biloba. Eight tissues were sampled for SMRT sequencing to generate a high-quality, full-length transcriptome database. From 23.36 Gb clean reads, 12,954 alternative polyadenylation events, 12,290 alternative splicing events, 929 fusion transcripts, 2,286 novel transcripts, and 1,270 lncRNAs were predicted by removing redundant reads. Further studies reveal that 7 AS, 5 lncRNA, and 6 fusion gene events were identified in flavonoid biosynthesis. A total of 12 gene modules were revealed to be involved in flavonoid metabolism structural genes and transcription factors by constructing co-expression networks. Weighted gene coexpression network analysis (WGCNA) analysis reveals that some hub genes operate during the biosynthesis by identifying transcription factors (TFs) and structure genes. Seven key hub genes were also identified by analyzing the correlation between gene expression level and flavonoids content. The results highlight the importance of SMRT sequencing of the full-length transcriptome in improving genome annotation and elucidating the gene regulation of flavonoid biosynthesis in G. biloba by providing a comprehensive set of reference transcripts.


April 21, 2020  |  

Improvement of the Pacific bluefin tuna (Thunnus orientalis) reference genome and development of male-specific DNA markers.

The Pacific bluefin tuna, Thunnus orientalis, is a highly migratory species that is widely distributed in the North Pacific Ocean. Like other marine species, T. orientalis has no external sexual dimorphism; thus, identifying sex-specific variants from whole genome sequence data is a useful approach to develop an effective sex identification method. Here, we report an improved draft genome of T. orientalis and male-specific DNA markers. Combining PacBio long reads and Illumina short reads sufficiently improved genome assembly, with a 38-fold increase in scaffold contiguity (to 444 scaffolds) compared to the first published draft genome. Through analysing re-sequence data of 15 males and 16 females, 250 male-specific SNPs were identified from more than 30 million polymorphisms. All male-specific variants were male-heterozygous, suggesting that T. orientalis has a male heterogametic sex-determination system. The largest linkage disequilibrium block (3,174?bp on scaffold_064) contained 51 male-specific variants. PCR primers and a PCR-based sex identification assay were developed using these male-specific variants. The sex of 115 individuals (56 males and 59 females; sex was diagnosed by visual examination of the gonads) was identified with high accuracy using the assay. This easy, accurate, and practical technique facilitates the control of sex ratios in tuna farms. Furthermore, this method could be used to estimate the sex ratio and/or the sex-specific growth rate of natural populations.


April 21, 2020  |  

Genome analysis of the rice coral Montipora capitata.

Corals comprise a biomineralizing cnidarian, dinoflagellate algal symbionts, and associated microbiome of prokaryotes and viruses. Ongoing efforts to conserve coral reefs by identifying the major stress response pathways and thereby laying the foundation to select resistant genotypes rely on a robust genomic foundation. Here we generated and analyzed a high quality long-read based ~886 Mbp nuclear genome assembly and transcriptome data from the dominant rice coral, Montipora capitata from Hawai’i. Our work provides insights into the architecture of coral genomes and shows how they differ in size and gene inventory, putatively due to population size variation. We describe a recent example of foreign gene acquisition via a bacterial gene transfer agent and illustrate the major pathways of stress response that can be used to predict regulatory components of the transcriptional networks in M. capitata. These genomic resources provide insights into the adaptive potential of these sessile, long-lived species in both natural and human influenced environments and facilitate functional and population genomic studies aimed at Hawaiian reef restoration and conservation.


April 21, 2020  |  

Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies.

More and more eukaryotic genomes are sequenced and assembled, most of them presented as a complete model in which missing chromosomal regions are filled by Ns and where a few chromosomes may be lacking. Avian genomes often contain sequences with high GC content, which has been hypothesized to be at the origin of many missing sequences in these genomes. We investigated features of these missing sequences to discover why some may not have been integrated into genomic libraries and/or sequenced.The sequences of five red jungle fowl cDNA models with high GC content were used as queries to search publicly available datasets of Illumina and Pacbio sequencing reads. These were used to reconstruct the leptin, TNFa, MRPL52, PCP2 and PET100 genes, all of which are absent from the red jungle fowl genome model. These gene sequences displayed elevated GC contents, had intron sizes that were sometimes larger than non-avian orthologues, and had non-coding regions that contained numerous tandem and inverted repeat sequences with motifs able to assemble into stable G-quadruplexes and intrastrand dyadic structures. Our results suggest that Illumina technology was unable to sequence the non-coding regions of these genes. On the other hand, PacBio technology was able to sequence these regions, but with dramatically lower efficiency than would typically be expected.High GC content was not the principal reason why numerous GC-rich regions of avian genomes are missing from genome assembly models. Instead, it is the presence of tandem repeats containing motifs capable of assembling into very stable secondary structures that is likely responsible.


April 21, 2020  |  

Reconstruction of the full-length transcriptome atlas using PacBio Iso-Seq provides insight into the alternative splicing in Gossypium australe.

Gossypium australe F. Mueller (2n?=?2x?=?26, G2 genome) possesses valuable characteristics. For example, the delayed gland morphogenesis trait causes cottonseed protein and oil to be edible while retaining resistance to biotic stress. However, the lack of gene sequences and their alternative splicing (AS) in G. australe remain unclear, hindering to explore species-specific biological morphogenesis.Here, we report the first sequencing of the full-length transcriptome of the Australian wild cotton species, G. australe, using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) from the pooled cDNA of ten tissues to identify transcript loci and splice isoforms. We reconstructed the G. australe full-length transcriptome and identified 25,246 genes, 86 pre-miRNAs and 1468 lncRNAs. Most genes (12,832, 50.83%) exhibited two or more isoforms, suggesting a high degree of transcriptome complexity in G. australe. A total of 31,448 AS events in five major types were found among the 9944 gene loci. Among these five major types, intron retention was the most frequent, accounting for 68.85% of AS events. 29,718 polyadenylation sites were detected from 14,536 genes, 7900 of which have alternative polyadenylation sites (APA). In addition, based on our AS events annotations, RNA-Seq short reads from germinating seeds showed that differential expression of these events occurred during seed germination. Ten AS events that were randomly selected were further confirmed by RT-PCR amplification in leaf and germinating seeds.The reconstructed gene sequences and their AS in G. australe would provide information for exploring beneficial characteristics in G. australe.


April 21, 2020  |  

Chromosome-level assembly of the water buffalo genome surpasses human and goat genomes in sequence contiguity.

Rapid innovation in sequencing technologies and improvement in assembly algorithms have enabled the creation of highly contiguous mammalian genomes. Here we report a chromosome-level assembly of the water buffalo (Bubalus bubalis) genome using single-molecule sequencing and chromatin conformation capture data. PacBio Sequel reads, with a mean length of 11.5?kb, helped to resolve repetitive elements and generate sequence contiguity. All five B. bubalis sub-metacentric chromosomes were correctly scaffolded with centromeres spanned. Although the index animal was partly inbred, 58% of the genome was haplotype-phased by FALCON-Unzip. This new reference genome improves the contig N50 of the previous short-read based buffalo assembly more than a thousand-fold and contains only 383 gaps. It surpasses the human and goat references in sequence contiguity and facilitates the annotation of hard to assemble gene clusters such as the major histocompatibility complex (MHC).


April 21, 2020  |  

Characterization of a male specific region containing a candidate sex determining gene in Atlantic cod.

The genetic mechanisms determining sex in teleost fishes are highly variable and the master sex determining gene has only been identified in few species. Here we characterize a male-specific region of 9?kb on linkage group 11 in Atlantic cod (Gadus morhua) harboring a single gene named zkY for zinc knuckle on the Y chromosome. Diagnostic PCR test of phenotypically sexed males and females confirm the sex-specific nature of the Y-sequence. We identified twelve highly similar autosomal gene copies of zkY, of which eight code for proteins containing the zinc knuckle motif. 3D modeling suggests that the amino acid changes observed in six copies might influence the putative RNA-binding specificity. Cod zkY and the autosomal proteins zk1 and zk2 possess an identical zinc knuckle structure, but only the Y-specific gene zkY was expressed at high levels in the developing larvae before the onset of sex differentiation. Collectively these data suggest zkY as a candidate master masculinization gene in Atlantic cod. PCR amplification of Y-sequences in Arctic cod (Arctogadus glacialis) and Greenland cod (Gadus macrocephalus ogac) suggests that the male-specific region emerged in codfishes more than 7.5 million years ago.


April 21, 2020  |  

Origin and recent expansion of an endogenous gammaretroviral lineage in domestic and wild canids.

Vertebrate genomes contain a record of retroviruses that invaded the germlines of ancestral hosts and are passed to offspring as endogenous retroviruses (ERVs). ERVs can impact host function since they contain the necessary sequences for expression within the host. Dogs are an important system for the study of disease and evolution, yet no substantiated reports of infectious retroviruses in dogs exist. Here, we utilized Illumina whole genome sequence data to assess the origin and evolution of a recently active gammaretroviral lineage in domestic and wild canids.We identified numerous recently integrated loci of a canid-specific ERV-Fc sublineage within Canis, including 58 insertions that were absent from the reference assembly. Insertions were found throughout the dog genome including within and near gene models. By comparison of orthologous occupied sites, we characterized element prevalence across 332 genomes including all nine extant canid species, revealing evolutionary patterns of ERV-Fc segregation among species as well as subpopulations.Sequence analysis revealed common disruptive mutations, suggesting a predominant form of ERV-Fc spread by trans complementation of defective proviruses. ERV-Fc activity included multiple circulating variants that infected canid ancestors from the last 20 million to within 1.6 million years, with recent bursts of germline invasion in the sublineage leading to wolves and dogs.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.