September 22, 2019  |  

Assessing the gene content of the megagenome: sugar pine (Pinus lambertiana).

Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges that their megagenomes present for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third-generation long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second- and third-generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address some of the questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers. Copyright © 2016 Author et al.


September 22, 2019  |  

The state of play in higher eukaryote gene annotation.

A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe – or ‘annotate’ – genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists – from clinicians to evolutionary biologists – need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.


September 22, 2019  |  

Universal alternative splicing of noncoding exons.

The human transcriptome is so large, diverse, and dynamic that, even after a decade of investigation by RNA sequencing (RNA-seq), we have yet to resolve its true dimensions. RNA-seq suffers from an expression-dependent bias that impedes characterization of low-abundance transcripts. We performed targeted single-molecule and short-read RNA-seq to survey the transcriptional landscape of a single human chromosome (Hsa21) at unprecedented resolution. Our analysis reaches the lower limits of the transcriptome, identifying a fundamental distinction between protein-coding and noncoding gene content: almost every noncoding exon undergoes alternative splicing, producing a seemingly limitless variety of isoforms. Analysis of syntenic regions of the mouse genome shows that few noncoding exons are shared between human and mouse, yet human splicing profiles are recapitulated on Hsa21 in mouse cells, indicative of regulation by a deeply conserved splicing code. We propose that noncoding exons are functionally modular, with alternative splicing generating an enormous repertoire of potential regulatory RNAs and a rich transcriptional reservoir for gene evolution. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.


September 22, 2019  |  

Transcriptome profiling of two ornamental and medicinal papaver herbs.

Papaver rhoeas (corn poppy) and Papaver nudicaule (Iceland poppy) are ornamental and medicinal plants that are used for the isolation of alkaloid drugs. In this study, we generated 700 Mb of transcriptome sequences with the PacBio platform. They were assembled into 120,926 contigs, and 1185 (82.2%) of the benchmarking universal single-copy orthologs (BUSCO) core genes were completely present in our assembled transcriptome. Furthermore, using 128 Gb of Illumina sequences, transcript expression was assessed at three stages of Papaver plant development (30, 60, and 90 days), from which we identified 137 differentially expressed transcripts. In addition, three co-occurrence heat maps were generated from 51 different plant genomes along with the Papaver transcriptome, covering secondary metabolite biosynthesis, the isoquinoline alkaloid biosynthesis (BIA) pathway, and cytochrome. Sixty-nine transcripts in the BIA pathway along with 22 different alkaloids (quantified with LC-QTOF-MS/MS) were mapped into the BIA KEGG map (map00950). Finally, we identified 39 full-length cytochrome transcripts and compared them with other genomes. Collectively, these transcriptome data, along with the expression and quantitative metabolite profiles, provide an initial recording of secondary metabolites and their expression related to Papaver plant development. Moreover, these profiles could help to further detail the functional characterization of the various secondary metabolite biosynthesis pathways and of problems associated with Papaver plant development.


September 22, 2019  |  

Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research.

The large and complex hexaploid genome has greatly hindered genomics studies of common wheat (Triticum aestivum, AABBDD). Here, we investigated transcripts in developing caryopses of common wheat using the emerging single-molecule real-time (SMRT) sequencing technology PacBio RSII, and assessed the resultant data for improving common wheat genome annotation and grain transcriptome research. We obtained 197,709 full-length non-chimeric (FLNC) reads, 74.6% of which were estimated to carry a complete open reading frame. A total of 91,881 high-quality FLNC reads were identified and mapped to 16,188 chromosomal loci, corresponding to 13,162 known genes and 3026 new genes not annotated previously. Although some FLNC reads could not be unambiguously mapped to the current draft genome sequence, many of them are likely useful for studying highly similar homoeologous or paralogous loci or for improving chromosomal contig assembly in further research. The 91,881 high-quality FLNC reads represented 22,768 unique transcripts, 9591 of which were newly discovered. We found 180 transcripts each spanning two or three previously annotated adjacent loci, suggesting that they should be merged to form correct gene models. Finally, our data facilitated the identification of 6030 genes differentially regulated during caryopsis development, and full-length transcripts for 72 transcribed gluten gene members that are important for the end-use quality control of common wheat. Our work demonstrated the value of PacBio transcript sequencing for improving common wheat genome annotation through uncovering the loci and full-length transcripts not discovered previously. The resource obtained may aid further structural genomics and grain transcriptome studies of common wheat.


September 22, 2019  |  

The genomic and functional landscapes of developmental plasticity in the American cockroach.

Many cockroach species have adapted to urban environments, and some have been serious pests of public health in the tropics and subtropics. Here, we present the 3.38-Gb genome and a consensus gene set of the American cockroach, Periplaneta americana. We report insights from both genomic and functional investigations into the underlying basis of its adaptation to urban environments and developmental plasticity. In comparison with other insects, expansions of gene families in P. americana exist for most core gene families likely associated with environmental adaptation, such as chemoreception and detoxification. Multiple pathways regulating metamorphic development are well conserved, and RNAi experiments inform on key roles of 20-hydroxyecdysone, juvenile hormone, insulin, and decapentaplegic signals in regulating plasticity. Our analyses reveal a high level of sequence identity in genes between the American cockroach and two termite species, advancing it as a valuable model to study the evolutionary relationships between cockroaches and termites.


September 22, 2019  |  

A survey of the sorghum transcriptome using single-molecule long reads.

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ~11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.


September 22, 2019  |  

Egg case silk gene sequences from Argiope spiders: Evidence for multiple loci and a loss of function between paralogs.

Spiders swathe their eggs with silk to protect developing embryos and hatchlings. Egg case silks, like other fibrous spider silks, are primarily composed of proteins called spidroins (spidroin = spider-fibroin). Silks, and thus spidroins, are important throughout the lives of spiders, yet the evolution of spidroin genes has been relatively understudied. Spidroin genes are notoriously difficult to sequence because they are typically very long (≥ 10 kb of coding sequence) and highly repetitive. Here, we investigate the evolution of spider silk genes through long-read sequencing of Bacterial Artificial Chromosome (BAC) clones. We demonstrate that the silver garden spider Argiope argentata has multiple egg case spidroin loci with a loss of function at one locus. We also use degenerate PCR primers to search the genomic DNA of congeneric species and find evidence for multiple egg case spidroin loci in other Argiope spiders. Comparative analyses show that these multiple loci are more similar at the nucleotide level within a species than between species. This pattern is consistent with concerted evolution homogenizing gene copies within a genome. More complicated explanations include convergent evolution or recent independent gene duplications within each species. Copyright © 2018 Chaw et al.


September 22, 2019  |  

Pangenome analyses of the wheat pathogen Zymoseptoria tritici reveal the structural basis of a highly plastic eukaryotic genome.

Structural variation contributes substantially to polymorphism within species. Chromosomal rearrangements that impact genes can lead to functional variation among individuals and influence the expression of phenotypic traits. Genomes of fungal pathogens show substantial chromosomal polymorphism that can drive virulence evolution on host plants. Assessing the adaptive significance of structural variation is challenging, because most studies rely on inferences based on a single reference genome sequence. We constructed and analyzed the pangenome of Zymoseptoria tritici, a major pathogen of wheat that evolved host specialization by chromosomal rearrangements and gene deletions. We used single-molecule real-time sequencing and high-density genetic maps to assemble multiple genomes. We annotated the gene space based on transcriptomics data that covered the infection life cycle of each strain. Based on a total of five telomere-to-telomere genomes, we constructed a pangenome for the species and identified a core set of 9149 genes. However, an additional 6600 genes were exclusive to a subset of the isolates. The substantial accessory genome encoded on average fewer expressed genes but a larger fraction of the candidate effector genes that may interact with the host during infection. We expanded our analyses of the pangenome to a worldwide collection of 123 isolates of the same species. We confirmed that accessory genes were indeed more likely to show deletion polymorphisms and loss-of-function mutations compared to core genes. The pangenome construction of a highly polymorphic eukaryotic pathogen showed that a single reference genome significantly underestimates the gene space of a species. The substantial accessory genome provides a cradle for adaptive evolution.
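
The core/accessory distinction in this abstract can be illustrated with plain set operations. The sketch below is only an illustration, not the authors' pipeline: it assumes genes have already been grouped into shared gene-family identifiers across isolates, and the function name `split_pangenome`, the isolate names, and the gene IDs are all hypothetical.

```python
def split_pangenome(gene_sets):
    """Split a pangenome into core and accessory gene families.

    gene_sets maps an isolate name to the set of gene-family IDs found
    in that isolate (hypothetical, pre-clustered input). At least one
    isolate is required.
    """
    all_sets = list(gene_sets.values())
    pangenome = set().union(*all_sets)      # families seen in any isolate
    core = set.intersection(*all_sets)      # families present in every isolate
    accessory = pangenome - core            # families missing from at least one isolate
    return core, accessory


# Toy data: three made-up isolates sharing two gene families.
toy_isolates = {
    "isolate_A": {"g1", "g2", "g3"},
    "isolate_B": {"g1", "g2", "g4"},
    "isolate_C": {"g1", "g2"},
}
core, accessory = split_pangenome(toy_isolates)
print(sorted(core), sorted(accessory))
```

In this toy case `g1` and `g2` are core and `g3` and `g4` are accessory. Real analyses first need an orthology-clustering step to decide which genes count as the same family, which this sketch takes as given.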


September 22, 2019  |  

Molecular characterization of NBS-LRR genes in the soybean Rsv3 locus reveals several divergent alleles that likely confer resistance to the soybean mosaic virus.

The divergence patterns of NBS-LRR genes in the soybean Rsv3 locus were deciphered and several divergent alleles (NBS_C, NBS_D and Columbia NBS_E) were identified as the likely functional candidates of Rsv3. The soybean Rsv3 locus, which confers resistance to the soybean mosaic virus (SMV), has been previously mapped to a region containing five nucleotide binding site-leucine-rich repeat (NBS-LRR) genes (referred to as nbs_A-E) in Williams 82. In resistant cultivars, however, the number of NBS-LRR genes in this region and their divergence from susceptible alleles remain unclear. In the present study, we constructed and screened a bacterial artificial chromosome (BAC) library for an Rsv3-possessing cultivar, Zaoshu 18. Sequencing two positive BAC inserts on the Rsv3 locus revealed that Zaoshu 18 possesses the same gene content and order as Williams 82, but two of the NBS-LRR genes, NBS_C and NBS_D, exhibit distinct features that were not observed in the Williams 82 alleles. Obtaining these NBS-LRR genes from eight additional cultivars demonstrated that the NBS_A-D genes diverged into two different alleles: the nbs_A-D alleles were associated with the rsv3-type cultivars, whereas the NBS_A-D alleles were associated with the Rsv3-possessing cultivars. For the NBS_E gene, the cultivar Columbia possesses an allele (NBS_E) that differed from that in Zaoshu 18 and rsv3-type cultivars (nbs_E). Exchanged fragments were further detected on alleles of the NBS_C-E genes, suggesting that recombination is a major force responsible for allele divergence. Also, the LRR domains of the NBS_C-E genes exhibited extremely strong signals of positive selection. Overall, the divergence patterns of the NBS-LRR genes in the Rsv3 locus elucidated by this study indicate that not only NBS_C but also NBS_D and Columbia NBS_E are likely functional alleles that confer resistance to SMV.


September 22, 2019  |  

A hybrid-hierarchical genome assembly strategy to sequence the invasive golden mussel Limnoperna fortunei.

For more than 25 years, the golden mussel Limnoperna fortunei has aggressively invaded South American freshwaters, having travelled more than 5,000 km upstream across five countries. Along the way, the golden mussel has outcompeted native species and economically harmed aquaculture, hydroelectric power plants, and ship transit. We have sequenced the complete genome of the golden mussel to understand the molecular basis of its invasiveness and search for ways to control it. We assembled the 1.6 Gb genome into 20548 scaffolds with an N50 length of 312 Kb using a hybrid and hierarchical assembly strategy from short and long DNA reads and transcriptomes. A total of 60717 coding genes were inferred from a customized transcriptome-trained AUGUSTUS run. We also compared predicted protein sets with those of complete molluscan genomes, revealing an exacerbation of protein-binding domains in L. fortunei. We built one of the best bivalve genome assemblies available with a cost-effective approach using Illumina paired-end, mate-pair, and PacBio long reads. We expect that the continuous and careful annotation of L. fortunei’s genome will contribute to the investigation of bivalve genetics, evolution, and invasiveness, as well as to the development of biotechnological tools for aquatic pest control. © The Authors 2017. Published by Oxford University Press.
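
For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy scaffold lengths (hypothetical): total 1100, half 550.
toy_scaffolds = [400, 300, 200, 100, 50, 50]
print(n50(toy_scaffolds))  # 400 + 300 = 700 >= 550, so N50 is 300
```

A larger N50 generally indicates a more contiguous assembly, but it says nothing about correctness, so it is usually reported alongside completeness measures.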


September 22, 2019  |  

Sequence analysis of European maize inbred line F2 provides new insights into molecular and chromosomal characteristics of presence/absence variants.

Maize is well known for its exceptional structural diversity, including copy number variants (CNVs) and presence/absence variants (PAVs), and there is growing evidence for the role of structural variation in maize adaptation. While PAVs have been described in this important crop species, they have been only scarcely characterized at the sequence level and the extent of presence/absence variation and relative chromosomal landscape of inbred-specific regions remain to be elucidated. De novo genome sequencing of the French F2 maize inbred line revealed 10,044 novel genomic regions larger than 1 kb, making up 88 Mb of DNA, that are present in F2 but not in B73 (PAV). This set of maize PAV sequences allowed us to annotate PAV content and to analyze sequence breakpoints. Using PAV genotyping on a collection of 25 temperate lines, we also analyzed linkage disequilibrium in PAVs and flanking regions, and PAV frequencies within maize genetic groups. We highlight the possible role of MMEJ-type double strand break repair in maize PAV formation and discover 395 new genes with transcriptional support. The pattern of linkage disequilibrium within PAVs strikingly differs from that of flanking regions and is in accordance with the intuition that PAVs may recombine less than other genomic regions. We show that most PAVs are ancient, while some are found only in European Flint material, thus pinpointing structural features that may be at the origin of adaptive traits involved in the success of this material. Characterization of such PAVs will provide useful material for further association genetic studies in European and temperate maize.


September 22, 2019  |  

Reference quality genome assemblies of three Parastagonospora nodorum isolates differing in virulence on wheat.

Parastagonospora nodorum, the causal agent of Septoria nodorum blotch in wheat, has emerged as a model necrotrophic fungal organism for the study of host-microbe interactions. To date, three necrotrophic effectors have been identified and characterized from this pathogen, including SnToxA, SnTox1, and SnTox3. Necrotrophic effector identification was greatly aided by the development of a draft genome of Australian isolate SN15 via Sanger sequencing, yet it remained largely fragmented. This research presents the development of nearly finished genomes of P. nodorum isolates Sn4, Sn2000, and Sn79-1087 using long-read sequencing technology. RNAseq analysis of isolate Sn4, consisting of eight time points covering various developmental and infection stages, mediated the annotation of 13,379 genes. Analysis of these genomes revealed large-scale polymorphism between the three isolates, including the complete absence of contig 23 from isolate Sn79-1087, and a region of genome expansion on contig 10 in isolates Sn4 and Sn2000. Additionally, these genomes exhibit the hallmark characteristics of a “two-speed” genome, being partitioned into two distinct GC-equilibrated and AT-rich compartments. Interestingly, isolate Sn79-1087 contains a lower proportion of AT-rich segments, indicating a potential lack of evolutionary hotspots. These newly sequenced genomes, consisting of telomere-to-telomere assemblies of nearly all 23 P. nodorum chromosomes, provide a robust foundation for the further examination of effector biology and genome evolution. Copyright © 2018 Richards et al.


September 22, 2019  |  

Analysis of the Aedes albopictus C6/36 genome provides insight into cell line utility for viral propagation.

The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequence identified enrichment of growth-related pathways and conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease.


September 22, 2019  |  

Intraspecific comparative genomics of isolates of the Norway spruce pathogen (Heterobasidion parviporum) and identification of its potential virulence factors.

Heterobasidion parviporum is an economically most important fungal forest pathogen in northern Europe, causing root and butt rot disease of Norway spruce (Picea abies (L.) Karst.). The mechanisms underlying the pathogenesis and virulence of this species remain elusive. No reference genome to facilitate functional analysis is available for this species. To better understand virulence factors at both the phenotypic and genomic levels, we characterized 15 H. parviporum isolates originating from different locations across Finland for virulence, vegetative growth, sporulation and saprotrophic wood decay. Wood decay capability and latitude of fungal origins exerted interactive effects on their virulence and appeared important for H. parviporum virulence. We sequenced the most virulent isolate to obtain the first full genome sequence of H. parviporum as a reference genome, and re-sequenced the remaining 14 H. parviporum isolates. Genome-wide alignments and intrinsic polymorphism analysis showed that these isolates exhibited overall high genomic similarity with an average of at least 96% nucleotide identity when compared to the reference, yet had a remarkable intra-specific level of polymorphism with a bias for CpG to TpG mutations. Read-mapping coverage analysis enabled the classification of all predicted genes into five groups and uncovered two genomic regions exclusively present in the reference with putative contribution to its higher virulence. Genes enriched for copy number variations (deletions and duplications) and nucleotide polymorphism were involved in oxidation-reduction processes and encoded domains relevant to transcription factors. Based on genome-wide selection pressure or the presence of variants, some secreted protein-coding genes were proposed as potential virulence candidates. Our study reported on the first reference genome sequence for this Norway spruce pathogen (H. parviporum). Comparative genomics analysis gave insight into the overall genomic variation among this fungal species and also facilitated the identification of several secreted protein-coding genes as putative virulence factors for further functional analysis. We also analyzed and identified phenotypic traits potentially linked to its virulence.

