September 22, 2019

Use of a draft genome of coffee (Coffea arabica) to identify SNPs associated with caffeine content.

Arabica coffee (Coffea arabica) has a small gene pool, which limits genetic improvement. Selection for caffeine content within this gene pool would be assisted by identification of the genes controlling this important trait. Sequencing of DNA bulks from 18 genotypes with extreme high- or low-caffeine content from a population of 232 genotypes was used to identify linked polymorphisms. To obtain a reference genome, a whole genome assembly of arabica coffee (variety K7) was achieved by sequencing using short-read (Illumina) and long-read (PacBio) technology. Assembly was performed using a range of assembly tools, resulting in 76 409 scaffolds with a scaffold N50 of 54 544 bp and a total scaffold length of 1448 Mb. Validation of the genome assembly using different tools showed high completeness of the genome. More than 99% of transcriptome sequences mapped to the C. arabica draft genome, and 89% of BUSCOs were present. The assembled genome annotated using AUGUSTUS yielded 99 829 gene models. Using the draft arabica genome as reference in mapping and variant calling allowed the detection of 1444 nonsynonymous single nucleotide polymorphisms (SNPs) associated with caffeine content. Based on Kyoto Encyclopaedia of Genes and Genomes pathway-based analysis, 65 caffeine-associated SNPs were discovered, among which 11 SNPs were associated with genes encoding enzymes involved in the conversion of substrates, which participate in the caffeine biosynthesis pathways. This analysis demonstrated the complex genetic control of this key trait in coffee. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
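The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. The following is a minimal sketch of that definition only; the function name `n50` and the toy lengths are illustrative assumptions, not part of the authors' workflow or of any assembler's output.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L account for
    at least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical scaffold lengths, in bp).
toy_scaffolds = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_scaffolds)
```

In this toy case the two longest scaffolds (6 and 5) already cover 11 of 20 bases, so the N50 is 5.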


September 22, 2019

MCF-7 breast cancer cell line PacBio generated transcriptome has ~300 novel transcribed regions, un-annotated in both RefSeq and GENCODE, and absent in the liver, heart and brain transcriptomes

Illuminating the “dark” regions of the human genome remains an ongoing effort, a decade and a half after the human genome was sequenced – RefSeq and GENCODE being two of the major annotation databases. Pacific Biosciences (PacBio) has provided open access to the transcriptome of MCF-7, a breast cancer cell line that has provided significant therapeutic advancement in breast cancer research since the 1970s. PacBio sequencing generates much longer reads compared to second-generation sequencing technologies, with a trade-off of lower throughput, higher error rate and higher cost per base. Here, this transcriptome was analyzed using the YeATS pipeline, with additionally introduced k-mer-based algorithms, reducing computational times to a few hours on a simple workstation. Out of ~300 transcripts that have no match in either RefSeq or GENCODE, ~250 are absent in the transcriptomes of the heart, liver and brain, also provided by PacBio. Also, ~200 transcripts are absent in a recent catalogue of un-annotated long non-coding RNAs from 6,503 samples (~43 Terabases of sequence data) [1], and only two are present in common with an experimental workflow, RACE-Seq, that reported 2,556 novel transcripts [2]. ~100 transcripts have >100 amino acid open reading frames, and have the potential of being protein coding genes. ORF-based annotation also identified a few bacterial transcripts in the PacBio database mapped to the human genome, and one human transcript that has been annotated as bacterial in the NCBI database. The current work reiterates the under-utilization of transcriptomes for annotating genomes. It also provides new leads for investigating breast cancer by virtue of transcripts that are not expressed in the other tissues examined, which may serve as breast cancer biomarkers pending further investigation.
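The ">100 amino acid open reading frame" criterion can be illustrated with a small six-frame scan. This is a simplified sketch and not the YeATS pipeline: it assumes an ORF runs from an ATG to the first in-frame stop codon, uses the standard stop codons, and ignores ambiguous bases. The names `longest_orf_codons` and `has_long_orf` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_codons(seq):
    """Length in codons (start included, stop excluded) of the longest
    ATG-to-stop ORF found in any of the six reading frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None  # look for the next ORF in this frame
    return best


def has_long_orf(seq, min_aa=100):
    """True if the longest ORF encodes more than min_aa amino acids."""
    return longest_orf_codons(seq) > min_aa


# Synthetic transcript: ATG, 100 alanine codons, then a stop codon.
synthetic = "ATG" + "GCT" * 100 + "TAA"
synthetic_is_candidate = has_long_orf(synthetic)
```

A transcript passing such a filter is only a candidate; the abstract itself describes these transcripts as having the potential to be protein coding.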


September 22, 2019

First insights into the nature and evolution of antisense transcription in nematodes.

The development of multicellular organisms is coordinated by various gene regulatory mechanisms that ensure correct spatio-temporal patterns of gene expression. Recently, the role of antisense transcription in gene regulation has moved into the focus of research. To characterize genome-wide patterns of antisense transcription and to study their evolutionary conservation, we sequenced a strand-specific RNA-seq library of the nematode Pristionchus pacificus. We identified 1112 antisense configurations, of which the largest group represents 465 antisense transcripts (ASTs) that are fully embedded in introns of their host genes. We find that most ASTs show homology to protein-coding genes and are overrepresented in proteomic data. Together with the finding that expression levels of ASTs and host genes are uncorrelated, this indicates that most ASTs in P. pacificus do not represent non-coding RNAs and do not exhibit regulatory functions on their host genes. We studied the evolution of antisense gene pairs across 20 nematode genomes, showing that the majority of pairs are lineage-specific and even the highly conserved vps-4, ddx-27, and sel-2 loci show abundant structural changes including duplications, deletions, intron gains and loss of antisense transcription. In contrast, host genes in general are remarkably conserved and encode exceptionally long introns, leading to unusually large blocks of conserved synteny. Our study has shown that in P. pacificus antisense transcription as such does not define non-coding RNAs but is rather a feature of highly conserved genes with long introns. We hypothesize that the presence of regulatory elements imposes evolutionary constraint on the intron length, but simultaneously, their large size makes them a likely target for translocation of genomic elements including protein-coding genes that eventually end up as ASTs.


September 22, 2019

Improving eukaryotic genome annotation using single molecule mRNA sequencing.

The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve on the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. We demonstrate genome annotation improvement based on PacBio mRNA sequences, compared to genome annotation using conventional sequencing-by-synthesis alone, by identifying 1609 (9.2%) new genes, extending the length of 3965 (26.7%) genes and increasing the total genomic exon length by 1.9 Mb (12.4%). Non-coding sequence representation (primarily from UTRs based on dT reverse transcription priming) was particularly improved, increasing in total length by fifteen-fold, by increasing both the length and number of UTR exons. In addition, the UTR data provided by these CCS allowed for the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features. Overall, PacBio data has supported a significant improvement in gene annotation in this genome, and is an appealing alternative or complementary technique for genome annotation to the other transcript sequencing technologies.


September 22, 2019

PacBio sequencing and its applications.

Single-molecule, real-time sequencing developed by Pacific Biosciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small-size laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.


September 22, 2019

Draft genome assembly of the poultry red mite, Dermanyssus gallinae.

The poultry red mite, Dermanyssus gallinae, is a major worldwide concern in the egg-laying industry. Here, we report the first draft genome assembly and gene prediction of Dermanyssus gallinae, based on combined PacBio and MinION long-read de novo sequencing. The ~959-Mb genome is predicted to encode 14,608 protein-coding genes.


September 22, 2019

Global analysis of epigenetic regulation of gene expression in response to drought stress in Sorghum.

Abiotic stresses including drought are major limiting factors of crop yields and cause significant crop losses. Acquisition of tolerance to abiotic stresses requires coordinated regulation of a multitude of biochemical and physiological changes, and most of these changes depend on alterations in gene expression. The goal of this work is to perform global analysis of differential regulation of gene expression and alternative splicing, and their relationship with chromatin landscape in drought sensitive and tolerant cultivars. Our Iso-Seq study revealed transcriptome-wide full-length isoforms at an unprecedented scale with over 11000 novel splice isoforms. Additionally, we uncovered alternative polyadenylation sites of ~11000 expressed genes and many novel genes. Overall, Iso-Seq results greatly enhanced sorghum gene annotations that are not only useful in analyzing all our RNA-seq, ChIP-seq and ATAC-seq data but also serve as a great resource to the plant biology community. Our studies identified differentially expressed genes and splicing events that are correlated with the drought-resistant phenotype. An association between alternative splicing and chromatin accessibility was also revealed. Several computational tools developed here (TAPIS and iDiffIR) have been made freely available to the research community for analyzing alternative splicing and differential alternative splicing.


September 22, 2019

Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing.

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


September 22, 2019

Long reads: their purpose and place.

In recent years long-read technologies have moved from being a niche and specialist field to a point of relative maturity likely to feature frequently in the genomic landscape. Analogous to next generation sequencing, the cost of sequencing using long-read technologies has materially dropped whilst the instrument throughput continues to increase. Together these changes present the prospect of sequencing large numbers of individuals with the aim of fully characterizing genomes at high resolution. In this article, we will endeavour to present an introduction to long-read technologies showing: what long reads are; how they are distinct from short reads; why long reads are useful and how they are being used. We will highlight the recent developments in this field, and the applications and potential of these technologies in medical research, and clinical diagnostics and therapeutics.


September 22, 2019

Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing.

Zea mays is an important genetic model for elucidating transcriptional networks. Uncertainties about the complete structure of mRNA transcripts limit the progress of research in this system. Here, using single-molecule sequencing technology, we produce 111,151 transcripts from 6 tissues capturing ~70% of the genes annotated in maize RefGen_v3 genome. A large proportion of transcripts (57%) represent novel, sometimes tissue-specific, isoforms of known genes and 3% correspond to novel gene loci. In other cases, the identified transcripts have improved existing gene models. Averaging across all six tissues, 90% of the splice junctions are supported by short reads from matched tissues. In addition, we identified a large number of novel long non-coding RNAs and fusion transcripts and found that DNA methylation plays an important role in generating various isoforms. Our results show that characterization of the maize B73 transcriptome is far from complete, and that maize gene expression is more complex than previously thought.


September 22, 2019

Single-molecule long-read sequencing facilitates shrimp transcriptome research.

Although shrimp are of great economic importance, few full-length shrimp transcriptomes are available. Here, we used Pacific Biosciences single-molecule real-time (SMRT) long-read sequencing technology to generate transcripts from the Pacific white shrimp (Litopenaeus vannamei). We obtained 322,600 full-length non-chimeric reads, from which we generated 51,367 high-quality unique full-length transcripts. We corrected errors in the SMRT sequences by comparison with Illumina-produced short reads. We successfully annotated 81.72% of all unique SMRT transcripts against the NCBI non-redundant database, 58.63% against Swiss-Prot, 45.38% against Gene Ontology, 32.57% against Clusters of Orthologous Groups of proteins (COG), and 47.83% against Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Across all transcripts, we identified 3,958 long non-coding RNAs (lncRNAs) and 80,650 simple sequence repeats (SSRs). Our study provides a rich set of full-length cDNA sequences for L. vannamei, which will greatly facilitate shrimp transcriptome research.


September 22, 2019

Assessment of an organ-specific de novo transcriptome of the nematode trap-crop, Solanum sisymbriifolium

Solanum sisymbriifolium, also known as “Litchi Tomato” or “Sticky Nightshade,” is an undomesticated and poorly researched plant related to potato and tomato. Unlike the latter species, S. sisymbriifolium induces eggs of the cyst nematode, Globodera pallida, to hatch and migrate into its roots, but then arrests further nematode maturation. In order to provide researchers with a partial blueprint of its genetic make-up so that the mechanism of this response might be identified, we used single molecule real time (SMRT) sequencing to compile a high quality de novo transcriptome of 41,189 unigenes drawn from individually sequenced bud, root, stem, and leaf RNA populations. Functional annotation and BUSCO analysis showed that this transcriptome was surprisingly complete, even though it represented genes expressed at a single time point. By sequencing the 4 organ libraries separately, we found we could get a reliable snapshot of transcript distributions in each organ. A divergent site analysis of the merged transcriptome indicated that this species might have undergone a recent genome duplication and re-diploidization. Further analysis indicated that the plant then retained a disproportionate number of genes associated with photosynthesis and amino acid metabolism in comparison to genes with characteristics of R-proteins or involved in secondary metabolism. The former processes may have given S. sisymbriifolium a bigger competitive advantage than the latter did. Copyright © 2018 Wixom et al.


September 22, 2019

Rewired RNAi-mediated genome surveillance in house dust mites.

House dust mites are common pests with an unusual evolutionary history, being descendants of a parasitic ancestor. Transition to parasitism is frequently accompanied by genome rearrangements, possibly to accommodate the genetic change needed to access new ecology. Transposable element (TE) activity is a source of genomic instability that can trigger large-scale genomic alterations. Eukaryotes have multiple transposon control mechanisms, one of which is RNA interference (RNAi). Investigation of the dust mite genome failed to identify a major RNAi pathway: the Piwi-associated RNA (piRNA) pathway, which has been replaced by a novel small-interfering RNA (siRNA)-like pathway. Co-opting of piRNA function by dust mite siRNAs is extensive, including establishment of TE control master loci that produce siRNAs. Interestingly, other members of the Acari have piRNAs indicating loss of this mechanism in dust mites is a recent event. Flux of RNAi-mediated control of TEs highlights the unusual arc of dust mite evolution.


September 22, 2019

Contemporary evolution of a Lepidopteran species, Heliothis virescens, in response to modern agricultural practices.

Adaptation to human-induced environmental change has the potential to profoundly influence the genomic architecture of affected species. This is particularly true in agricultural ecosystems, where anthropogenic selection pressure is strong. Heliothis virescens primarily feeds on cotton in its larval stages, and US populations have been declining since the widespread planting of transgenic cotton, which endogenously expresses proteins derived from Bacillus thuringiensis (Bt). No physiological adaptation to Bt toxin has been found in the field, so adaptation in this altered environment could involve (i) shifts in host plant selection mechanisms to avoid cotton, (ii) changes in detoxification mechanisms required for cotton-feeding vs. feeding on other hosts or (iii) loss of resistance to previously used management practices including insecticides. Here, we begin to address whether such changes occurred in H. virescens populations between 1997 and 2012, as Bt-cotton cultivation spread through the agricultural landscape. For our study, we produced an H. virescens genome assembly and used this in concert with a ddRAD-seq-enabled genome scan to identify loci with significant allele frequency changes over the 15-year period. Genetic changes at a previously described H. virescens insecticide target of selection were detectable in our genome scan and increased our confidence in this methodology. Additional loci were also detected as being under selection, and we quantified the selection strength required to elicit observed allele frequency changes at each locus. Potential contributions of genes near loci under selection to adaptive phenotypes in the H. virescens cotton system are discussed. © 2017 John Wiley & Sons Ltd.


September 22, 2019

Avian genomics lends insights into endocrine function in birds.

The genomics era has brought along the completed sequencing of a large number of bird genomes that cover a broad range of the avian phylogenetic tree (>30 orders), leading to major novel insights into avian biology and evolution. Among recent findings, the discovery that birds lack a large number of protein coding genes that are organized in highly conserved syntenic clusters in other vertebrates is very intriguing, given the physiological importance of many of these genes. A considerable number of them play prominent endocrine roles, suggesting that birds evolved compensatory genetic or physiological mechanisms that allowed them to survive and thrive in spite of these losses. While further studies are needed to establish the exact extent of avian gene losses, these findings point to birds as potentially highly relevant model organisms for exploring the genetic basis and possible therapeutic approaches for a wide range of endocrine functions and disorders. Copyright © 2017 Elsevier Inc. All rights reserved.

