September 22, 2019

ISOdb: A comprehensive database of full-length isoforms generated by Iso-Seq.

The accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples in two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface in the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, which is freely available.


September 22, 2019

SMRT sequencing of full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt).

This study was aimed at generating the full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt) using single-molecule real-time (SMRT) sequencing. Four developmental stages of A. hygrophila, including eggs, larvae, pupae, and adults were harvested for isolating total RNA. The mixed samples were used for SMRT sequencing to generate the full-length transcriptome. Based on the obtained transcriptome data, alternative splicing event, simple sequence repeat (SSR) analysis, coding sequence prediction, transcript functional annotation, and lncRNA prediction were performed. A total of 9.45 Gb of clean reads were generated, including 335,045 reads of insert (ROI) and 158,085 full-length non-chimeric (FLNC) reads. Transcript clustering analysis of FLNC reads identified 40,004 consensus isoforms, including 31,015 high-quality ones. After removing redundant reads, 28,982 transcripts were obtained. A total of 145 alternative splicing events were predicted. Additionally, 12,753 SSRs and 16,205 coding sequences were identified based on SSR analysis. Furthermore, 24,031 transcripts were annotated in eight functional databases, and 4,198 lncRNAs were predicted. This is the first study to perform SMRT sequencing of the full-length transcriptome of A. hygrophila. The obtained transcriptome may facilitate further exploration of the genetic data of A. hygrophila and uncover the interactions between this insect and the ecosystem.


September 22, 2019

Transcription-associated mutation promotes RNA complexity in highly expressed genes – A major new source of selectable variation.

Alternatively spliced transcript isoforms are thought to play a critical role for functional diversity. However, the mechanism generating the enormous diversity of spliced transcript isoforms remains unknown, and its biological significance remains unclear. We analyzed transcriptomes in saker falcons, chickens, and mice to show that alternative splicing occurs more frequently, yielding more isoforms, in highly expressed genes. We focused on hemoglobin in the falcon, the most abundantly expressed genes in blood, finding that alternative splicing produces 10-fold more isoforms than expected from the number of splice junctions in the genome. These isoforms were produced mainly by alternative use of de novo splice sites generated by transcription-associated mutation (TAM), not by the RNA editing mechanism normally invoked. We found that high expression of globin genes increases mutation frequencies during transcription, especially on nontranscribed DNA strands. After DNA replication, transcribed strands inherit these somatic mutations, creating de novo splice sites, and generating multiple distinct isoforms in the cell clone. Bisulfite sequencing revealed that DNA methylation may counteract this process by suppressing TAM, suggesting DNA methylation can spatially regulate RNA complexity. RNA profiling showed that falcons living on the high Qinghai-Tibetan Plateau possess greater global gene expression levels and higher diversity of mean to high abundance isoforms (reads per kilobases per million mapped reads ≥18) than their low-altitude counterparts, and we speculate that this may enhance their oxygen transport capacity under low-oxygen environments. Thus, TAM-induced RNA diversity may be physiologically significant, providing an alternative strategy in lifestyle evolution.


September 22, 2019

Genomic microdiversity of Bifidobacterium pseudocatenulatum underlying differential strain-level responses to dietary carbohydrate intervention.

The genomic basis of the response to dietary intervention of human gut beneficial bacteria remains elusive, which hinders precise manipulation of the microbiota for human health. After receiving a dietary intervention enriched with nondigestible carbohydrates for 105 days, a genetically obese child with Prader-Willi syndrome lost 18.4% of his body weight and showed significant improvement in his bioclinical parameters. We obtained five isolates (C1, C15, C55, C62, and C95) of one of the most abundantly promoted beneficial species, Bifidobacterium pseudocatenulatum, from a postintervention fecal sample. Intriguingly, these five B. pseudocatenulatum strains showed differential responses during the dietary intervention. Two strains were largely unaffected, while the other three were promoted to different extents by the changes in dietary carbohydrate resources. The differential responses of these strains were consistent with their functional clustering based on the COGs (Clusters of Orthologous Groups), including those involved with the ABC-type sugar transport systems, suggesting that the strain-specific genomic variations may have contributed to the niche adaptation. Particularly, B. pseudocatenulatum C15, which had the most diverse types and highest gene copy numbers of carbohydrate-active enzymes targeting plant polysaccharides, had the highest abundance after the dietary intervention. These studies show the importance of understanding genomic diversity of specific members of the gut microbiota if precise nutrition approaches are to be realized.

IMPORTANCE The manipulation of the gut microbiota via dietary approaches is a promising option for improving human health. Our findings showed differential responses of multiple B. pseudocatenulatum strains isolated from the same habitat to the dietary intervention, as well as strain-specific correlations with bioclinical parameters of the host. The comparative genomics revealed a genome-level microdiversity of related functional genes, which may have contributed to these differences. These results highlight the necessity of understanding strain-level differences if precise manipulation of gut microbiota through dietary approaches is to be realized. Copyright © 2017 Wu et al.


September 22, 2019

Single-molecule long-read sequencing facilitates shrimp transcriptome research.

Although shrimp are of great economic importance, few full-length shrimp transcriptomes are available. Here, we used Pacific Biosciences single-molecule real-time (SMRT) long-read sequencing technology to generate transcripts from the Pacific white shrimp (Litopenaeus vannamei). We obtained 322,600 full-length non-chimeric reads, from which we generated 51,367 high-quality unique full-length transcripts. We corrected errors in the SMRT sequences by comparison with Illumina-produced short reads. We successfully annotated 81.72% of all unique SMRT transcripts against the NCBI non-redundant database, 58.63% against Swiss-Prot, 45.38% against Gene Ontology, 32.57% against Clusters of Orthologous Groups of proteins (COG), and 47.83% against Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Across all transcripts, we identified 3,958 long non-coding RNAs (lncRNAs) and 80,650 simple sequence repeats (SSRs). Our study provides a rich set of full-length cDNA sequences for L. vannamei, which will greatly facilitate shrimp transcriptome research.


September 22, 2019

Long read reference genome-free reconstruction of a full-length transcriptome from Astragalus membranaceus reveals transcript variants involved in bioactive compound biosynthesis.

Astragalus membranaceus, also known as Huangqi in China, is one of the most widely used medicinal herbs in Traditional Chinese Medicine. Traditional Chinese Medicine formulations from Astragalus membranaceus have been used to treat a wide range of illnesses, such as cardiovascular disease, type 2 diabetes, nephritis and cancers. Pharmacological studies have shown that immunomodulating, anti-hyperglycemic, anti-inflammatory, antioxidant and antiviral activities exist in the extract of Astragalus membranaceus. Therefore, characterising the biosynthesis of bioactive compounds in Astragalus membranaceus, such as Astragalosides, Calycosin and Calycosin-7-O-β-d-glucoside, is of particular importance for further genetic studies of Astragalus membranaceus. In this study, we reconstructed the Astragalus membranaceus full-length transcriptomes from leaf and root tissues using PacBio Iso-Seq long reads. We identified 27 975 and 22 343 full-length unique transcript models in each tissue respectively. Compared with previous studies that used short read sequencing, our reconstructed transcripts are longer, and are more likely to be full-length and include numerous transcript variants. Moreover, we also re-characterised and identified potential transcript variants of genes involved in Astragalosides, Calycosin and Calycosin-7-O-β-d-glucoside biosynthesis. In conclusion, our study provides a practical pipeline to characterise the full-length transcriptome for species without a reference genome and a useful genomic resource for exploring the biosynthesis of active compounds in Astragalus membranaceus.


September 22, 2019

Assessment of an organ-specific de novo transcriptome of the nematode trap-crop, Solanum sisymbriifolium

Solanum sisymbriifolium, also known as “Litchi Tomato” or “Sticky Nightshade,” is an undomesticated and poorly researched plant related to potato and tomato. Unlike the latter species, S. sisymbriifolium induces eggs of the cyst nematode, Globodera pallida, to hatch and migrate into its roots, but then arrests further nematode maturation. In order to provide researchers with a partial blueprint of its genetic make-up so that the mechanism of this response might be identified, we used single molecule real time (SMRT) sequencing to compile a high quality de novo transcriptome of 41,189 unigenes drawn from individually sequenced bud, root, stem, and leaf RNA populations. Functional annotation and BUSCO analysis showed that this transcriptome was surprisingly complete, even though it represented genes expressed at a single time point. By sequencing the 4 organ libraries separately, we found we could get a reliable snapshot of transcript distributions in each organ. A divergent site analysis of the merged transcriptome indicated that this species might have undergone a recent genome duplication and re-diploidization. Further analysis indicated that the plant then retained a disproportionate number of genes associated with photosynthesis and amino acid metabolism in comparison to genes with characteristics of R-proteins or involved in secondary metabolism. The former processes may have given S. sisymbriifolium a bigger competitive advantage than the latter did. Copyright © 2018 Wixom et al.


September 22, 2019

Single molecule RNA sequencing uncovers trans-splicing and improves annotations in Anopheles stephensi.

Single molecule real-time (SMRT) sequencing has recently been used to obtain full-length cDNA sequences that improve genome annotation and reveal RNA isoforms. Here, we used one such method called isoform sequencing from Pacific Biosciences (PacBio) to sequence a cDNA library from the Asian malaria mosquito Anopheles stephensi. More than 600 000 full-length cDNAs, referred to as reads of insert, were identified. Owing to the inherently high error rate of PacBio sequencing, we tested different approaches for error correction. We found that error correction using Illumina RNA sequencing (RNA-seq) generated more data than using the default SMRT pipeline. The full-length error-corrected PacBio reads greatly improved the gene annotation of Anopheles stephensi: 4867 gene models were updated and 1785 alternatively spliced isoforms were added to the annotation. In addition, six trans-splicing events, where exons from different primary transcripts were joined together, were identified in An. stephensi. All six trans-splicing events appear to be conserved in Culicidae, as they are also found in Anopheles gambiae and Aedes aegypti. The proteins encoded by trans-splicing events are also highly conserved and the orthologues of these proteins are cis-spliced in outgroup species, indicating that trans-splicing may arise as a mechanism to rescue genes that broke up during evolution. © 2017 The Royal Entomological Society.


September 22, 2019

Resolving the complexity of human skin metagenomes using single-molecule sequencing.

Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation.

The species comprising a microbial community are often difficult to deconvolute due to technical limitations inherent to most short-read sequencing technologies. Here, we leverage new advances in sequencing technology, single-molecule sequencing, to significantly improve reconstruction of a complex human skin microbial community. With this long-read technology, we were able to reconstruct and annotate a closed, high-quality genome of a previously uncharacterized skin species. We demonstrate that hybrid approaches with short-read technology are sufficiently powerful to reconstruct even single-nucleotide polymorphism level variation of species in this community. Copyright © 2016 Tsai et al.


September 22, 2019

A new standard for crustacean genomes: The highly contiguous, annotated genome assembly of the clam shrimp Eulimnadia texana reveals HOX gene order and identifies the sex chromosome.

Vernal pool clam shrimp (Eulimnadia texana) are a promising model system due to their ease of lab culture, short generation time, modest sized genome, a somewhat rare stable androdioecious sex determination system, and a requirement to reproduce via desiccated diapaused eggs. We generated a highly contiguous genome assembly using 46× of PacBio long read data and 216× of Illumina short reads, and annotated using Illumina RNAseq obtained from adult males or hermaphrodites. Of the 120 Mb genome 85% is contained in the largest eight contigs, the smallest of which is 4.6 Mb. The assembly contains 98% of transcripts predicted via RNAseq. This assembly is qualitatively different from scaffolded Illumina assemblies: It is produced from long reads that contain sequence data along their entire length, and is thus gap free. The contiguity of the assembly allows us to order the HOX genes within the genome, identifying two loci that contain HOX gene orthologs, and which approximately maintain the order observed in other arthropods. We identified a partial duplication of the Antennapedia complex adjacent to the few genes homologous to the Bithorax locus. Because the sex chromosome of an androdioecious species is of special interest, we used existing allozyme and microsatellite markers to identify the E. texana sex chromosome, and find that it comprises nearly half of the genome of this species. Linkage patterns indicate that recombination is extremely rare and perhaps absent in hermaphrodites, and as a result the location of the sex determining locus will be difficult to refine using recombination mapping. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


September 22, 2019

Pacbio sequencing of copper-tolerant Xanthomonas citri reveals presence of a chimeric plasmid structure and provides insights into reassortment and shuffling of transcription activator-like effectors among X. citri strains.

Xanthomonas citri, a causal agent of citrus canker, has been a well-studied model system due to recent availability of whole genome sequences of multiple strains from different geographical regions. Major limitations in our understanding of the evolution of pathogenicity factors in X. citri strains sequenced by short-read sequencing methods have been tracking plasmid reshuffling among strains due to inability to accurately assign reads to plasmids, and analyzing repeat regions among strains. X. citri harbors major pathogenicity determinants, including variable DNA-binding repeat region containing Transcription Activator-like Effectors (TALEs) on plasmids. The long-read sequencing method, PacBio, has allowed the ability to obtain complete and accurate sequences of TALEs in xanthomonads. We recently sequenced Xanthomonas citri str. Xc-03-1638-1-1, a copper tolerant A group strain isolated from grapefruit in 2003 from Argentina using PacBio RS II chemistry. We analyzed plasmid profiles, copy number and location of TALEs in complete genome sequences of X. citri strains.

We utilized the power of long reads obtained by PacBio sequencing to enable assembly of a complete genome sequence of strain Xc-03-1638-1-1, including sequences of two plasmids, 249 kb (plasmid harboring copper resistance genes) and 99 kb (pathogenicity plasmid containing TALEs). The pathogenicity plasmid in this strain is a hybrid plasmid containing four TALEs. Due to the intriguing nature of this pathogenicity plasmid with Tn3-like transposon association, repetitive elements and multiple putative sites for origins of replication, we might expect alternative structures of this plasmid in nature, illustrating the strong adaptive potential of X. citri strains. Analysis of the pathogenicity plasmid among completely sequenced X. citri strains, coupled with Southern hybridization of the pathogenicity plasmids, revealed clues to rearrangements of plasmids and resulting reshuffling of TALEs among strains.

We demonstrate in this study the importance of long-read sequencing for obtaining intact sequences of TALEs and plasmids, as well as for identifying rearrangement events including plasmid reshuffling. Rearrangement events, such as the hybrid plasmid in this case, could be a frequent phenomenon in the evolution of X. citri strains, although so far it is undetected due to the inability to obtain complete plasmid sequences with short-read sequencing methods.


September 22, 2019

Complete genome sequencing of the luminescent bacterium, Vibrio qinghaiensis sp. Q67 using PacBio technology.

Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were attained from three PacBio sequencing runs and reads >500 bp with a quality value of >0.75 were merged together into a single dataset. This resultant highly-contiguous de novo assembly has no genome gaps, and comprises two chromosomes with substantial genetic information, including protein-coding genes, non-coding RNA, transposon and gene islands. Our dataset can be useful as a comparative genome for evolution and speciation studies, as well as for the analysis of protein-coding gene families, the pathogenicity of different Vibrio species in fish, the evolution of non-coding RNA and transposon, and the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.


September 22, 2019

Extreme haplotype variation in the desiccation-tolerant clubmoss Selaginella lepidophylla.

Plant genome size varies by four orders of magnitude, and most of this variation stems from dynamic changes in repetitive DNA content. Here we report the small 109 Mb genome of Selaginella lepidophylla, a clubmoss with extreme desiccation tolerance. Single-molecule sequencing enables accurate haplotype assembly of a single heterozygous S. lepidophylla plant, revealing extensive structural variation. We observe numerous haplotype-specific deletions consisting of largely repetitive and heavily methylated sequences, with enrichment in young Gypsy LTR retrotransposons. Such elements are active but rapidly deleted, suggesting “bloat and purge” to maintain a small genome size. Unlike all other land plant lineages, Selaginella has no evidence of a whole-genome duplication event in its evolutionary history, but instead shows unique tandem gene duplication patterns reflecting adaptation to extreme drying. Gene expression changes during desiccation in S. lepidophylla mirror patterns observed across angiosperm resurrection plants.


September 22, 2019

Genomic diversity in the endosymbiotic bacterium Rhizobium leguminosarum.

Rhizobium leguminosarum bv. viciae is a soil α-proteobacterium that establishes a diazotrophic symbiosis with different legumes of the Fabeae tribe. The number of genome sequences from rhizobial strains available in public databases is constantly increasing, although complete, fully annotated genome structures from rhizobial genomes are scarce. In this work, we report and analyse the complete genome of R. leguminosarum bv. viciae UPM791. Whole genome sequencing can provide new insights into the genetic features contributing to symbiotically relevant processes such as bacterial adaptation to the rhizosphere, mechanisms for efficient competition with other bacteria, and the ability to establish a complex signalling dialogue with legumes, to enter the root without triggering plant defenses, and, ultimately, to fix nitrogen within the host. Comparison of the complete genome sequences of two strains of R. leguminosarum bv. viciae, 3841 and UPM791, highlights the existence of different symbiotic plasmids and a common core chromosome. Specific genomic traits, such as plasmid content or a distinctive regulation, define differential physiological capabilities of these endosymbionts. Among them, strain UPM791 presents unique adaptations for recycling the hydrogen generated in the nitrogen fixation process.


September 22, 2019

Comparative genomic analyses reveal the features for adaptation to nematodes in fungi.

Nematophagous (NP) fungi are ecologically important components of the soil microbiome in natural ecosystems. Esteya vermicola (Ev) has been reported as a NP fungus with a poorly understood evolutionary history and mechanism of adaptation to parasitism. Furthermore, the genomic basis of the NP fungal lifestyle was still unclear. We sequenced and annotated the Ev genome (34.2 Mbp) and integrated genetic makeup and evolution of pathogenic genes to investigate NP fungi. The results revealed that NP fungi had some abundant pathogenic genes corresponding to their niche. A number of gene families involved in pathogenicity were expanded, and some pathogenic orthologous genes underwent positive selection. NP fungi with diverse morphological features exhibit similarities of evolutionary convergence in attacking nematodes, but their genetic makeup and microscopic mechanism are different. Endoparasitic NP fungi showed similarity in large number of transporters and secondary metabolite coding genes. Notably, expanded families of transporters and endo-beta-glucanase implied great genetic potential of Ev in quickly perturbing nematode metabolism and parasitic behavior. These results facilitate our understanding of NP fungal genomic features for adaptation to nematodes and lay a solid theoretical foundation for further research and application. © The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

