Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available. We sequenced and assembled the genome of P. westermani, which is among the largest of the known pathogen genomes with an estimated size of 1.1 Gb. A 922.8 Mb genome assembly was generated from Illumina and Pacific Biosciences (PacBio) sequence data, covering 84% of…
Group A Streptococcus (GAS) is a major cause of global infection-related morbidity and mortality. A modern controlled human infection model (CHIM) of GAS pharyngitis can accelerate vaccine development and pathogenesis research. A robust rationale for strain selection is central to meeting ethical, scientific, and regulatory requirements. Multifaceted characterization studies were done to compare a preferred candidate emm75 (M75) GAS strain to three other strains: an alternative candidate emm12 (M12) strain, an M1 strain used in 1970s pharyngitis CHIM studies (SS-496), and a representative (5448) of the globally disseminated M1T1 clone. A range of approaches were used to explore strain growth,…
Sucrose accumulation and decreased photosynthesis are early symptoms of yellow canopy syndrome (YCS) in sugarcane (Saccharum spp.), and precede the visual yellowing of the leaves. To investigate broad-scale gene expression changes during YCS-onset, transcriptome analyses coupled to metabolome analyses were performed. Across leaf tissues, the greatest number of differentially expressed genes were related to the chloroplast and to metabolic processes involving nitrogen and carbohydrates. Five genes represented 90% of the TPM (Transcripts Per Million) associated with the downregulation of transcription during YCS-onset, which included PSII D1 (PsbA). This differential expression was consistent with a feedback regulatory effect upon photosynthesis. Broad-scale…
In 1995, a severe haemolytic-uremic syndrome (HUS) outbreak occurred in Adelaide. A recent genomic analysis of Shiga toxigenic Escherichia coli (STEC) O111:H- strains 95JB1 and 95NR1 from this outbreak found that the more virulent isolate, 95NR1, harboured two additional copies of the Shiga toxin 2 (Stx2) genes encoded within prophage regions. The structure of the Stx2-converting prophages could not be fully resolved using short-read sequence data alone, and it was not clear whether there were other genomic differences between 95JB1 and 95NR1. In this study we have used Pacific Biosciences (PacBio) single molecule real-time (SMRT) sequencing to characterise the genome…
Normalization of cDNA is widely used to improve the coverage of rare transcripts in analysis of transcriptomes employing next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and construction of transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, a plant with a highly polyploid genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, many longer transcripts were removed and many new generally…
Propionic acid is traditionally derived from fossil fuels, but its biological production has recently gained interest. Propionibacterium species produce propionic acid as their main fermentation product. Production of other organic acids reduces propionic acid yield and productivity, pointing to by-product gene-knockout strategies as a logical solution to increase yield. However, eliminating by-product formation has seen limited success due to our inability to genetically engineer the best producing strains (i.e. Propionibacterium acidipropionici). To overcome this limitation, random mutagenesis continues to be the best path towards improving strains for biological propionic acid production. Recent advances in next generation sequencing opened new avenues to understand…
Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio isoform sequencing (Iso-Seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715…
Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most of the sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short-reads; hence knowledge of the sugarcane transcriptome is limited in relation to transcript length and number of transcript isoforms. The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues, of different developmental stages, from 22 varieties, to explore the potential for capturing full-length transcript…
About 64% of the total aboveground biomass in sugarcane production is from the culm, of which ~90% is present in fiber and sugars. Understanding the transcriptome in the sugarcane culm, and the transcripts that are associated with the accumulation of the sugar and fiber components would facilitate the modification of biomass composition for enhanced biofuel and biomaterial production. The Sugarcane Iso-Seq Transcriptome (SUGIT) database was used as a reference for RNA-Seq analysis of variation in gene expression between young and mature tissues, and between 10 genotypes with varying fiber content. Global expression analysis suggests that each genotype displayed a unique…
Despite the significance of chicken as a model organism, our understanding of the chicken transcriptome is limited compared to human. This issue is common to all non-human vertebrate annotations due to the difficulty in transcript identification from short read RNAseq data. While previous studies have used single molecule long read sequencing for transcript discovery, they did not perform RNA normalization and 5′-cap selection, which may have resulted in lower transcriptome coverage and truncated transcript sequences. We sequenced normalised chicken brain and embryo RNA libraries with Pacific Biosciences Iso-Seq. 5′ cap selection was performed on the embryo library to provide methodological comparison.…
Arabica coffee (Coffea arabica) has a small gene pool, which limits genetic improvement. Selection for caffeine content within this gene pool would be assisted by identification of the genes controlling this important trait. Sequencing of DNA bulks from 18 genotypes with extreme high- or low-caffeine content from a population of 232 genotypes was used to identify linked polymorphisms. To obtain a reference genome, a whole genome assembly of arabica coffee (variety K7) was achieved by sequencing using short-read (Illumina) and long-read (PacBio) technology. Assembly was performed using a range of assembly tools resulting in 76 409 scaffolds with a scaffold N50…
Productivity of ruminant livestock depends on the rumen microbiota, which ferment indigestible plant polysaccharides into nutrients used for growth. Understanding the functions carried out by the rumen microbiota is important for reducing greenhouse gas production by ruminants and for developing biofuels from lignocellulose. We present 410 cultured bacteria and archaea, together with their reference genomes, representing every cultivated rumen-associated archaeal and bacterial family. We evaluate polysaccharide degradation, short-chain fatty acid production and methanogenesis pathways, and assign specific taxa to functions. A total of 336 organisms were present in available rumen metagenomic data sets, and 134 were present in human gut…
The human transcriptome is so large, diverse, and dynamic that, even after a decade of investigation by RNA sequencing (RNA-seq), we have yet to resolve its true dimensions. RNA-seq suffers from an expression-dependent bias that impedes characterization of low-abundance transcripts. We performed targeted single-molecule and short-read RNA-seq to survey the transcriptional landscape of a single human chromosome (Hsa21) at unprecedented resolution. Our analysis reaches the lower limits of the transcriptome, identifying a fundamental distinction between protein-coding and noncoding gene content: almost every noncoding exon undergoes alternative splicing, producing a seemingly limitless variety of isoforms. Analysis of syntenic regions of the…
Sugarcane biomass has been used for sugar, bioenergy and biomaterial production. The majority of the sugarcane biomass comes from the culm, which makes it important to understand the genetic control of biomass production in this part of the plant. A meta-transcriptome of the culm was obtained in an earlier study by using about one billion paired-end (150 bp) reads of deep RNA sequencing of samples from 20 diverse sugarcane genotypes and combining de novo assemblies from different assemblers and different settings. Although many genes could be recovered, this resulted in a large combined assembly which created the need for clustering…
Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures…