P4/C2 Archives

September 22, 2019

Isoform sequencing and state-of-art applications for unravelling complexity of plant transcriptomes

Single-molecule real-time (SMRT) sequencing developed by PacBio, also called third-generation sequencing (TGS), offers longer reads than second-generation sequencing (SGS). Given its ability to obtain full-length transcripts without assembly, isoform sequencing (Iso-Seq) of transcriptomes by PacBio is advantageous for genome annotation, identification of novel genes and isoforms, as well as the discovery of long non-coding RNA (lncRNA). In addition, Iso-Seq gives access to the direct detection of alternative splicing, alternative polyadenylation (APA), gene fusion, and DNA modifications. Such applications of Iso-Seq facilitate the understanding of gene structure, post-transcriptional regulatory networks, and subsequently proteomic diversity. In this review, we summarize its applications in plant transcriptome study, specifically pointing out challenges associated with each step in the experimental design and highlighting the development of bioinformatic pipelines. We aim to provide the community with an integrative overview of and comprehensive guidance on Iso-Seq, and thus to promote its applications in plant research.

September 22, 2019

No assembly required: Full-length MHC class I allele discovery by PacBio circular consensus sequencing.

Single-molecule real-time (SMRT) sequencing technology with the Pacific Biosciences (PacBio) RS II platform offers the potential to obtain full-length coding regions (~1100-bp) from MHC class I cDNAs. Despite the relatively high error rate associated with SMRT technology, high quality sequences can be obtained by circular consensus sequencing (CCS) due to the random nature of the error profile. In the present study we first validated the ability of SMRT-CCS to accurately identify class I transcripts in Mauritian-origin cynomolgus macaques (Macaca fascicularis) that have been characterized previously by cloning and Sanger-based sequencing as well as pyrosequencing approaches. We then applied this SMRT-CCS method to characterize 60 novel full-length class I transcript sequences expressed by a cohort of cynomolgus macaques from China. The SMRT-CCS method described here provides a straightforward protocol for characterization of unfragmented single-molecule cDNA transcripts that will potentially revolutionize MHC class I allele discovery in nonhuman primates and other species. Published by Elsevier Inc.
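
The point that randomly distributed errors can be averaged out by repeated passes over the same molecule can be illustrated with a toy example. The sketch below is not the CCS algorithm itself: it assumes the passes are already aligned, have equal length, and contain only substitution errors, and it takes a simple column-wise majority vote. The function name `majority_consensus` and the example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Column-wise majority vote over equal-length, pre-aligned passes.

    Toy illustration only: real circular consensus calling uses a
    probabilistic model and must also handle insertions and deletions.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to equal length")
    # zip(*passes) yields one tuple of bases per alignment column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*passes))


# Five hypothetical passes over the same insert, each with at most one
# substitution error at a different position.
passes = ["ACGTTGCA", "ACGATGCA", "ACGTTGCA", "TCGTTGCA", "ACGTTGGA"]
consensus = majority_consensus(passes)
print(consensus)
```

Because the errors in this example fall at different positions in different passes, each column still has a clear majority base, which is the intuition behind obtaining high-quality consensus sequences from error-prone single passes.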

September 22, 2019

Next generation sequencing data of a defined microbial mock community.

Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for the benchmarking of new genome sequence analysis methods, including assembly and binning tools. Moreover the validation of new sequencing library protocols and platforms to assess critical components such as sequencing errors and biases relies on such datasets. We here report the next generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes, a range of GC contents, genome sizes, repeat content and encompass a diverse abundance profile. Short read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.

September 22, 2019

High-resolution phylogenetic microbial community profiling.

Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake’s water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.

September 22, 2019

A survey of the sorghum transcriptome using single-molecule long reads.

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ~11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.

September 22, 2019

Assembly and analysis of a qingke reference genome demonstrate its close genetic relation to modern cultivated barley.

Qingke, the local name of hulless barley in the Tibetan Plateau, is a staple food for Tibetans. The availability of its reference genome sequences could be useful for studies on breeding and molecular evolution. Taking advantage of the third-generation sequencer (PacBio), we de novo assembled a 4.84-Gb genome sequence of qingke, cv. Zangqing320 and anchored a 4.59-Gb sequence to seven chromosomes. Of the 46,787 annotated ‘high-confidence’ genes, 31,564 were validated by RNA-sequencing data of 39 wild and cultivated barley genotypes with wide genetic diversity, and the results were also confirmed against the nonredundant protein database from NCBI. As some gaps in the reference genome of Morex were covered in the reference genome of Zangqing320 by PacBio reads, we believe that the Zangqing320 genome provides a useful supplement to the Morex genome. Using the qingke genome as a reference, we conducted a genome comparison, revealing a close genetic relationship between a hulled barley (cv. Morex) and a hulless barley (cv. Zangqing320), which is strongly supported by the low-diversity regions in the two genomes. Considering the origin of Morex from its breeding pedigree, we then demonstrated a close genomic relationship between modern cultivated barley and qingke. Given this genomic relationship and the large genetic diversity between qingke and modern cultivated barley, we propose that qingke could provide elite genes for barley improvement. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

September 22, 2019

Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides.

The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. These results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi.

September 22, 2019

The genome sequence of a new strain of Mycobacterium ulcerans ecovar Liflandii, emerging as a sturgeon pathogen

Mycobacterium ulcerans ecovar Liflandii (MuLiflandii) is emerging as a nontuberculous mycobacterial pathogen in amphibians. Here, we make the first report on the prevalence of a new strain of MuLiflandii infection in Chinese sturgeon. All the diseased fish showed the classic clinical symptoms of ascites and/or muscle ulceration. A new slow-growing and acid-fast bacillus ASM001 strain was obtained from the ascites of infected fish; this strain demonstrated pathogenicity when tested in hybrid sturgeon. The complete genome sequence of MuLiflandii ASM001 is a circular chromosome of 6,167,296 bp, with a G + C content of 65.57%, containing 4518 predicted coding DNA sequences and 999 pseudo-genes, 3 rRNA operons, and 47 transfer RNA sequences. In addition, we found 245 copies of IS2404, 34 microsatellites, and 36 CRISPR sequences in the whole MuLiflandii ASM001 genome. Among the predicted genes of MuLiflandii ASM001, we found orthologs of 203 virulence factors of clinical MuLiflandii 128FXT operating in host cell invasion, modulation of phagocyte function, and survival inside the macrophages. These virulence factor candidates provide a key basis for understanding their pathogenic mechanisms at the molecular level. A comparative analysis that used complete, existing genomes showed that MuLiflandii ASM001 has high synteny with MuLiflandii 128FXT. We anticipate the availability of the complete MuLiflandii ASM001 genome sequence will provide a valuable resource for comparative genomic studies of MuLiflandii isolates, as well as provide new insights into the host, ecological, and functional diversity of the genus Mycobacterium.

September 22, 2019

Epigenetic landscape influences the liver cancer genome architecture.

The accumulation of different types of genetic alterations, such as nucleotide substitutions, structural rearrangements and viral genome integrations, together with epigenetic alterations, contributes to carcinogenesis. Here, we report correlation between the occurrence of epigenetic features and genetic aberrations by whole-genome bisulfite, whole-genome shotgun, long-read, and virus capture sequencing of 373 liver cancers. Somatic substitutions and rearrangement breakpoints are enriched in tumor-specific hypo-methylated regions with inactive chromatin marks and actively transcribed highly methylated regions in the cancer genome. Individual mutation signatures depend on chromatin status; in particular, signatures with a higher transcriptional strand bias occur within active chromatin areas. Hepatitis B virus (HBV) integration sites are frequently detected within inactive chromatin regions in cancer cells, as a consequence of negative selection for integrations in active chromatin regions. Ultra-high structural instability and preserved unmethylation of integrated HBV genomes are observed. We conclude that both precancerous and somatic epigenetic features contribute to the cancer genome architecture.

September 22, 2019

N6-methyladenine DNA modification in Xanthomonas oryzae pv. oryzicola genome.

DNA N6-methyladenine (6mA) modifications expand the information capacity of DNA and have long been known to exist in bacterial genomes. Xanthomonas oryzae pv. oryzicola (Xoc) is the causative agent of bacterial leaf streak, an emerging and destructive disease in rice worldwide. However, the genome-wide distribution patterns and potential functions of 6mA in Xoc are largely unknown. In this study, we analyzed the levels and global distribution patterns of 6mA modification in genomic DNA of seven Xoc strains (BLS256, BLS279, CFBP2286, CFBP7331, CFBP7341, L8 and RS105). The 6mA modification was found to be widely distributed across the seven Xoc genomes, accounting for 3.80%, 3.10%, 3.70%, 4.20%, 3.40%, 2.10%, and 3.10% of the total adenines in BLS256, BLS279, CFBP2286, CFBP7331, CFBP7341, L8, and RS105, respectively. Notably, more than 82% of 6mA sites were located within gene bodies in all seven strains. Two specific motifs for 6mA modification, ARGT and AVCG, were prevalent in all seven strains. Comparison of putative DNA methylation motifs from the seven strains reveals that Xoc has a specific DNA methylation system. Furthermore, the 6mA modification of rpfC decreased dramatically during Xoc infection, indicating an important role in the adaptation of Xoc to its environment.

September 22, 2019

Mutators as drivers of adaptation in Streptococcus and a risk factor for host jumps and vaccine escape

Heritable hypermutable strains deficient in DNA repair genes (mutators) facilitate microbial adaptation as they may rapidly generate beneficial mutations. Mutators deficient in mismatch (MMR) and oxidised guanine (OG) repair are abundant in clinical samples and show increased adaptive potential in experimental infection models but their role in pathoadaptation is poorly understood. Here we investigate the role of mutators in epidemiology and evolution of the broad host pathogen, Streptococcus iniae, employing 80 strains isolated globally over 40 years. We determine phylogenetic relationship among S. iniae using 10,267 non-recombinant core genome single nucleotide polymorphisms (SNPs), estimate their mutation rate by fluctuation analysis, and detect variation in major MMR (mutS, mutL, dnaN, recD2, rnhC) and OG (mutY, mutM, mutX) genes. S. iniae mutation rate phenotype and genotype are strongly associated with phylogenetic diversification and variation in major streptococcal virulence determinants (capsular polysaccharide, hemolysin, cell chain length, resistance to oxidation, and biofilm formation). Furthermore, profound changes in virulence determinants observed in mammalian isolates (atypical host) and vaccine-escape isolates found in bone (atypical tissue) of vaccinated barramundi are linked to multiple MMR and OG variants and unique mutation rates. This implies that adaptation to new host taxa, new host tissue, and to immunity of a vaccinated host is promoted by mutator strains. Our findings support the importance of mutation rate dynamics in evolution of pathogenic bacteria, in particular adaptation to a drastically different immunological setting that occurs during host jump and vaccine escape events.

Importance: Host immune response is a powerful selective pressure that drives diversification of pathogenic microorganisms and, ultimately, evolution of new strains. Major adaptive events in pathogen evolution, such as transmission to a new host species or infection of vaccinated hosts, require adaptation to a drastically different immune landscape. Such adaptation may be favoured by hypermutable strains (or mutators) that are defective in normal DNA repair and consequently capable of generating multiple potentially beneficial and compensatory mutations. This permits rapid adjustment of virulence and antigenicity in a new immunological setting. Here we show that mutators, through mutations in DNA repair genes and corresponding shifts in mutation rate, are associated with major diversification events and virulence evolution in the broad host-range pathogen Streptococcus iniae. We show that mutators underpin infection of vaccinated hosts, transmission to new host species and the evolution of new strains.

July 19, 2019

Long-read, whole-genome shotgun sequence data for five model organisms.

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.

July 19, 2019

Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing.

Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
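
To put the ">QV50" figure in context, a quality value on the standard Phred scale relates to a per-base error probability p by QV = -10 log10(p). The short sketch below only performs that conversion; the helper names are hypothetical, and it assumes the abstract's QV follows the standard Phred definition.

```python
import math


def qv_to_error_prob(qv):
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_prob_to_qv(p):
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(p)


# QV50 corresponds to one expected error per 100,000 bases.
p = qv_to_error_prob(50)

# Expected number of errors across a 9 kb genome at exactly QV50
# (genome length is taken from the ">9 kb" figure in the abstract).
expected_errors_9kb = 9_000 * p
print(p, expected_errors_9kb)
```

Under this definition, a reconstructed genome of about 9 kb at QV50 is expected to carry roughly 0.09 errors, that is, most reconstructed genomes would contain no errors at all.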

July 19, 2019

HLA Class-II associated HIV polymorphisms predict escape from CD4+ T Cell responses.

Antiretroviral therapy, antibody and CD8+ T cell-mediated responses targeting human immunodeficiency virus-1 (HIV-1) exert selection pressure on the virus necessitating escape; however, the ability of CD4+ T cells to exert selective pressure remains unclear. Using a computational approach on HIV gag/pol/nef sequences and HLA-II allelic data, we identified 29 HLA-II associated HIV sequence polymorphisms or adaptations (HLA-AP) in an African cohort of chronically HIV-infected individuals. Epitopes encompassing the predicted adaptation (AE) or its non-adapted (NAE) version were evaluated for immunogenicity. Using a CD8-depleted IFN-γ ELISpot assay, we determined that the magnitude of CD4+ T cell responses to the predicted epitopes in controllers was higher compared to non-controllers (p<0.0001). However, regardless of the group, the magnitude of responses to AE was lower as compared to NAE (p<0.0001). CD4+ T cell responses in patients with acute HIV infection (AHI) demonstrated poor immunogenicity towards AE as compared to NAE encoded by their transmitted founder virus. Longitudinal data in AHI off antiretroviral therapy demonstrated sequence changes that were biologically confirmed to represent CD4+ escape mutations. These data demonstrate an innovative application of HLA-associated polymorphisms to identify biologically relevant CD4+ epitopes and suggest that CD4+ T cells are active participants in driving HIV evolution.

July 19, 2019

Lineage-specific methyltransferases define the methylome of the globally disseminated Escherichia coli ST131 clone.

Escherichia coli sequence type 131 (ST131) is a clone of uropathogenic E. coli that has emerged rapidly and disseminated globally in both clinical and community settings. Members of the ST131 lineage from across the globe have been comprehensively characterized in terms of antibiotic resistance, virulence potential, and pathogenicity, but to date nothing is known about the methylome of these important human pathogens. Here we used single-molecule real-time (SMRT) PacBio sequencing to determine the methylome of E. coli EC958, the most-well-characterized completely sequenced ST131 strain. Our analysis of 52,081 methylated adenines in the genome of EC958 discovered three m6A methylation motifs that have not been described previously. Subsequent SMRT sequencing of isogenic knockout mutants identified the two type I methyltransferases (MTases) and one type IIG MTase responsible for m6A methylation of novel recognition sites. Although both type I sites were rare, the type IIG sites accounted for more than 12% of all methylated adenines in EC958. Analysis of the distribution of MTase genes across 95 ST131 genomes revealed their prevalence is highly conserved within the ST131 lineage, with most variation due to the presence or absence of mobile genetic elements on which individual MTase genes are located.

DNA modification plays a crucial role in bacterial regulation. Despite several examples demonstrating the role of methyltransferase (MTase) enzymes in bacterial virulence, investigation of this phenomenon on a whole-genome scale has remained elusive until now. Here we used single-molecule real-time (SMRT) sequencing to determine the first complete methylome of a strain from the multidrug-resistant E. coli sequence type 131 (ST131) lineage. By interrogating the methylome computationally and with further SMRT sequencing of isogenic mutants representing previously uncharacterized MTase genes, we defined the target sequences of three novel ST131-specific MTases and determined the genomic distribution of all MTase target sequences. Using a large collection of 95 previously sequenced ST131 genomes, we identified mobile genetic elements as a major factor driving diversity in DNA methylation patterns. Overall, our analysis highlights the potential for DNA methylation to dramatically influence gene regulation at the transcriptional level within a well-defined E. coli clone. Copyright © 2015 Forde et al.
