September 22, 2019  |  

High-confidence coding and noncoding transcriptome maps.

The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. © 2017 You et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019  |  

Recurrent structural variation, clustered sites of selection, and disease risk for the complement factor H (CFH) gene family.

Structural variation and single-nucleotide variation of the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ~360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication leading to the emergence of four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ~25-35 Mya and CFHR1 and CFHR3 ~7-13 Mya). Remarkably, all evolutionary breakpoints share a common ~4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has expanded independently throughout primate evolution. This segment is recurrently reused and juxtaposed with a donor duplication containing exons 8 and 9 from ancestral CFH, creating four CFHR fusion genes that include lineage-specific members of the gene family. Combined analysis of >5,000 AMD cases and controls identifies a significant burden of a rare missense mutation that clusters at the N terminus of CFH [P = 5.81 × 10⁻⁸, odds ratio (OR) = 9.8 (3.67-Infinity)]. A bipolar clustering pattern of rare nonsynonymous mutations in patients with AMD (P < 10⁻³) and AHUS (P = 0.0079) maps to functional domains that show evidence of positive selection during primate evolution. Our structural variation analysis in >2,400 individuals reveals five recurrent rearrangement breakpoints that show variable frequency among AMD cases and controls. These data suggest a dynamic and recurrent pattern of mutation critical to the emergence of new CFHR genes but also in the predisposition to complex human genetic disease phenotypes.


September 22, 2019  |  

Analysis of gut microbiota – An ever changing landscape.

In the last two decades, the field of metagenomics has greatly expanded due to improvement in sequencing technologies allowing for a more comprehensive characterization of microbial communities. The use of these technologies has led to an unprecedented understanding of human, animal, and environmental microbiomes and has shown that the gut microbiota are comparable to an organ that is intrinsically linked with a variety of diseases. Characterization of microbial communities using next-generation sequencing-by-synthesis approaches has revealed important shifts in microbiota associated with debilitating diseases such as Clostridium difficile infection. But due to limitations in sequence read length, primer biases, and the quality of databases, genus- and species-level classification has been difficult. Third-generation technologies, such as Pacific Biosciences’ single molecule, real-time (SMRT) approach, allow for unbiased, more specific identification of species that are likely clinically relevant. Comparison of Illumina next-generation characterization and SMRT sequencing of samples from patients treated for C. difficile infection revealed similarities in community composition at the phylum and family levels, but SMRT sequencing further allowed for species-level characterization – permitting a better understanding of the microbial ecology of this disease. Thus, as sequencing technologies continue to advance, new species-level insights can be gained in the study of complex and clinically-relevant microbial communities.


September 22, 2019  |  

Saliva and tooth biofilm bacterial microbiota in adolescents in a low caries community.

The oral cavity harbours a complex microbiome that is linked to dental diseases and serves as a route to other parts of the body. Here, the aims were to characterize the oral microbiota by deep sequencing in a low-caries population with regular dental care since childhood and search for association with caries prevalence and incidence. Saliva and tooth biofilm from 17-year-olds and mock bacteria communities were analysed using 16S rDNA Illumina MiSeq (v3-v4) and PacBio SMRT (v1-v8) sequencing including validity and reliability estimates. Caries was scored at 17 and 19 years of age. Both sequencing platforms revealed that Firmicutes dominated in the saliva, whereas Firmicutes and Actinobacteria abundances were similar in tooth biofilm. Saliva microbiota discriminated caries-affected from caries-free adolescents, with enumeration of Scardovia wiggsiae, Streptococcus mutans, Bifidobacterium longum, Leptotrichia sp. HOT498, and Selenomonas spp. in caries-affected participants. Adolescents with B. longum in saliva had significantly higher 2-year caries increment. PacBio SMRT revealed Corynebacterium matruchotii as the most prevalent species in tooth biofilm. In conclusion, both sequencing methods were reliable and valid for oral samples, and saliva microbiota was associated with cross-sectional caries prevalence, especially S. wiggsiae, S. mutans, and B. longum; the latter also with the 2-year caries incidence.


September 22, 2019  |  

wtf genes are prolific dual poison-antidote meiotic drivers.

Meiotic drivers are selfish genes that bias their transmission into gametes, defying Mendelian inheritance. Despite the significant impact of these genomic parasites on evolution and infertility, few meiotic drive loci have been identified or mechanistically characterized. Here, we demonstrate a complex landscape of meiotic drive genes on chromosome 3 of the fission yeasts Schizosaccharomyces kambucha and S. pombe. We identify S. kambucha wtf4 as one of these genes that acts to kill gametes (known as spores in yeast) that do not inherit the gene from heterozygotes. wtf4 utilizes dual, overlapping transcripts to encode both a gamete-killing poison and an antidote to the poison. To enact drive, all gametes are poisoned, whereas only those that inherit wtf4 are rescued by the antidote. Our work suggests that the wtf multigene family proliferated due to meiotic drive and highlights the power of selfish genes to shape genomes, even while imposing tremendous costs to fertility.


September 22, 2019  |  

Rodent papillomaviruses.

Preclinical infection model systems are extremely valuable tools to aid in our understanding of Human Papillomavirus (HPV) biology, disease progression, prevention, and treatments. In this context, rodent papillomaviruses and their respective infection models are useful tools but remain underutilized resources in the field of papillomavirus biology. Two rodent papillomaviruses, MnPV1, which infects the Mastomys species of multimammate rats, and MmuPV1, which infects laboratory mice, are currently the most studied rodent PVs. Both of these viruses cause malignancy in the skin and can provide attractive infection models to study the lesser understood cutaneous papillomaviruses that have been frequently associated with HPV-related skin cancers. Of these, MmuPV1 is the first reported rodent papillomavirus that can naturally infect the laboratory strain of mice. MmuPV1 is an attractive model virus to study papillomavirus pathogenesis because of the ubiquitous availability of lab mice and the fact that this mouse species is genetically modifiable. In this review, we have summarized the knowledge we have gained about PV biology from the study of rodent papillomaviruses and point out the remaining gaps that can provide new research opportunities.


September 22, 2019  |  

Revertant mosaicism repairs skin lesions in a patient with keratitis-ichthyosis-deafness syndrome by second-site mutations in connexin 26.

Revertant mosaicism (RM) is a naturally occurring phenomenon where the pathogenic effect of a germline mutation is corrected by a second somatic event. Development of healthy-looking skin due to RM has been observed in patients with various inherited skin disorders, but not in connexin-related disease. We aimed to clarify the underlying molecular mechanisms of suspected RM in the skin of a patient with keratitis-ichthyosis-deafness (KID) syndrome. The patient was diagnosed with KID syndrome due to characteristic skin lesions, hearing deficiency and keratitis. Investigation of GJB2 encoding connexin (Cx) 26 revealed heterozygosity for the recurrent de novo germline mutation, c.148G>A, p.Asp50Asn. At age 20, the patient developed spots of healthy-looking skin that grew in size and number within widespread erythrokeratodermic lesions. Ultra-deep sequencing of two healthy-looking skin biopsies identified five somatic nonsynonymous mutations, independently present in cis with the p.Asp50Asn mutation. Functional studies of Cx26 in HeLa cells revealed co-expression of Cx26-Asp50Asn and wild-type Cx26 in gap junction channel plaques. However, Cx26-Asp50Asn with the second-site mutations identified in the patient displayed no formation of gap junction channel plaques. We argue that the second-site mutations independently inhibit Cx26-Asp50Asn expression in gap junction channels, reverting the dominant negative effect of the p.Asp50Asn mutation. To our knowledge, this is the first time RM has been reported to result in the development of healthy-looking skin in a patient with KID syndrome. © The Author 2017. Published by Oxford University Press.


September 22, 2019  |  

Influenza virus infection causes global RNAPII termination defects.

Viral infection perturbs host cells and can be used to uncover regulatory mechanisms controlling cellular responses and susceptibility to infections. Using cell biological, biochemical, and genetic tools, we reveal that influenza A virus (IAV) infection induces global transcriptional defects at the 3′ ends of active host genes and RNA polymerase II (RNAPII) run-through into extragenic regions. Deregulated RNAPII leads to expression of aberrant RNAs (3′ extensions and host-gene fusions) that ultimately cause global transcriptional downregulation of physiological transcripts, an effect influencing antiviral response and virulence. This phenomenon occurs with multiple strains of IAV, is dependent on influenza NS1 protein, and can be modulated by SUMOylation of an intrinsically disordered region (IDR) of NS1 expressed by the 1918 pandemic IAV strain. Our data identify a strategy used by IAV to suppress host gene expression and indicate that polymorphisms in IDRs of viral proteins can affect the outcome of an infection.


September 22, 2019  |  

Analysis of RNA base modification and structural rearrangement by single-molecule real-time detection of reverse transcription.

Zero-mode waveguides (ZMWs) are photonic nanostructures that create highly confined optical observation volumes, thereby allowing single-molecule-resolved biophysical studies at relatively high concentrations of fluorescent molecules. This principle has been successfully applied in single-molecule, real-time (SMRT®) DNA sequencing for the detection of DNA sequences and DNA base modifications. In contrast, RNA sequencing methods cannot provide sequence and RNA base modifications concurrently as they rely on complementary DNA (cDNA) synthesis by reverse transcription followed by sequencing of cDNA. Thus, information on RNA modifications is lost during the process of cDNA synthesis. Here we describe an application of SMRT technology to follow the activity of reverse transcriptase enzymes synthesizing cDNA on thousands of single RNA templates simultaneously in real time with single nucleotide turnover resolution using arrays of ZMWs. This method thereby obtains information from the RNA template directly. The analysis of the kinetics of the reverse transcriptase can be used to identify RNA base modifications, shown by example for N6-methyladenine (m6A) in oligonucleotides and in a specific mRNA extracted from total cellular mRNA. Furthermore, the real-time reverse transcriptase dynamics informs about RNA secondary structure and its rearrangements, as demonstrated on a ribosomal RNA and an mRNA template. Our results highlight the feasibility of studying RNA modifications and RNA structural rearrangements in ZMWs in real time. In addition, they suggest that technology can be developed for direct RNA sequencing provided that the reverse transcriptase is optimized to resolve homonucleotide stretches in RNA.


September 22, 2019  |  

Contrasting distribution patterns between aquatic and terrestrial Phytophthora species along a climatic gradient are linked to functional traits.

Diversity of microbial organisms is linked to global climatic gradients. The genus Phytophthora includes both aquatic and terrestrial plant pathogenic species that display a large variation of functional traits. The extent to which the physical environment (water or soil) modulates the interaction of microorganisms with climate is unknown. Here, we explored the main environmental drivers of diversity and functional trait composition of Phytophthora communities. Communities were obtained by a novel metabarcoding setup based on PacBio sequencing of river filtrates in 96 river sites along a geographical gradient. Species were classified as terrestrial or aquatic based on their phylogenetic clade. Overall, terrestrial and aquatic species showed contrasting patterns of diversity. For terrestrial species, precipitation was a stronger driver than temperature, and diversity and functional diversity decreased with decreasing temperature and precipitation. In cold and dry areas, the dominant species formed resistant structures and had a low optimum temperature. By contrast, for aquatic species, temperature and water chemistry were the strongest drivers, and diversity increased with decreasing temperature and precipitation. Within the same area, environmental filtering affected terrestrial species more strongly than aquatic species (20% versus 3% of the studied communities, respectively). Our results highlight the importance of functional traits and the physical environment in which microorganisms develop their life cycle when predicting their distribution under changing climatic conditions. Temperature and rainfall may be buffered differently by water and soil, and thus pose contrasting constraints to microbial assemblies.


September 22, 2019  |  

cDNA library enrichment of full length transcripts for SMRT long read sequencing.

The utility of genome assemblies does not only rely on the quality of the assembled genome sequence, but also on the quality of the gene annotations. The Pacific Biosciences Iso-Seq technology is a powerful support for accurate eukaryotic gene model annotation as it allows for direct readout of full-length cDNA sequences without the need for noisy short read-based transcript assembly. We propose the implementation of the TeloPrime Full Length cDNA Amplification kit to the Pacific Biosciences Iso-Seq technology in order to enrich for genuine full-length transcripts in the cDNA libraries. We provide evidence that TeloPrime outperforms the commonly used SMARTer PCR cDNA Synthesis Kit in identifying transcription start and end sites in Arabidopsis thaliana. Furthermore, we show that TeloPrime-based Pacific Biosciences Iso-Seq can be successfully applied to the polyploid genome of bread wheat (Triticum aestivum) not only to efficiently annotate gene models, but also to identify novel transcription sites, gene homeologs, splicing isoforms and previously unidentified gene loci.


September 22, 2019  |  

Gene activity in primary T cells infected with HIV89.6: intron retention and induction of genomic repeats.

HIV infection has been reported to alter cellular gene activity, but published studies have commonly assayed transformed cell lines and lab-adapted HIV strains, yielding inconsistent results. Here we carried out a deep RNA-Seq analysis of primary human T cells infected with the low passage HIV isolate HIV89.6. Seventeen percent of cellular genes showed altered activity 48 h after infection. In a meta-analysis including four other studies, our data differed from studies of HIV infection in cell lines but showed more parallels with infections of primary cells. We found a global trend toward retention of introns after infection, suggestive of a novel cellular response to infection. HIV89.6 infection was also associated with activation of several human endogenous retroviruses (HERVs) and retrotransposons, of interest as possible novel antigens that could serve as vaccine targets. The most highly activated group of HERVs was a subset of the ERV-9. Analysis showed that activation was associated with a particular variant of ERV-9 long terminal repeats that contains an indel near the U3-R border. These data also allowed quantification of >70 splice forms of the HIV89.6 RNA and specified the main types of chimeric HIV89.6-host RNAs. Comparison to over 100,000 integration site sequences from the same infected cell populations allowed quantification of authentic versus artifactual chimeric reads, showing that 5′ read-in, splicing out of HIV89.6 from the D4 donor and 3′ read-through were the most common HIV89.6-host cell chimeric RNA forms. Analysis of RNA abundance after infection of primary T cells with the low passage HIV89.6 isolate disclosed multiple novel features of HIV-host interactions, notably intron retention and induction of transcription of retrotransposons and endogenous retroviruses.


September 22, 2019  |  

Genome-wide identification and analysis of the ALTERNATIVE OXIDASE gene family in diploid and hexaploid wheat.

A comprehensive understanding of wheat responses to environmental stress will contribute to the long-term goal of feeding the planet. ALTERNATIVE OXIDASE (AOX) genes encode proteins involved in a bypass of the electron transport chain and are also known to be involved in stress tolerance in multiple species. Here, we report the identification and characterization of the AOX gene family in diploid and hexaploid wheat. Four genes each were found in the diploid ancestors Triticum urartu and Aegilops tauschii, and three in Aegilops speltoides. In hexaploid wheat (Triticum aestivum), 20 genes were identified, some with multiple splice variants, corresponding to a total of 24 proteins for those with observed transcription and translation. These proteins were classified as AOX1a, AOX1c, AOX1e or AOX1d via phylogenetic analysis. Proteins lacking most or all signature AOX motifs were assigned to putative regulatory roles. Analysis of protein-targeting sequences suggests mixed localization to the mitochondria and other organelles. In comparison to the most studied AOX from Trypanosoma brucei, there were amino acid substitutions at critical functional domains indicating possible role divergence in wheat or grasses in general. In hexaploid wheat, AOX genes were expressed at specific developmental stages as well as in response to both biotic and abiotic stresses such as fungal pathogens, heat and drought. These AOX expression patterns suggest a highly regulated and diverse transcription and expression system. The insights gained provide a framework for the continued and expanded study of AOX genes in wheat for stress tolerance through breeding new varieties, as well as resistance to AOX-targeted herbicides, all of which can ultimately be used synergistically to improve crop yield.


September 22, 2019  |  

Isoform sequencing provides a more comprehensive view of the Panax ginseng transcriptome.

Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offers a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feed-back loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.


September 22, 2019  |  

A workflow for studying specialized metabolism in nonmodel eukaryotic organisms.

Eukaryotes contain a diverse tapestry of specialized metabolites, many of which are of significant pharmaceutical and industrial importance to humans. Nevertheless, exploration of specialized metabolic pathways underlying specific chemical traits in nonmodel eukaryotic organisms has been technically challenging and has historically lagged behind that of the bacterial systems. Recent advances in genomics, metabolomics, phylogenomics, and synthetic biology now enable a new workflow for interrogating unknown specialized metabolic systems in nonmodel eukaryotic hosts with greater efficiency and mechanistic depth. This chapter delineates such a workflow by providing a collection of state-of-the-art approaches and tools, ranging from multiomics-guided candidate gene identification to in vitro and in vivo functional and structural characterization of specialized metabolic enzymes. As already demonstrated by several recent studies, this new workflow opens up a gateway into the largely untapped world of natural product biochemistry in eukaryotes. © 2016 Elsevier Inc. All rights reserved.

