April 21, 2020

Integrative functional genomics decodes herpes simplex virus 1

Since the genome of herpes simplex virus 1 (HSV-1) was first sequenced more than 30 years ago, its predicted 80 genes have been intensively studied. Here, we unravel the complete viral transcriptome and translatome during lytic infection with base-pair resolution by computational integration of multi-omics data. We identified a total of 201 viral transcripts and 284 open reading frames (ORFs) including all known and 46 novel large ORFs. Multiple transcript isoforms expressed from individual gene loci explain translation of the vast majority of novel viral ORFs as well as N-terminal extensions (NTEs) and truncations thereof. We show that key viral regulators and structural proteins possess NTEs, which initiate from non-canonical start codons and govern subcellular protein localization and packaging. We validated a novel non-canonical large spliced ORF in the ICP0 locus and identified a 93 aa ORF overlapping ICP34.5 that is thus also deleted in the FDA-approved oncolytic virus Imlygic. Finally, we extend the current nomenclature to include all novel viral gene products. Taken together, this work provides a valuable resource for future functional studies, vaccine design and oncolytic therapies.


April 21, 2020

Long metabarcoding of the eukaryotic rDNA operon to phylogenetically and taxonomically resolve environmental diversity

High-throughput environmental DNA metabarcoding has revolutionized the analysis of microbial diversity, but this approach is generally restricted to amplicon sizes below 500 base pairs. These short regions contain limited phylogenetic signal, which makes it impractical to use environmental DNA in full phylogenetic inferences. However, new long-read sequencing technologies such as the Pacific Biosciences platform may provide sufficiently large sequence lengths to overcome the poor phylogenetic resolution of short amplicons. To test this idea, we amplified soil DNA and used PacBio Circular Consensus Sequencing (CCS) to obtain a ~4500 bp region of the eukaryotic rDNA operon spanning most of the small (18S) and large subunit (28S) ribosomal RNA genes. The CCS reads were first treated with a novel curation workflow that generated 650 high-quality OTUs containing the physically linked 18S and 28S regions of the long amplicons. In order to assign taxonomy to these OTUs, we developed a phylogeny-aware approach based on the 18S region that showed greater accuracy and sensitivity than similarity-based and phylogenetic placement-based methods using shorter reads. The taxonomically annotated OTUs were then combined with available 18S and 28S reference sequences to infer a well-resolved phylogeny spanning all major groups of eukaryotes, allowing us to accurately derive the evolutionary origin of environmental diversity. A total of 1019 sequences were included, of which a majority (58%) corresponded to the new long environmental CCS reads. Comparisons to the 18S-only region of our amplicons revealed that the combined 18S-28S genes globally increased the phylogenetic resolution, recovering specific groupings otherwise missing. The long reads also allowed us to directly investigate the relationships among environmental sequences themselves, which represents a key advantage over the placement of short reads on a reference phylogeny.
Altogether, our results show that long amplicons can be treated in a full phylogenetic framework to provide greater taxonomic resolution and a robust evolutionary perspective to environmental DNA.


April 21, 2020

Hemimetabolous insects elucidate the origin of sexual development via alternative splicing

Insects are the only animals in which sexual differentiation is controlled by sex-specific RNA splicing. The doublesex (dsx) transcription factor produces distinct male and female protein isoforms (DsxM and DsxF) under the control of the RNA splicing factor transformer (tra). tra itself is also alternatively spliced so that a functional Tra protein is only present in females; thus, DsxM is produced by default, while DsxF expression requires Tra. The sex-specific Dsx isoforms are essential for both male and female sexual differentiation. This pathway is profoundly different from the molecular mechanisms that control sex-specific development in other animal groups. In animals as different as vertebrates, nematodes, and crustaceans, sexual differentiation involves male-specific transcription of dsx-related transcription factors that are not alternatively spliced and play no role in female sexual development. To understand how the unique splicing-based mode of sexual differentiation found in insects evolved from a more ancestral transcription-based mechanism, we examined dsx and tra expression in three basal, hemimetabolous insect orders. We find that functional Tra protein is limited to females in the kissing bug Rhodnius prolixus (Hemiptera), but is present in both sexes in the louse Pediculus humanus (Phthiraptera) and the cockroach Blattella germanica (Blattodea). Although alternatively spliced dsx isoforms are seen in all these insects, they are sex-specific in the cockroach and the kissing bug but not in the louse. In B. germanica, RNAi experiments show that dsx is necessary for male, but not female, sexual differentiation, while tra controls female development via a dsx-independent pathway. 
Our results suggest that the distinctive insect mechanism based on the tra-dsx splicing cascade evolved in a gradual, mosaic process: sex-specific splicing of dsx predates its role in female sexual differentiation, while the role of tra in regulating dsx splicing and in sexual development more generally predates sex-specific expression of the Tra protein. We present a model where the canonical tra-dsx axis originated via merger between expanding dsx function (from males to both sexes) and narrowing tra function (from a general splicing factor to the dedicated regulator of dsx).


April 21, 2020

Trochodendron aralioides, the first chromosome-level draft genome in Trochodendrales and a valuable resource for basal eudicot research

Background: The wheel tree (Trochodendron aralioides) is one of only two species in the basal eudicot order Trochodendrales. Together with Tetracentron sinense, the family is unique in having secondary xylem without vessel elements, long considered to be a primitive character also found in Amborella and Winteraceae. Recent studies, however, have shown that Trochodendraceae belong to basal eudicots and demonstrate that this represents an evolutionary reversal for the group. Trochodendron aralioides is widespread in cultivation and popular for use in gardens and parks. Findings: We assembled the T. aralioides genome using a total of 679.56 Gb of clean reads that were generated using both PacBio and Illumina short reads in combination with 10XGenomics and Hi-C data. Nineteen scaffolds corresponding to 19 chromosomes were assembled to a final size of 1.614 Gb with a scaffold N50 of 73.37 Mb, in addition to 1,534 contigs. Repeat sequences accounted for 64.226% of the genome, and 35,328 protein-coding genes with an average of 5.09 exons per gene were annotated using de novo, RNA-seq, and homology-based approaches. According to a phylogenetic analysis of protein-coding genes, T. aralioides diverged in a basal position relative to core eudicots, approximately 121.8-125.8 million years ago. Conclusions: Trochodendron aralioides is the first chromosome-scale genome assembled in the order Trochodendrales. It represents the largest genome assembled to date in the basal eudicot grade, as well as the closest order relative to the core eudicots, as the position of Buxales remains unresolved. This genome will support further studies of wood morphology and floral evolution, and will be an essential resource for understanding rapid changes that took place at the base of the eudicot tree. Finally, it can serve as a valuable source to aid both the acceleration of genome-assisted improvement for cultivation and conservation efforts of the wheel tree.
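
For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how N50 is conventionally computed from a list of scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the T. aralioides assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy lengths (arbitrary units), not real assembly data.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

With these toy values the total is 290, and the two longest sequences (80 + 70 = 150) already cover more than half of it, so the N50 is 70.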


April 21, 2020

Reduced chromatin accessibility underlies gene expression differences in homologous chromosome arms of hexaploid wheat and diploid Aegilops tauschii

Polyploidy has been centrally important in driving the evolution of plants, and leads to alterations in gene expression that are thought to underlie the emergence of new traits. Despite the common occurrence of these global patterns of altered gene expression in polyploids, the mechanisms involved are not well understood. Using a precise framework of highly conserved syntenic genes on hexaploid wheat chromosome 3DL and its progenitor 3L chromosome arm of diploid Aegilops tauschii, we show that 70% of these genes exhibited proportionally reduced gene expression, in which expression in the hexaploid context of the 3DL genes was approximately 40% of the levels observed in diploid Ae. tauschii. Many genes showed elevated expression during later stages of grain development in wheat compared to Ae. tauschii. Gene sequence and methylation differences accounted for only a few cases of differences in gene expression. In contrast, large-scale patterns of reduced chromatin accessibility of genes in the hexaploid chromosome arm compared to its diploid progenitor were correlated with the observed overall reduction in gene expression and with differential gene expression. We therefore suggest that an overall reduction in accessible chromatin underlies the major differences in gene expression that result from polyploidization.


April 21, 2020

Convergent evolution of linked mating-type loci in basidiomycetes: an ancient fusion event that has stood the test of time

Sexual development is a key evolutionary innovation of eukaryotes. In many species, mating involves interaction between compatible mating partners that can undergo cell and nuclear fusion and subsequent steps of development including meiosis. Mating compatibility in fungi is governed by mating type determinants, which are localized at mating type (MAT) loci. In basidiomycetes, the ancestral state is hypothesized to be tetrapolar (bifactorial), with two genetically unlinked MAT loci containing homeodomain transcription factor genes (HD locus) and pheromone and pheromone receptor genes (P/R locus), respectively. Alleles at both loci must differ between mating partners for completion of sexual development. However, there are also basidiomycete species with bipolar (unifactorial) mating systems, which can arise through genomic linkage of the HD and P/R loci. In the order Tremellales, which is comprised of mostly yeast-like species, bipolarity is found only in the human pathogenic Cryptococcus species. Here, we describe the analysis of MAT loci from the Trichosporonales, a sister order to the Tremellales. We analyzed genome sequences from 29 strains that belong to 24 species, including two new genome sequences generated in this study. Interestingly, in all of the species analyzed, the MAT loci are fused and a single HD gene is present in each mating type. This is similar to the organization in the pathogenic Cryptococci, which also have linked MAT loci and carry only one HD gene per MAT locus instead of the usual two HD genes found in the vast majority of basidiomycetes. However, the HD and P/R allele combinations in the Trichosporonales are different from those in the pathogenic Cryptococcus species. 
The differences in allele combinations compared to the bipolar Cryptococci as well as the existence of tetrapolar Tremellales sister species suggest that fusion of the HD and P/R loci and differential loss of one of the two HD genes per MAT allele occurred independently in the Trichosporonales and pathogenic Cryptococci. This finding supports the hypothesis of convergent evolution at the molecular level towards fused mating-type regions in fungi, similar to previous findings in other fungal groups. Unlike the fused MAT loci in several other basidiomycete lineages though, the gene content and gene order within the fused MAT loci are highly conserved in the Trichosporonales, and there is no apparent suppression of recombination extending from the MAT loci to adjacent chromosomal regions, suggesting different mechanisms for the evolution of physically linked MAT loci in these groups.


April 21, 2020

Antibiotic production is organized by a division of labour in Streptomyces

One of the hallmark behaviors of social groups is division of labour, where different group members become specialized to carry out complementary tasks. By dividing labour, cooperative groups of individuals increase their efficiency, thereby raising group fitness even if these specialized behaviors reduce the fitness of individual group members. Here we provide evidence that antibiotic production in colonies of the multicellular bacterium Streptomyces coelicolor is coordinated by a division of labour. We show that S. coelicolor colonies are genetically heterogeneous due to massive amplifications and deletions to the chromosome. Cells with gross chromosomal changes produce an increased diversity of secondary metabolites and secrete significantly more antibiotics; however, these changes come at the cost of dramatically reduced individual fitness, providing direct evidence for a trade-off between secondary metabolite production and fitness. Finally, we show that colonies containing mixtures of mutant strains and their parents produce significantly more antibiotics, while colony-wide spore production remains unchanged. Our work demonstrates that by generating mutants that are specialized to hyper-produce antibiotics, streptomycetes reduce the colony-wide fitness costs of secreted secondary metabolites while maximizing the yield and diversity of these products.


April 21, 2020

Neighbor predation linked to natural competence fosters the transfer of large genomic regions in Vibrio cholerae.

Natural competence for transformation is a primary mode of horizontal gene transfer. Competent bacteria are able to absorb free DNA from their surroundings and exchange this DNA against pieces of their own genome when sufficiently homologous. However, the prevalence of non-degraded DNA with sufficient coding capacity is not well understood. In this context, we previously showed that naturally competent Vibrio cholerae use their type VI secretion system (T6SS) to actively acquire DNA from non-kin neighbors. Here, we explored the conditions of the DNA released through T6SS-mediated killing versus passive cell lysis and the extent of the transfers that occur due to these conditions. We show that competent V. cholerae acquire DNA fragments with a length exceeding 150 kbp in a T6SS-dependent manner. Collectively, our data support the notion that the environmental lifestyle of V. cholerae fosters the exchange of genetic material with sufficient coding capacity to significantly accelerate bacterial evolution. © 2019, Matthey et al.


April 21, 2020

Identification and characterization of OmpT-like proteases in uropathogenic Escherichia coli clinical isolates

Bacterial colonization of the urogenital tract is limited by innate defenses, including the production of antimicrobial peptides (AMPs). Uropathogenic Escherichia coli (UPEC) resist AMP-killing to cause a range of urinary tract infections (UTIs) including asymptomatic bacteriuria, cystitis, pyelonephritis, and sepsis. UPEC strains have high genomic diversity and encode numerous virulence factors that differentiate them from non-UTI causing strains, including ompT. As OmpT homologues cleave and inactivate AMPs, we hypothesized that high OmpT protease activity levels contribute to UPEC colonization during symptomatic UTIs. Therefore, we measured OmpT activity in 58 UPEC clinical isolates. While heterogeneous OmpT activities were observed, OmpT activity was significantly greater in UPEC strains isolated from patients with symptomatic infections. Unexpectedly, UPEC strains exhibiting the greatest protease activities harboured an additional ompT-like gene called arlC (ompTp). The presence of two OmpT-like proteases in some UPEC isolates led us to compare the substrate specificities of OmpT-like proteases found in E. coli. While all three cleaved AMPs, cleavage efficiency varied on the basis of AMP size and secondary structure. Our findings suggest that the presence of ArlC and OmpT in the same UPEC isolate may confer a fitness advantage by expanding the range of target substrates.


April 21, 2020

Complete genome sequence and evolution analysis of Psychrobacter sp. YP14 from Gammaridea Gastrointestinal Microbiota of Yap Trench

Psychrobacter sp. YP14, a moderately psychrophilic bacterium belonging to the class Gammaproteobacteria, was isolated from the gastrointestinal microbiota of Gammaridea from the Yap Trench. The strain has one circular chromosome of 2,895,311 bp with a GC content of 44.66%, comprising 2,333 protein-coding genes, 53 tRNA genes and 9 rRNA genes. Four plasmids were completely assembled, with sizes of 13,712 bp, 19,711 bp, 36,270 bp and 8,194 bp. In particular, a putative open reading frame (ORF) for dienelactone hydrolase (DLH), related to the degradation of chlorinated aromatic hydrocarbons, was identified. To get a better understanding of the evolution of Psychrobacter sp. YP14 within this genus, six Psychrobacter strains (G, PRwf-1, DAB_AL43B, AntiMn-1, P11G5, P2G3) with publicly available complete genomes were selected, and comparative genomic analyses were performed among them. The closest phylogenetic relationship was identified between strains G and K5 based on the 16S rRNA gene and ANI (average nucleotide identity) values. Analysis of the pan-genome structure found that YP14 has fewer COG clusters associated with transposons and prophages, which indicates fewer sequence rearrangements compared with PRwf-1. In addition, the stress response-related genes of strain YP14 indicate that it has fewer strategies to cope with extreme environments, which is consistent with its intestinal habitat. The differences in metabolism and stress-response strategies of YP14 make it useful for studying microbial survival and metabolic mechanisms in the deep-sea environment.
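
The GC content reported above is a simple proportion: the number of G and C bases divided by the number of unambiguous bases. A minimal sketch follows, assuming a plain nucleotide string as input; the function name `gc_content` and the short example sequence are hypothetical and are not taken from the YP14 genome.

```python
def gc_content(sequence):
    """Return the GC content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored in both numerator and denominator.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical 10-base example: 4 of the 10 bases are G or C.
print(gc_content("ATGCGCATTA"))
```

For a real genome the same calculation would be run over the full chromosome sequence, for example after reading it from a FASTA file.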


April 21, 2020

Hi-C guided assemblies reveal conserved regulatory topologies on X and autosomes despite extensive genome shuffling

Genome rearrangements that occur during evolution impose major challenges on regulatory mechanisms that rely on three-dimensional genome architecture. Here, we developed a scaffolding algorithm and generated chromosome-length assemblies from Hi-C data for studying genome topology in three distantly related Drosophila species. We observe extensive genome shuffling between these species with one synteny breakpoint after approximately every six genes. A/B compartments, a set of large gene-dense topologically associating domains (TADs) and spatial contacts between high-affinity sites (HAS) located on the X chromosome are maintained over 40 million years, indicating architectural conservation at various hierarchies. Evolutionarily conserved genes cluster in the vicinity of HAS, while HAS locations appear evolutionarily flexible, thus uncoupling the functional requirement of dosage compensation from individual positions on the linear X chromosome. Therefore, 3D architecture is preserved even in scenarios of thousands of rearrangements, highlighting its relevance for essential processes such as dosage compensation of the X chromosome.


April 21, 2020

Quantifying the Benefit Offered by Transcript Assembly on Single-Molecule Long Reads

Third-generation sequencing technologies benefit transcriptome analysis by generating longer sequencing reads. However, not all single-molecule long reads represent full transcripts due to incomplete cDNA synthesis and the sequencing length limit of the platform. This drives a need for long read transcript assembly. We quantify the benefit that can be achieved by using a transcript assembler on long reads. Adding long-read-specific algorithms, we evolved Scallop to make Scallop-LR, a long-read transcript assembler, to handle the computational challenges arising from long read lengths and high error rates. Analyzing 26 SRA PacBio datasets using Scallop-LR, Iso-Seq Analysis, and StringTie, we quantified the amount by which assembly improved Iso-Seq results. Through combined evaluation methods, we found that Scallop-LR identifies 2100–4000 more (for 18 human datasets) or 1100–2200 more (for eight mouse datasets) known transcripts than Iso-Seq Analysis, which does not do assembly. Further, Scallop-LR finds 2.4–4.4 times more potentially novel isoforms than Iso-Seq Analysis for the human and mouse datasets. StringTie also identifies more transcripts than Iso-Seq Analysis. Adding long-read-specific optimizations in Scallop-LR increases the numbers of predicted known transcripts and potentially novel isoforms for the human transcriptome compared to several recent short-read assemblers (e.g. StringTie). Our findings indicate that transcript assembly by Scallop-LR can reveal a more complete human transcriptome.


April 21, 2020

Multiple Long-read Sequencing Survey of Herpes Simplex Virus Lytic Transcriptome

Long-read sequencing (LRS) has become increasingly important in RNA research due to its strength in resolving complex transcriptomic architectures. In this regard, two LRS platforms have currently demonstrated adequate performance: Single Molecule Real-Time Sequencing by Pacific Biosciences (PacBio) and nanopore sequencing by Oxford Nanopore Technologies (ONT). Even though these techniques produce lower coverage and are more error prone than short-read sequencing, they continue to be more successful in identifying transcript isoforms including polycistronic and multi-spliced RNA molecules, as well as transcript overlaps. Recent reports have successfully applied LRS for the investigation of the transcriptome of viruses belonging to various families. These studies have substantially increased the number of previously known viral RNA molecules. In this work, we used the Sequel and MinION platforms from PacBio and ONT, respectively, to characterize the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). In most samples, we analyzed the poly(A) fraction of the transcriptome, but we also performed random oligonucleotide-based sequencing. Besides cDNA sequencing, we also carried out native RNA sequencing. Our investigations identified more than 160 previously undetected transcripts, including coding and non-coding RNAs, multi-splice transcripts, as well as polycistronic and complex transcripts. Furthermore, we determined previously unsubstantiated transcriptional start sites, polyadenylation sites, and splice sites. A large number of novel transcriptional overlaps were also detected. Random-primed sequencing revealed that each convergent gene pair produces non-polyadenylated read-through RNAs overlapping the partner genes. Furthermore, we identified novel replication-associated transcripts overlapping the HSV-1 replication origins, and novel LAT variants with very long 5’ regions, which are co-terminal with the LAT-0.7kb transcript.
Overall, our results demonstrated that the HSV-1 transcripts form an extremely complex pattern of overlaps, and that the entire viral genome is transcriptionally active. In most viral genes, if not in all, both DNA strands are expressed.


April 21, 2020

The Chinese chestnut genome: a reference for species restoration

Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.


April 21, 2020

Characterization of LINE-1 transposons in a human genome at allelic resolution

The activity of the retrotransposon LINE-1 has created a substantial portion of the human genome. Most of this sequence comprises fractured and debilitated LINE-1s. An accurate approximation of the number, location, and sequence of the LINE-1 elements present in any single genome has proven elusive due to the difficulty of assembling and phasing the repetitive and polymorphic regions of the human genome. Through an in-depth analysis of publicly available, deep, long-read assemblies of nearly homozygous human genomes, we defined the location and sequence of all intact LINE-1s in these assemblies. We found 148 and 142 intact LINE-1s in two nearly homozygous assemblies. A combination of these assemblies suggests a diploid human genome contains at least 50% more intact LINE-1s than previous estimates: in this case, 290 intact LINE-1s at 194 loci. We think this is the best approximation, to date, of the number of intact LINE-1s in a single diploid human genome. In addition to counting intact LINE-1 elements, we resolved the sequence of each element, including some LINE-1 elements in unassembled, presumably centromeric regions of the genome. A comparison of the intact LINE-1s in each assembly shows the specific pattern of variation between these genomes, including LINE-1s that remain intact in only one genome, allelic variation in shared intact LINE-1s, and LINE-1s that are unique (presumably young) insertions in only one genome. We found that many old elements (> 6 million years old) remain intact, and comparison of the young and intact LINE-1s across assemblies reinforces the notion that only a small portion of all LINE-1 sequences that may be intact in the genomes of the human population has been uncovered.
This dataset provides the first nearly comprehensive estimate of LINE-1 diversity within an individual, an important dataset in the quest to understand the functional consequences of sequence variation in LINE-1 and the complete set of LINE-1s in the human population.

