April 21, 2020

De novo assembly of a wild pear (Pyrus betuleafolia) genome.

China is the origin and evolutionary centre of Oriental pears. Pyrus betuleafolia is a wild species native to China and distributed in the northern region, and it is widely used as rootstock. Here, we report the de novo assembly of the genome of P. betuleafolia-Shanxi Duli using an integrated strategy that combines PacBio sequencing, BioNano mapping and chromosome conformation capture (Hi-C) sequencing. The genome assembly size was 532.7 Mb, with a contig N50 of 1.57 Mb. A total of 59 552 protein-coding genes and 247.4 Mb of repetitive sequences were annotated for this genome. The expansion genes in P. betuleafolia were significantly enriched in secondary metabolism, which may account for the organism’s considerable environmental adaptability. An alignment analysis of orthologous genes showed that fruit size, sugar metabolism and transport, and photosynthetic efficiency were positively selected in Oriental pear during domestication. A total of 573 nucleotide-binding site (NBS)-type resistance gene analogues (RGAs) were identified in the P. betuleafolia genome, 150 of which are TIR-NBS-LRR (TNL)-type genes, which represented the greatest number of TNL-type genes among the published Rosaceae genomes and explained the strong disease resistance of this wild species. The study of flavour metabolism-related genes showed that the anthocyanidin reductase (ANR) metabolic pathway affected the astringency of pear fruit and that sorbitol transporter (SOT) transmembrane transport may be the main factor affecting the accumulation of soluble organic matter. This high-quality P. betuleafolia genome provides a valuable resource for the utilization of wild pear in fundamental pear studies and breeding. © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.


April 21, 2020

The evaluation of RNA-Seq de novo assembly by PacBio long read sequencing

RNA-Seq de novo assembly is an important method for generating transcriptomes of non-model organisms before any downstream analysis. Although many de novo assembly methods have been developed, there is still no consensus on how to evaluate them. Setting up a benchmark for evaluating the quality of de novo assemblies is therefore critical. Addressing this challenge will deepen our insight into the properties of different de novo assemblers and their evaluation methods, and provide hints for choosing the best assembly sets as transcriptomes of non-model organisms for further functional analysis. In this article, we generate a “real time” transcriptome using PacBio long reads as a benchmark for evaluating five de novo assemblers and two model-based de novo assembly evaluation methods. By comparing the de novo assemblies generated from RNA-Seq short reads with the “real time” transcriptome from the same biological sample, we find that Trinity is best at completeness, generating more assemblies than the alternative assemblers, but is less continuous and produces more misassemblies; Oases is best at continuity and specificity, but is less complete; the performance of SOAPdenovo-Trans, Trans-ABySS and IDBA-Tran falls in between among the five assemblers. For evaluation methods, DETONATE leverages multiple aspects of the assembly set and ranks the assembly set with an average performance as the best, while the contig score can serve as a good metric for selecting assemblies with high completeness, specificity and continuity, but is not sensitive to misassemblies; the TransRate contig score is useful for removing misassemblies, yet the assemblies in the optimal set are often too few to be used as a transcriptome.


April 21, 2020

An improved pig reference genome sequence to enable pig genetics and genomics research

The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model with high anatomical and immunological similarity to humans. The draft reference genome (Sscrofa10.2) represented a purebred female pig from a commercial pork production breed (Duroc), and was established using older clone-based sequencing methods. The Sscrofa10.2 assembly was incomplete, and unresolved redundancies, short range order and orientation errors and associated misassembled genes limited its utility. We present two highly contiguous chromosome-level genome assemblies created with more recent long read technologies and a whole genome shotgun strategy, one for the same Duroc female (Sscrofa11.1) and one for an outbred, composite breed male animal commonly used for commercial pork production (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy compared to the earlier reference, and the availability of two independent assemblies provided an opportunity to identify large-scale variants and to error-check the accuracy of representation of the genome. We propose that the improved Duroc breed assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.


April 21, 2020

Transcriptional initiation of a small RNA, not R-loop stability, dictates the frequency of pilin antigenic variation in Neisseria gonorrhoeae.

Neisseria gonorrhoeae, the sole causative agent of gonorrhea, constitutively undergoes diversification of the Type IV pilus. Gene conversion occurs between one of the several donor silent copies located in distinct loci and the recipient pilE gene, encoding the major pilin subunit of the pilus. A guanine quadruplex (G4) DNA structure and a cis-acting sRNA (G4-sRNA) are located upstream of the pilE gene and both are required for pilin antigenic variation (Av). We show that the reduced sRNA transcription lowers pilin Av frequencies. Extended transcriptional elongation is not required for Av, since limiting the transcript to 32 nt allows for normal Av frequencies. Using chromatin immunoprecipitation (ChIP) assays, we show that cellular G4s are less abundant when sRNA transcription is lower. In addition, using ChIP, we demonstrate that the G4-sRNA forms a stable RNA:DNA hybrid (R-loop) with its template strand. However, modulating R-loop levels by controlling RNase HI expression does not alter G4 abundance quantified through ChIP. Since pilin Av frequencies were not altered when modulating R-loop levels by controlling RNase HI expression, we conclude that transcription of the sRNA is necessary, but stable R-loops are not required to promote pilin Av. © 2019 John Wiley & Sons Ltd.


April 21, 2020

Genome assembly provides insights into the genome evolution and flowering regulation of orchardgrass.

Orchardgrass (Dactylis glomerata L.) is an important forage grass for cultivating livestock worldwide. Here, we report an ~1.84-Gb chromosome-scale diploid genome assembly of orchardgrass, with a contig N50 of 0.93 Mb, a scaffold N50 of 6.08 Mb and a super-scaffold N50 of 252.52 Mb, which is the first chromosome-scale assembled genome of a cool-season forage grass. The genome includes 40 088 protein-coding genes, and 69% of the assembled sequences are transposable elements, with long terminal repeats (LTRs) being the most abundant. The LTR retrotransposons may have been activated and expanded in the grass genome in response to environmental changes during the Pleistocene between 0 and 1 million years ago. Phylogenetic analysis reveals that orchardgrass diverged after rice but before three Triticeae species, and evolutionarily conserved chromosomes were detected by analysing ancient chromosome rearrangements in these grass species. We also resequenced the whole genome of 76 orchardgrass accessions and found that germplasm from Northern Europe and East Asia clustered together, likely due to the exchange of plants along the ‘Silk Road’ or other ancient trade routes connecting the East and West. Last, a combined transcriptome, quantitative genetic and bulk segregant analysis provided insights into the genetic network regulating flowering time in orchardgrass and revealed four main candidate genes controlling this trait. This chromosome-scale genome and the online database of orchardgrass developed here will facilitate the discovery of genes controlling agronomically important traits, stimulate genetic improvement of and functional genetic research on orchardgrass and provide comparative genetic resources for other forage grasses. © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.


April 21, 2020

Insights into transcriptional characteristics and homoeolog expression bias of embryo and de-embryonated kernels in developing grain through RNA-Seq and Iso-Seq.

Bread wheat (Triticum aestivum L.) is an allohexaploid, and the transcriptional characteristics of the wheat embryo and endosperm during grain development remain unclear. To analyze the transcriptome, we performed isoform sequencing (Iso-Seq) for wheat grain and RNA sequencing (RNA-Seq) for the embryo and de-embryonated kernels. The differential regulation between the embryo and de-embryonated kernels was found to be greater than the difference between the two time points for each tissue. Exactly 2264 and 4790 tissue-specific genes were found at 14 days post-anthesis (DPA), while 5166 and 3784 genes were found at 25 DPA in the embryo and de-embryonated kernels, respectively. Genes expressed in the embryo were more likely to be related to nucleic acid and enzyme regulation. In de-embryonated kernels, genes were rich in substance metabolism and enzyme activity functions. Moreover, 4351, 4641, 4516, and 4453 genes with the A, B, and D homoeoloci were detected for each of the four tissues. Expression characteristics suggested that the D genome may be the largest contributor to the transcriptome in developing grain. Among these, 48, 66, and 38 silenced genes emerged in the A, B, and D genomes, respectively. Gene ontology analysis showed that silenced genes could be inclined to different functions in different genomes. Our study provided specific gene pools of the embryo and de-embryonated kernels and a homoeolog expression bias model on a large scale. This is helpful for providing new insights into the molecular physiology of wheat.


April 21, 2020

RNA sequencing: the teenage years.

Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.


April 21, 2020

Integrative functional genomics decodes herpes simplex virus 1

Since the genome of herpes simplex virus 1 (HSV-1) was first sequenced more than 30 years ago, its predicted 80 genes have been intensively studied. Here, we unravel the complete viral transcriptome and translatome during lytic infection with base-pair resolution by computational integration of multi-omics data. We identified a total of 201 viral transcripts and 284 open reading frames (ORFs) including all known and 46 novel large ORFs. Multiple transcript isoforms expressed from individual gene loci explain translation of the vast majority of novel viral ORFs as well as N-terminal extensions (NTEs) and truncations thereof. We show that key viral regulators and structural proteins possess NTEs, which initiate from non-canonical start codons and govern subcellular protein localization and packaging. We validated a novel non-canonical large spliced ORF in the ICP0 locus and identified a 93 aa ORF overlapping ICP34.5 that is thus also deleted in the FDA-approved oncolytic virus Imlygic. Finally, we extend the current nomenclature to include all novel viral gene products. Taken together, this work provides a valuable resource for future functional studies, vaccine design and oncolytic therapies.


April 21, 2020

Hemimetabolous insects elucidate the origin of sexual development via alternative splicing

Insects are the only animals in which sexual differentiation is controlled by sex-specific RNA splicing. The doublesex (dsx) transcription factor produces distinct male and female protein isoforms (DsxM and DsxF) under the control of the RNA splicing factor transformer (tra). tra itself is also alternatively spliced so that a functional Tra protein is only present in females; thus, DsxM is produced by default, while DsxF expression requires Tra. The sex-specific Dsx isoforms are essential for both male and female sexual differentiation. This pathway is profoundly different from the molecular mechanisms that control sex-specific development in other animal groups. In animals as different as vertebrates, nematodes, and crustaceans, sexual differentiation involves male-specific transcription of dsx-related transcription factors that are not alternatively spliced and play no role in female sexual development. To understand how the unique splicing-based mode of sexual differentiation found in insects evolved from a more ancestral transcription-based mechanism, we examined dsx and tra expression in three basal, hemimetabolous insect orders. We find that functional Tra protein is limited to females in the kissing bug Rhodnius prolixus (Hemiptera), but is present in both sexes in the louse Pediculus humanus (Phthiraptera) and the cockroach Blattella germanica (Blattodea). Although alternatively spliced dsx isoforms are seen in all these insects, they are sex-specific in the cockroach and the kissing bug but not in the louse. In B. germanica, RNAi experiments show that dsx is necessary for male, but not female, sexual differentiation, while tra controls female development via a dsx-independent pathway. 
Our results suggest that the distinctive insect mechanism based on the tra-dsx splicing cascade evolved in a gradual, mosaic process: sex-specific splicing of dsx predates its role in female sexual differentiation, while the role of tra in regulating dsx splicing and in sexual development more generally predates sex-specific expression of the Tra protein. We present a model where the canonical tra-dsx axis originated via merger between expanding dsx function (from males to both sexes) and narrowing tra function (from a general splicing factor to the dedicated regulator of dsx).


April 21, 2020

Quantifying the Benefit Offered by Transcript Assembly on Single-Molecule Long Reads

Third-generation sequencing technologies benefit transcriptome analysis by generating longer sequencing reads. However, not all single-molecule long reads represent full transcripts due to incomplete cDNA synthesis and the sequencing length limit of the platform. This drives a need for long read transcript assembly. We quantify the benefit that can be achieved by using a transcript assembler on long reads. Adding long-read-specific algorithms, we evolved Scallop to make Scallop-LR, a long-read transcript assembler, to handle the computational challenges arising from long read lengths and high error rates. Analyzing 26 SRA PacBio datasets using Scallop-LR, Iso-Seq Analysis, and StringTie, we quantified the amount by which assembly improved Iso-Seq results. Through combined evaluation methods, we found that Scallop-LR identifies 2100–4000 more (for 18 human datasets) or 1100–2200 more (for eight mouse datasets) known transcripts than Iso-Seq Analysis, which does not do assembly. Further, Scallop-LR finds 2.4–4.4 times more potentially novel isoforms than Iso-Seq Analysis for the human and mouse datasets. StringTie also identifies more transcripts than Iso-Seq Analysis. Adding long-read-specific optimizations in Scallop-LR increases the numbers of predicted known transcripts and potentially novel isoforms for the human transcriptome compared to several recent short-read assemblers (e.g. StringTie). Our findings indicate that transcript assembly by Scallop-LR can reveal a more complete human transcriptome.


April 21, 2020

Multiple Long-read Sequencing Survey of Herpes Simplex Virus Lytic Transcriptome

Long-read sequencing (LRS) has become increasingly important in RNA research due to its strength in resolving complex transcriptomic architectures. In this regard, currently two LRS platforms have demonstrated adequate performance: the Single Molecule Real-Time Sequencing by Pacific Biosciences (PacBio) and the nanopore sequencing by Oxford Nanopore Technologies (ONT). Even though these techniques produce lower coverage and are more error prone than short-read sequencing, they continue to be more successful in identifying transcript isoforms including polycistronic and multi-spliced RNA molecules, as well as transcript overlaps. Recent reports have successfully applied LRS for the investigation of the transcriptome of viruses belonging to various families. These studies have substantially increased the number of previously known viral RNA molecules. In this work, we used the Sequel and MinION technique from PacBio and ONT, respectively, to characterize the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). In most samples, we analyzed the poly(A) fraction of the transcriptome, but we also performed random oligonucleotide-based sequencing. Besides cDNA sequencing, we also carried out native RNA sequencing. Our investigations identified more than 160 previously undetected transcripts, including coding and non-coding RNAs, multi-splice transcripts, as well as polycistronic and complex transcripts. Furthermore, we determined previously unsubstantiated transcriptional start sites, polyadenylation sites, and splice sites. A large number of novel transcriptional overlaps were also detected. Random-primed sequencing revealed that each convergent gene pair produces non-polyadenylated read-through RNAs overlapping the partner genes. Furthermore, we identified novel replication-associated transcripts overlapping the HSV-1 replication origins, and novel LAT variants with very long 5’ regions, which are co-terminal with the LAT-0.7kb transcript. 
Overall, our results demonstrated that the HSV-1 transcripts form an extremely complex pattern of overlaps, and that the entire viral genome is transcriptionally active. In most viral genes, if not in all, both DNA strands are expressed.


April 21, 2020

Disruption of the kringle 1 domain of prothrombin leads to late onset mortality in zebrafish

The ability to prevent blood loss in response to injury is a critical, evolutionarily conserved function of all vertebrates. Prothrombin (F2) contributes to both primary and secondary hemostasis through the activation of platelets and the conversion of soluble fibrinogen to insoluble fibrin, respectively. Complete prothrombin deficiency has never been observed in humans and is incompatible with life in mice, limiting the ability to understand the entirety of prothrombin’s in vivo functions. We have previously demonstrated the ability of zebrafish to tolerate loss of both pro- and anticoagulant factors that are embryonic lethal in mammals, making them an ideal model for the study of prothrombin deficiency. Using genome editing with TALENs, we have generated a null allele in zebrafish f2. Homozygous mutant embryos develop normally into early adulthood, but demonstrate eventual complete mortality with the majority of fish succumbing to internal hemorrhage by 2 months of age. We show that despite the extended survival, the mutants are unable to form occlusive thrombi in both the venous and arterial systems as early as 3-5 days of life, and we were able to phenocopy this early hemostatic defect using direct oral anticoagulants. When the equivalent mutation was engineered into the homologous residues of human prothrombin, there were severe reductions in secretion and activation, suggesting a possible role for kringle 1 in thrombin maturation, and the possibility that the F1.2 fragment has a functional role in exerting the procoagulant effects of thrombin. Together, our data demonstrate the conserved function of thrombin in zebrafish, as well as the requirement for kringle 1 for biosynthesis and activation by prothrombinase. Understanding how zebrafish are able to develop normally and survive into early adulthood without prothrombin will provide important insight into its pleiotropic functions as well as the management of patients with bleeding disorders.


April 21, 2020

Variant Phasing and Haplotypic Expression from Single-molecule Long-read Sequencing in Maize

Haplotype phasing of genetic variants is important for interpretation of the maize genome, population genetic analysis, and functional genomic analysis of allelic activity. Accordingly, accurate methods for phasing full-length isoforms are essential for functional genomics study. In this study, we performed an isoform-level phasing study in maize, using two inbred lines and their reciprocal crosses, based on single-molecule full-length cDNA sequencing. To phase and analyze full-length transcripts between hybrids and parents, we developed a tool called IsoPhase. Using this tool, we validated the majority of SNPs called against matching short read data and identified cases of allele-specific, gene-level, and isoform-level expression. Our results revealed that maize parental and hybrid lines exhibit different splicing activities. After phasing 6,847 genes in two reciprocal hybrids using embryo, endosperm and root tissues, we annotated the SNPs and identified large-effect genes. In addition, based on single-molecule sequencing, we identified parent-of-origin isoforms in maize hybrids, different novel isoforms between maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase power and accuracy in studies of allelic expression.


April 21, 2020

ORF Capture-Seq: a versatile method for targeted identification of full-length isoforms

Most human protein-coding genes are expressed as multiple isoforms. This in turn greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every gene, the majority of alternative isoforms remain uncharacterized experimentally. This is primarily due to: i) vast differences in overall levels between different isoforms expressed from common genes, and ii) the difficulty of obtaining contiguous full-length ORF sequences. Here, we present ORF Capture-Seq (OCS), a flexible and cost-effective method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude, compared to an unenriched sample. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will allow mapping of the full set of human isoforms at reasonable cost.


April 21, 2020

Full-length mRNA sequencing and gene expression profiling reveal broad involvement of natural antisense transcript gene pairs in pepper development and response to stresses.

Pepper is an important vegetable with great economic value and unique biological features. In the past few years, significant development has been made towards understanding the huge complex pepper genome; however, pepper functional genomics has not been well studied. To better understand the pepper gene structure and pepper gene regulation, we conducted full-length mRNA sequencing by PacBio sequencing and obtained 57862 high-quality full-length mRNA sequences derived from 18362 previously annotated and 5769 newly detected genes. New gene models were built that combined the full-length mRNA sequences and corrected approximately 500 fragmented gene models from previous annotations. Based on the full-length mRNA, we identified 4114 and 5880 pepper genes forming natural antisense transcript (NAT) genes in-cis and in-trans, respectively. Most of these genes accumulate small RNAs in their overlapping regions. By analyzing these NAT gene expression patterns in our transcriptome data, we identified many NAT pairs responsive to a variety of biological processes in pepper. Pepper formate dehydrogenase 1 (FDH1), which is required for R-gene-mediated disease resistance, may be regulated by nat-siRNAs and participate in a positive feedback loop in salicylic acid biosynthesis during resistance responses. Several cis-NAT pairs and subgroups of trans-NAT genes were responsive to pepper pericarp and placenta development, which may play roles in capsanthin and capsaicin biosynthesis. Using a comparative genomics approach, the evolutionary mechanisms of cis-NATs were investigated, and we found that an increase in intergenic sequences accounted for the loss of most cis-NATs, while transposon insertion contributed to the formation of most new cis-NATs. This article is protected by copyright. All rights reserved.

