Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.
Haplotype phasing of genetic variants is important for interpretation of the maize genome, population genetic analysis, and functional genomic analysis of allelic activity. Accordingly, accurate methods for phasing full-length isoforms are essential for functional genomics study. In this study, we performed an isoform-level phasing study in maize, using two inbred lines and their reciprocal crosses, based on single-molecule full-length cDNA sequencing. To phase and analyze full-length transcripts between hybrids and parents, we developed a tool called IsoPhase. Using this tool, we validated the majority of SNPs called against matching short read data and identified cases of allele-specific, gene-level, and isoform-level expression. Our results revealed that maize parental and hybrid lines exhibit different splicing activities. After phasing 6,847 genes in two reciprocal hybrids using embryo, endosperm and root tissues, we annotated the SNPs and identified large-effect genes. In addition, based on single-molecule sequencing, we identified parent-of-origin isoforms in maize hybrids, different novel isoforms between maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase power and accuracy in studies of allelic expression.
Long-read RNA sequencing (RNA-seq) is promising for transcriptomics studies; however, aligning the reads remains a fundamental but non-trivial task because of sequencing errors and complicated gene structures. We propose deSALT, a tailored two-pass long RNA-seq read alignment approach that constructs graph-based alignment skeletons to sensitively infer exons and uses them to generate spliced reference sequences to produce refined alignments. deSALT addresses several difficult issues, such as small exons, serious sequencing errors and consensus spliced alignment. Benchmarks demonstrate that this approach is better able to produce high-quality full-length alignments, which has enormous potential for transcriptomics studies.
Suppressed recombination allows divergence between homologous sex chromosomes and the functionality of their genes. Here, we reveal patterns of the earliest stages of sex-chromosome evolution in the diploid dioecious herb Mercurialis annua on the basis of cytological analysis, de novo genome assembly and annotation, genetic mapping, exome resequencing of natural populations, and transcriptome analysis. The genome assembly contained 34,105 expressed genes, of which 10,076 were assigned to linkage groups. Genetic mapping and exome resequencing of individuals across the species range both identified the largest linkage group, LG1, as the sex chromosome. Although the sex chromosomes of M. annua are karyotypically homomorphic, we estimate that about a third of the Y chromosome has ceased recombining, containing 568 transcripts and spanning 22.3 cM in the corresponding female map. Nevertheless, we found limited evidence for Y-chromosome degeneration in terms of gene loss and pseudogenization, and most X- and Y-linked genes appear to have diverged in the period subsequent to speciation between M. annua and its sister species M. huetii, which shares the same sex-determining region. Taken together, our results suggest that the M. annua Y chromosome has at least two evolutionary strata: a small old stratum shared with M. huetii, and a more recent larger stratum that is probably unique to M. annua and that stopped recombining about one million years ago. Patterns of gene expression within the non-recombining region are consistent with the idea that sexually antagonistic selection may have played a role in favoring suppressed recombination. Copyright © 2019, Genetics.
The ruminants are one of the most successful mammalian lineages, exhibiting morphological and habitat diversity and containing several key livestock species. To better understand their evolution, we generated and analyzed de novo assembled genomes of 44 ruminant species, representing all six Ruminantia families. We used these genomes to create a time-calibrated phylogeny to resolve topological controversies, overcoming the challenges of incomplete lineage sorting. Population dynamic analyses show that population declines commenced between 100,000 and 50,000 years ago, which is concomitant with expansion in human populations. We also reveal genes and regulatory elements that possibly contribute to the evolution of the digestive system, cranial appendages, immune system, metabolism, body size, cursorial locomotion, and dentition of the ruminants. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Actinosynnema is a small but well-known genus of actinomycetes known for producing ansamitocin, the payload component of antibody-drug conjugates against cancers. However, the secondary metabolite production profile of Actinosynnema pretiosum ATCC 31565, the most famous producer of ansamitocin, has never been fully explored. Our antiSMASH analysis of the genomic DNA of Actinosynnema pretiosum ATCC 31565 revealed an NRPS-PKS gene cluster for a polyene macrolactam. The gene cluster is very similar to the gene clusters for mirilactam and salinilactam, two 26-membered polyene macrolactams from Actinosynnema mirum and Salinispora tropica, respectively. Guided by this bioinformatics prediction, we characterized a novel 26-membered polyene macrolactam from Actinosynnema pretiosum ATCC 31565 and designated it pretilactam. The structure of pretilactam was elucidated by a comprehensive analysis of HRMS and 1D and 2D NMR data, with the absolute configuration of chiral carbons predicted bioinformatically. Pretilactam features a dihydroxy tetrahydropyran moiety and has a hexaene unit and a diene unit as its polyene system. A preliminary antibacterial assay indicated that pretilactam is inactive against Bacillus subtilis and Candida albicans.
Chromosomal-level assembly of the blood clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C.
The blood clam, Scapharca (Anadara) broughtonii, is an economically and ecologically important marine bivalve of the family Arcidae. Efforts to study its population genetics, breeding, cultivation, and stock enrichment have been somewhat hindered by the lack of a reference genome. Herein, we report the complete genome sequence of S. broughtonii, the first reference genome of the family Arcidae. A total of 75.79 Gb of clean data were generated with the Pacific Biosciences and Oxford Nanopore platforms, which represented approximately 86× coverage of the S. broughtonii genome. De novo assembly of these long reads resulted in an 884.5-Mb genome, with a contig N50 of 1.80 Mb and scaffold N50 of 45.00 Mb. Genome Hi-C scaffolding resulted in 19 chromosomes containing 99.35% of bases in the assembled genome. Genome annotation revealed that nearly half of the genome (46.1%) is composed of repeated sequences, while 24,045 protein-coding genes were predicted and 84.7% of them were annotated. We report here a chromosomal-level assembly of the S. broughtonii genome based on long-read sequencing and Hi-C scaffolding. The genomic data can serve as a reference for the family Arcidae and will provide a valuable resource for the scientific community and aquaculture sector. © The Author(s) 2019. Published by Oxford University Press.
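The contig and scaffold N50 values quoted above follow a standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units), not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

For the toy lengths, the two longest contigs (80 and 70) already cover half of the 300 total units, so the function returns 70.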
Yellowhorn (Xanthoceras sorbifolium) is a species of the Sapindaceae family native to China and is an oil tree that can withstand cold and drought conditions. A pseudomolecule-level genome assembly for this species will not only contribute to understanding the evolution of its genes and chromosomes but also bring yellowhorn breeding into the genomic era. Here, we generated 15 pseudomolecules of yellowhorn chromosomes, on which 97.04% of scaffolds were anchored, using the combined Illumina HiSeq, Pacific Biosciences Sequel, and Hi-C technologies. The length of the final yellowhorn genome assembly was 504.2 Mb with a contig N50 size of 1.04 Mb and a scaffold N50 size of 32.17 Mb. Genome annotation revealed that 68.67% of the yellowhorn genome was composed of repetitive elements. Gene modelling predicted 24,672 protein-coding genes. By comparing orthologous genes, the divergence time of yellowhorn and its close sister species longan (Dimocarpus longan) was estimated at ~33.07 million years ago. Gene cluster and chromosome synteny analysis demonstrated that the yellowhorn genome shared a conserved genome structure with its ancestor in some chromosomes. This genome assembly represents a high-quality reference genome for yellowhorn. Integrated genome annotations provide a valuable dataset for genetic and molecular research in this species. We did not detect whole-genome duplication in the genome. The yellowhorn genome carries syntenic blocks from ancient chromosomes. These data sources will enable this genome to serve as an initial platform for breeding better yellowhorn cultivars. © The Author(s) 2019. Published by Oxford University Press.
Pecan (Carya illinoinensis) and Chinese hickory (C. cathayensis) are important commercially cultivated nut trees in the genus Carya (Juglandaceae), with high nutritional value and substantial health benefits. We obtained >187.22 and 178.87 gigabases of sequence, and ~288× and 248× genome coverage, for a pecan cultivar (“Pawnee”) and a domesticated Chinese hickory landrace (ZAFU-1), respectively. The total assembly size is 651.31 megabases (Mb) for pecan and 706.43 Mb for Chinese hickory. Two genome duplication events before the divergence from walnut were found in these species. Gene family analysis highlighted key genes in biotic and abiotic tolerance, oil, polyphenols, essential amino acids, and B vitamins. Further analyses of reduced-coverage genome sequences of 16 Carya and 2 Juglans species provide additional phylogenetic perspective on crop wild relatives. Cooperative characterization of these valuable resources provides a window to their evolutionary development and a valuable foundation for future crop improvement. © The Author(s) 2019. Published by Oxford University Press.
Complete chloroplast genome sequences of Kaempferia galanga and Kaempferia elegans: Molecular structures and comparative analysis.
Kaempferia galanga and Kaempferia elegans, which belong to the genus Kaempferia (family Zingiberaceae), are used as a valuable herbal medicine and as an ornamental plant, respectively. Chloroplast genomes have been used for molecular markers, species identification and phylogenetic studies. In this study, the complete chloroplast genome sequences of K. galanga and K. elegans are reported. Results show that the complete chloroplast genome of K. galanga is 163,811 bp long, having a quadripartite structure with a large single copy (LSC) of 88,405 bp and a small single copy (SSC) of 15,812 bp separated by inverted repeats (IRs) of 29,797 bp. Similarly, the complete chloroplast genome of K. elegans is 163,555 bp long, having a quadripartite structure in which IRs of 29,773 bp separate an LSC of 88,020 bp and an SSC of 15,989 bp. A total of 111 genes in K. galanga and 113 genes in K. elegans comprised 79 protein-coding genes and 4 ribosomal RNA (rRNA) genes, as well as 28 and 30 transfer RNA (tRNA) genes in K. galanga and K. elegans, respectively. The gene order, GC content and orientation of the two Kaempferia chloroplast genomes exhibited high similarity. The location and distribution of simple sequence repeats (SSRs) and long repeat sequences were determined. Eight highly variable regions between the two Kaempferia species were identified and 643 mutation events, including 536 single-nucleotide polymorphisms (SNPs) and 107 insertion/deletions (indels), were accurately located. Sequence divergences of the whole chloroplast genomes were calculated among related Zingiberaceae species. The phylogenetic analysis based on SNPs among eleven species strongly supported that K. galanga and K. elegans formed a cluster within Zingiberaceae. This study identified the unique characteristics of the entire K. galanga and K. elegans chloroplast genomes that contribute to our understanding of the chloroplast DNA evolution within Zingiberaceae species.
It provides valuable information for phylogenetic analysis and species identification within genus Kaempferia.
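The reported region sizes can be checked for internal consistency. In a quadripartite chloroplast genome the total length is the LSC plus the SSC plus two copies of the inverted repeat. The short sketch below applies that arithmetic to the figures quoted in the abstract; it assumes the stated IR length refers to a single IR copy, and the function and dictionary names are hypothetical.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total genome length for a quadripartite structure: LSC + SSC + 2 x IR."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as quoted in the abstract; the IR value is assumed to be per copy.
reported = {
    "K. galanga": {"total": 163811, "lsc": 88405, "ssc": 15812, "ir": 29797},
    "K. elegans": {"total": 163555, "lsc": 88020, "ssc": 15989, "ir": 29773},
}

for species, r in reported.items():
    computed = quadripartite_length(r["lsc"], r["ssc"], r["ir"])
    # Report whether the summed regions match the stated genome length.
    print(species, computed, computed == r["total"])
```

Both genomes sum to the stated totals (163,811 bp and 163,555 bp), and the gene counts are likewise consistent: 79 + 4 + 28 = 111 and 79 + 4 + 30 = 113.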
Epstein-Barr virus (EBV) is a ubiquitous human pathogen associated with Burkitt’s lymphoma and nasopharyngeal carcinoma. Although the EBV genome harbors more than a hundred genes, a full transcription map with EBV polyadenylation profiles remains unknown. To elucidate the 3′ ends of all EBV transcripts genome-wide, we performed the first comprehensive analysis of viral polyadenylation sites (pA sites) using our previously reported polyadenylation sequencing (PA-seq) technology. We identified that EBV utilizes a total of 62 pA sites in JSC-1, 60 in Raji, and 53 in Akata cells for the expression of EBV genes from both plus and minus DNA strands; 42 of these pA sites are commonly used in all three cell lines. The majority of identified pA sites were mapped to the intergenic regions downstream of previously annotated EBV open reading frames (ORFs) and viral promoters. pA sites lacking an association with any known EBV genes were also identified, mostly for the minus DNA strand within the EBNA locus, a major locus responsible for maintenance of viral latency and cell transformation. The expression of these novel antisense transcripts to EBNA was verified by 3′ rapid amplification of cDNA ends (RACE) and Northern blot analyses in several EBV-positive (EBV+) cell lines. In contrast to EBNA RNA expressed during latency, expression of EBNA-antisense transcripts, which is restricted in latent cells, can be significantly induced by viral lytic infection, suggesting potential regulation of viral gene expression by EBNA-antisense transcription during lytic EBV infection. Our data provide the first evidence that EBV has an unrecognized mechanism that regulates EBV reactivation from latency. IMPORTANCE Epstein-Barr virus represents an important human pathogen with an etiological role in the development of several cancers.
By elucidation of a genome-wide polyadenylation landscape of EBV in JSC-1, Raji, and Akata cells, we have redefined the EBV transcriptome and mapped individual polymerase II (Pol II) transcripts of viral genes to each one of the mapped pA sites at single-nucleotide resolution as well as the depth of expression. By unveiling a new class of viral lytic RNA transcripts antisense to latent EBNAs, we provide a novel mechanism of how EBV might control the expression of viral latent genes and lytic infection. Thus, this report takes another step closer to understanding EBV gene structure and expression and paves a new path for antiviral approaches. This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.
Genome-Wide Screening for Enteric Colonization Factors in Carbapenem-Resistant ST258 Klebsiella pneumoniae.
A diverse, antibiotic-naive microbiota prevents highly antibiotic-resistant microbes, including carbapenem-resistant Klebsiella pneumoniae (CR-Kp), from achieving dense colonization of the intestinal lumen. Antibiotic-mediated destruction of the microbiota leads to expansion of CR-Kp in the gut, markedly increasing the risk of bacteremia in vulnerable patients. While preventing dense colonization represents a rational approach to reduce intra- and interpatient dissemination of CR-Kp, little is known about pathogen-associated factors that enable dense growth and persistence in the intestinal lumen. To identify genetic factors essential for dense colonization of the gut by CR-Kp, we constructed a highly saturated transposon mutant library with >150,000 unique mutations in an ST258 strain of CR-Kp and screened for in vitro growth and in vivo intestinal colonization in antibiotic-treated mice. Stochastic and partially reversible fluctuations in the representation of different mutations during dense colonization revealed the dynamic nature of intestinal microbial populations. We identified genes that are crucial for early and late stages of dense gut colonization and confirmed their role by testing isogenic mutants in in vivo competition assays with wild-type CR-Kp. Screening of the transposon library also identified mutations that enhanced in vivo CR-Kp growth. These newly identified colonization factors may provide novel therapeutic opportunities to reduce intestinal colonization by CR-Kp. IMPORTANCE Klebsiella pneumoniae is a common cause of bloodstream infections in immunocompromised and hospitalized patients, and over the last 2 decades, some strains have acquired resistance to nearly all available antibiotics, including broad-spectrum carbapenems. The U.S. Centers for Disease Control and Prevention has listed carbapenem-resistant K. pneumoniae (CR-Kp) as an urgent public health threat.
Dense colonization of the intestine by CR-Kp and other antibiotic-resistant bacteria is associated with an increased risk of bacteremia. Reducing the density of gut colonization by CR-Kp is likely to reduce their transmission from patient to patient in health care facilities as well as systemic infections. How CR-Kp expands and persists in the gut lumen, however, is poorly understood. Herein, we generated a highly saturated mutant library in a multidrug-resistant K. pneumoniae strain and identified genetic factors that are associated with dense gut colonization by K. pneumoniae. This study sheds light on host colonization by K. pneumoniae and identifies potential colonization factors that contribute to high-density persistence of K. pneumoniae in the intestine. Copyright © 2019 Jung et al.
Identification, expression, alternative splicing and functional analysis of pepper WRKY gene family in response to biotic and abiotic stresses.
WRKY proteins are a large group of plant transcription factors that are involved in various biological processes, including biotic and abiotic stress responses, hormone response, plant development, and metabolism. WRKY proteins have been identified in several plants, but only a few have been identified in Capsicum annuum. Here, we identified a total of 62 WRKY genes in the latest pepper genome. These genes were classified into three groups (Groups 1-3) based on the structural features of their proteins. The structures of the encoded proteins, their evolution, and their expression under normal growth conditions were analyzed, and 35 putative miRNA target sites were predicted in 20 CaWRKY genes. Moreover, the responses of selected WRKY genes to cold or CMV treatments were examined to validate their roles under stress. Alternative splicing (AS) events of some CaWRKYs were also identified under CMV infection. Promoter analysis confirmed that CaWRKY genes are involved in growth, development, and biotic or abiotic stress responses in hot pepper. The comprehensive analysis provides fundamental information for better understanding of the signaling pathways involved in the WRKY-mediated regulation of developmental processes, as well as biotic and abiotic stress responses.
Comprehensive transcriptome analysis reveals genes potentially involved in isoflavone biosynthesis in Pueraria thomsonii Benth.
Pueraria thomsonii Benth is an important medicinal plant. Transcriptome sequencing, unigene assembly, the annotation of transcripts and the study of gene expression profiles play vital roles in gene function research. However, the full-length transcriptome of P. thomsonii remains unknown. Here, we obtained 44,339 nonredundant transcripts of P. thomsonii by using the PacBio RS II Isoform and Illumina sequencing platforms, of which 43,195 were annotated genes. Compared with the expression levels in the plant roots, those of transcripts with a |fold change| = 4 and FDR < 0.01 in the leaves or stems were assigned as differentially expressed transcripts (DETs). In total, we found 9,225 DETs, 32 of which came from structural genes that were potentially involved in isoflavone biosynthesis. The expression profiles of 8 structural genes from the RNA-Seq data were validated by qRT-PCR. We identified 437 transcription factors (TFs) that were positively or negatively correlated with at least 1 of the structural genes involved in isoflavone biosynthesis using Pearson correlation coefficients (r) (r > 0.8 or r < -0.8). We also identified a total of 32 microRNAs (miRNAs), which targeted 805 transcripts. These miRNAs caused enriched function in 'ATP binding', 'defense response', 'ADP binding', and 'signal transduction'. Interestingly, MIR156a potentially promoted isoflavone biosynthesis by repressing SBP, and MIR319 promoted isoflavone biosynthesis by repressing TCP and HB-HD-ZIP. Finally, we identified 2,690 alternative splicing events, including that of the structural genes of trans-cinnamate 4-monooxygenase and pullulanase, which are potentially involved in the biosynthesis of isoflavone and starch, respectively, and of three TFs potentially involved in isoflavone biosynthesis. Together, these results provide us with comprehensive insight into the gene expression and regulation of P. thomsonii.
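The transcription-factor screen described above keeps TFs whose expression correlates with a structural gene at r > 0.8 or r < -0.8. The following is a minimal sketch of that kind of filter, not the authors' code; the helper names `pearson_r` and `correlated_tfs` and all expression values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlated_tfs(tf_expr, gene_expr, cutoff=0.8):
    """Return {TF name: r} for TFs with r > cutoff or r < -cutoff."""
    hits = {}
    for name, values in tf_expr.items():
        r = pearson_r(values, gene_expr)
        if r > cutoff or r < -cutoff:
            hits[name] = r
    return hits


# Hypothetical expression values across four samples (not data from the study).
structural_gene = [1.0, 2.0, 3.0, 4.0]
tfs = {
    "TF_up": [2.0, 4.1, 5.9, 8.0],    # rises with the structural gene
    "TF_down": [9.0, 7.0, 5.2, 3.0],  # falls as the structural gene rises
    "TF_flat": [1.0, 3.0, 1.0, 3.0],  # weakly correlated
}
print(sorted(correlated_tfs(tfs, structural_gene)))
```

With these toy values only `TF_up` and `TF_down` pass the cutoff; `TF_flat` has r of about 0.45 and is excluded.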
Genomic characterization of Kerstersia gyiorum SWMUKG01, an isolate from a patient with respiratory infection in China.
The Gram-negative bacterium Kerstersia gyiorum, a potential etiological agent of clinical infections, was isolated from several human patients presenting clinical symptoms. Its significance as a possible pathogen has been previously overlooked as no disease has thus far been definitively associated with this bacterium. To better understand how the organism contributes to infectious disease, we determined the complete genomic sequence of K. gyiorum SWMUKG01, the first clinical isolate from southwest China. The genomic data obtained displayed a single circular chromosome of 3,945,801 base pairs in length, which contains 3,441 protein-coding genes, 55 tRNA genes and 9 rRNA genes. Analysis of the full spectrum of protein-coding genes for cellular structures, two-component regulatory systems and iron uptake pathways that may be important for bacterial survival, colonization and establishment in the host conferred new insights into the virulence characteristics of K. gyiorum. Phylogenomic comparisons with Alcaligenaceae species indicated that K. gyiorum SWMUKG01 had a close evolutionary relationship with Alcaligenes aquatilis and Alcaligenes faecalis. The comprehensive analysis presented in this work determines for the first time a complete genome sequence of K. gyiorum, which is expected to provide useful information for subsequent studies on the pathogenesis of this species.