April 21, 2020

Immunogenetic factors driving formation of ultralong VH CDR3 in Bos taurus antibodies.

The antibody repertoire of Bos taurus is characterized by a subset of variable heavy (VH) chain regions with ultralong third complementarity determining regions (CDR3) which, compared to other species, can provide a potent response to challenging antigens like HIV env. These unusual CDR3 can range to over seventy highly diverse amino acids in length and form unique β-ribbon ‘stalk’ and disulfide bonded ‘knob’ structures, far from the typical antigen binding site. The genetic components and processes for forming these unusual cattle antibody VH CDR3 are not well understood. Here we analyze sequences of Bos taurus antibody VH domains and find that the subset with ultralong CDR3 exclusively uses a single variable gene, IGHV1-7 (VHBUL) rearranged to the longest diversity gene, IGHD8-2. An eight nucleotide duplication at the 3′ end of IGHV1-7 encodes a longer V-region producing an extended F β-strand that contributes to the stalk in a rearranged CDR3. A low amino acid variability was observed in CDR1 and CDR2, suggesting that antigen binding for this subset most likely only depends on the CDR3. Importantly, a novel, potentially AID mediated, deletional diversification mechanism of the B. taurus VH ultralong CDR3 knob was discovered, in which interior codons of the IGHD8-2 region are removed while maintaining integral structural components of the knob and descending strand of the stalk in place. These deletions serve to further diversify cysteine positions, and thus disulfide bonded loops. Hence, both germline and somatic genetic factors and processes appear to be involved in diversification of this structurally unusual cattle VH ultralong CDR3 repertoire.


April 21, 2020

Whole-Genome Alignment and Comparative Annotation.

Rapidly improving sequencing technology coupled with computational developments in sequence assembly are making reference-quality genome assembly economical. Hundreds of vertebrate genome assemblies are now publicly available, and projects are being proposed to sequence thousands of additional species in the next few years. Such dense sampling of the tree of life should give an unprecedented new understanding of evolution and allow a detailed determination of the events that led to the wealth of biodiversity around us. To gain this knowledge, these new genomes must be compared through genome alignment (at the sequence level) and comparative annotation (at the gene level). However, different alignment and annotation methods have different characteristics; before starting a comparative genomics analysis, it is important to understand the nature of, and biases and limitations inherent in, the chosen methods. This review is intended to act as a technical but high-level overview of the field that should provide this understanding. We briefly survey the state of the genome alignment and comparative annotation fields and potential future directions for these fields in a new, large-scale era of comparative genomics.


April 21, 2020

Genome-wide Transcript Structure Resolution Reveals Abundant Alternate Isoform Usage from Murine Gammaherpesvirus 68.

The gammaherpesviruses, including Epstein-Barr virus (EBV), Kaposi’s sarcoma-associated herpesvirus (KSHV), and murine gammaherpesvirus 68 (MHV68, MuHV-4, γHV68), are etiologic agents of a wide range of lymphomas and non-hematological malignancies. These viruses possess large and highly dense dsDNA genomes that feature >80 bidirectionally positioned open reading frames (ORFs). The abundance of overlapping transcripts and extensive splicing throughout these genomes have until now prohibited high throughput-based resolution of transcript structures. Here, we integrate the capabilities of long-read sequencing with the accuracy of short-read platforms to globally resolve MHV68 transcript structures using the transcript resolution through integration of multi-platform data (TRIMD) pipeline. This approach reveals highly complex features, including: (1) pervasive overlapping transcript structures; (2) transcripts containing intra-gene or trans-gene splices that yield chimeric ORFs; (3) antisense and intergenic transcripts containing ORFs; and (4) noncoding transcripts. This work sheds light on the underappreciated complexity of gammaherpesvirus transcription and provides an extensively revised annotation of the MHV68 transcriptome. Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.


April 21, 2020

The red bayberry genome and genetic basis of sex determination.

Morella rubra, red bayberry, is an economically important fruit tree in south China. Here, we assembled the first high-quality genome for both a female and a male individual of red bayberry. The genome size was 313-Mb, and 90% of sequences were assembled into eight pseudochromosome molecules, with 32 493 predicted genes. By whole-genome comparison between the female and male and association analysis with sequences of bulked and individual DNA samples from females and males, a 59-Kb female-determining region was identified and located on the distal end of pseudochromosome 8; it contains abundant transposable elements and seven putative genes, four of which are related to sex floral development. This 59-Kb female-specific region was likely derived from duplication and rearrangement of paralogous genes and has remained non-recombinant. Sex-specific molecular markers developed from candidate genes co-segregated with sex in a genetically diverse female and male germplasm. We propose that sex determination follows the ZW model of female heterogamety. The genome sequence of red bayberry provides a valuable resource for plant sex chromosome evolution and also provides important insights for molecular biology, genetics and modern breeding in the Myricaceae family. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.


April 21, 2020

Genome-wide analysis of methyl jasmonate-regulated isoform expression in the medicinal plant Andrographis paniculata

Alternative splicing can increase the complexity of the transcriptome and proteome. The most common mechanism of alternative splicing in plants is intron retention (IR), and the expression levels of IR isoforms can be differentially regulated when facing abiotic stress. The full-length transcriptome of the medicinal plant Andrographis paniculata was sequenced using both Illumina- and SMRT-based RNA-seq and a total of 4846 IR isoforms were identified. The expression levels of 310/296 IR isoforms were up-regulated, and 629/659 IR isoforms were down-regulated at 24 h/48 h after methyl jasmonate (MeJA) treatment, respectively. In the (E,E,E)-geranylgeranyl diphosphate (GGPP) biosynthesis pathway which contributes to the andrographolide biosynthesis, eight genes were alternatively spliced, resulting in a total of 25 isoforms, of which 12 are IR isoforms. After MeJA treatment, four of these IR isoforms showed significant differential expression. RT-PCR and qRT-PCR experiments confirmed the existence of five IR isoforms. This research deepens our understanding of the A. paniculata transcriptome and can assist in the future study of andrographolide biosynthesis.


April 21, 2020

Genome assembly and gene expression in the American black bear provides new insights into the renal response to hibernation.

The prevalence of chronic kidney disease (CKD) is rising worldwide and 10-15% of the global population currently suffers from CKD and its complications. Given the increasing prevalence of CKD there is an urgent need to find novel treatment options. The American black bear (Ursus americanus) copes with months of lowered kidney function and metabolism during hibernation without the devastating effects on metabolism and other consequences observed in humans. In a biomimetic approach to better understand kidney adaptations and physiology in hibernating black bears, we established a high-quality genome assembly. Subsequent RNA-Seq analysis of kidneys comparing gene expression profiles in black bears entering (late fall) and emerging (early spring) from hibernation identified 169 protein-coding genes that were differentially expressed. Of these, 101 genes were downregulated and 68 genes were upregulated after hibernation. Fold changes ranged from 1.8-fold downregulation (RTN4RL2) to 2.4-fold upregulation (CISH). Most notable was the upregulation of cytokine suppression genes (SOCS2, CISH, and SERPINC1) and the lack of increased expression of cytokines and genes involved in inflammation. The identification of these differences in gene expression in the black bear kidney may provide new insights in the prevention and treatment of CKD. © The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.


April 21, 2020

Genome-Scale Sequence Disruption Following Biolistic Transformation in Rice and Maize.

Biolistic transformation delivers nucleic acids into plant cells by bombarding the cells with microprojectiles, which are micron-scale, typically gold particles. Despite the wide use of this technique, little is known about its effect on the cell’s genome. We biolistically transformed linear 48-kb phage lambda and two different circular plasmids into rice (Oryza sativa) and maize (Zea mays) and analyzed the results by whole genome sequencing and optical mapping. Although some transgenic events showed simple insertions, others showed extreme genome damage in the form of chromosome truncations, large deletions, partial trisomy, and evidence of chromothripsis and breakage-fusion bridge cycling. Several transgenic events contained megabase-scale arrays of introduced DNA mixed with genomic fragments assembled by nonhomologous or microhomology-mediated joining. Damaged regions of the genome, assayed by the presence of small fragments displaced elsewhere, were often repaired without a trace, presumably by homology-dependent repair (HDR). The results suggest a model whereby successful biolistic transformation relies on a combination of end joining to insert foreign DNA and HDR to repair collateral damage caused by the microprojectiles. The differing levels of genome damage observed among transgenic events may reflect the stage of the cell cycle and the availability of templates for HDR. © 2019 American Society of Plant Biologists. All rights reserved.


April 21, 2020

The CF Canada-Sick Kids Program in individual CF therapy: A resource for the advancement of personalized medicine in CF.

Therapies targeting certain CFTR mutants have been approved, yet variations in clinical response highlight the need for in-vitro and genetic tools that predict patient-specific clinical outcomes. Toward this goal, the CF Canada-Sick Kids Program in Individual CF Therapy (CFIT) is generating a “first of its kind”, comprehensive resource containing patient-specific cell cultures and data from 100 CF individuals that will enable modeling of therapeutic responses. The CFIT program is generating: 1) nasal cells from drug naïve patients suitable for culture and the study of drug responses in vitro, 2) matched gene expression data obtained by sequencing the RNA from the primary nasal tissue, 3) whole genome sequencing of blood derived DNA from each of the 100 participants, 4) induced pluripotent stem cells (iPSCs) generated from each participant’s blood sample, 5) CRISPR-edited isogenic control iPSC lines and 6) prospective clinical data from patients treated with CF modulators. To date, we have recruited 57 of 100 individuals to CFIT, most of whom are homozygous for F508del (to assess in-vitro: in-vivo correlations with respect to ORKAMBI response) or heterozygous for F508del and a minimal function mutation. In addition, several donors are homozygous for rare nonsense and missense mutations. Nasal epithelial cell cultures and matched iPSC lines are available for many of these donors. This accessible resource will enable development of tools that predict individual outcomes to current and emerging modulators targeting F508del-CFTR and facilitate therapy discovery for rare CF causing mutations. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.


April 21, 2020

Full-length transcriptome analysis of Litopenaeus vannamei reveals transcript variants involved in the innate immune system.

To better understand the immune system of shrimp, this study combined PacBio isoform sequencing (Iso-Seq) and Illumina paired-end short reads sequencing methods to discover full-length immune-related molecules of the Pacific white shrimp, Litopenaeus vannamei. A total of 72,648 nonredundant full-length transcripts (unigenes) were generated with an average length of 2545 bp from five main tissues, including the hepatopancreas, cardiac stomach, heart, muscle, and pyloric stomach. These unigenes exhibited a high annotation rate (62,164, 85.57%) when compared against NR, NT, Swiss-Prot, Pfam, GO, KEGG and COG databases. A total of 7544 putative long noncoding RNAs (lncRNAs) were detected and 1164 nonredundant full-length transcripts (449 UniTransModels) participated in the alternative splicing (AS) events. Importantly, a total of 5279 nonredundant full-length unigenes were successfully identified, which were involved in the innate immune system, including 9 immune-related processes, 19 immune-related pathways and 10 other immune-related systems. We also found wide transcript variants, which increased the number and functional complexity of immune molecules; for example, toll-like receptors (TLRs) and interferon regulatory factors (IRFs). A total of 480 differentially expressed genes (DEGs) showed significantly higher or tissue-specific expression in the hepatopancreas compared with the other four tested tissues (FDR <0.05). Furthermore, the expression levels of six selected immune-related DEGs and putative IRFs were validated using real-time PCR technology, substantiating the reliability of the PacBio Iso-seq results. In conclusion, our results provide new genetic resources of long-read full-length transcripts data and information for identifying immune-related genes, which are an invaluable transcriptomic resource as genomic reference, especially for further exploration of the innate immune and defense mechanisms of shrimp. Copyright © 2019 Elsevier Ltd. All rights reserved.


April 21, 2020

TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts.

Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the package TranscriptClean to correct mismatches, microindels and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants. Our method corrects nearly all mismatches and indels present in a publicly available human PacBio Iso-seq dataset, and rescues 39% of noncanonical splice junctions. All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean.
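
As a rough illustration of what it means to flag a splice junction as noncanonical and then rescue it, the Python sketch below checks the donor and acceptor dinucleotides of an intron against a motif set and snaps a junction position to the nearest annotated position within a small window. It is not TranscriptClean's implementation: the function names, the forward-strand-only handling, the choice of GT-AG as the only canonical motif, and the 5-bp window are all assumptions made for this example.

```python
from typing import Iterable, Optional, Set, Tuple

# Assumption for this sketch: only GT-AG introns count as canonical.
CANONICAL_MOTIFS: Set[Tuple[str, str]] = {("GT", "AG")}


def intron_motif(genome: str, start: int, end: int) -> Tuple[str, str]:
    """Return (donor, acceptor) dinucleotides of a forward-strand intron.

    start is 0-based inclusive, end is 0-based exclusive.
    """
    donor = genome[start:start + 2].upper()
    acceptor = genome[end - 2:end].upper()
    return donor, acceptor


def is_canonical(genome: str, start: int, end: int) -> bool:
    """True if the intron's boundary dinucleotides are in the motif set."""
    return intron_motif(genome, start, end) in CANONICAL_MOTIFS


def nearest_annotated(pos: int, annotated: Iterable[int],
                      max_dist: int = 5) -> Optional[int]:
    """Return the closest annotated position within max_dist, else None."""
    best = None
    for candidate in annotated:
        dist = abs(candidate - pos)
        if dist <= max_dist and (best is None or dist < abs(best - pos)):
            best = candidate
    return best


# Toy reference: intron spans positions 3..10 (GT....AG).
toy_genome = "AAAGTCCCCAGTTT"
```

With the toy sequence, an intron called at 3 to 11 is canonical, while one called a base too early at 2 to 11 is not; `nearest_annotated(2, [3, 20])` would move the donor boundary back to the annotated position 3.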


April 21, 2020

TaF: a web platform for taxonomic profile-based fungal gene prediction.

The accurate prediction and annotation of gene structures from the genome sequence of an organism enable genome-wide functional analyses to obtain insight into the biological properties of an organism. We recently developed a highly accurate filamentous fungal gene prediction pipeline and web platform called TaF. TaF is a homology-based gene predictor employing large-scale taxonomic profiling to search for close relatives in genome queries. The TaF pipeline consists of four processing steps: (1) taxonomic profiling to search for close relatives to query, (2) generation of hints for determining exon-intron boundaries from orthologous protein sequence data of the profiled species, (3) gene prediction by combination of ab initio and evidence-based prediction methods, and (4) homology search for gene models. TaF generates extrinsic evidence that suggests possible exon-intron boundaries based on orthologous protein sequence data, thus reducing false-positive predictions of gene structure based on distantly related orthologs data. In particular, the gene prediction method using taxonomic profiling shows very high accuracy, including high sensitivity and specificity for gene models, suggesting a new approach for homology-based gene prediction from newly sequenced or uncharacterized fungal genomes, with the potential to improve the quality of gene prediction. TaF will be a useful tool for fungal genome-wide analyses, including the identification of targeted genes associated with a trait, transcriptome profiling, comparative genomics, and evolutionary analysis.


April 21, 2020

Combined Genome and Transcriptome (G&T) Sequencing of Single Cells.

The simultaneous examination of a single cell’s genome and transcriptome presents scientists with a powerful tool to study genetic variability and its effect on gene expression. In this chapter, we describe the library generation method for combined genome and transcriptome sequencing (G&T-seq) originally described by Macaulay et al. (Nat Protoc 11(11):2081-2103, 2016; Nat Methods 12(6):519-522, 2015). This includes some alterations we made to improve robustness of this process for both the novice user and laboratories that want to deploy this method at scale. Using this method, genomic DNA and full-length mRNA from single cells are separated, amplified, and converted into Illumina sequencer-compatible sequencing libraries.


April 21, 2020

The interplay between microRNA and alternative splicing of linear and circular RNAs in eleven plant species.

MicroRNA (miRNA) and alternative splicing (AS)-mediated post-transcriptional regulation has been extensively studied in most eukaryotes. However, the interplay between AS and miRNAs has not been explored in plants. To our knowledge, the overall profile of miRNA target sites in circular RNAs (circRNA) generated by alternative back splicing has never been reported previously. To address the challenge, we identified miRNA target sites located in alternatively spliced regions of the linear and circular splice isoforms using the up-to-date single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) and Illumina sequencing data in eleven plant species. In total, we identified 399 401 and 114 574 AS events from linear and circular RNAs, respectively. Among them, there were 64 781 and 41 146 miRNA target sites located in linear and circular AS regions, respectively. In addition, we found 38 913 circRNAs to be overlapping with 45 648 AS events of their own parent isoforms, suggesting circRNA regulation of AS of linear RNAs by forming R-loops with the genomic locus. Here, we present a comprehensive database of miRNA targets in alternatively spliced linear and circRNAs (ASmiR) and a web server for deposition and identification of miRNA target sites located in the alternatively spliced region of linear and circular RNAs. This database is accompanied by an easy-to-use web query interface for meaningful downstream analysis. The plant research community can submit user-defined datasets to the web service to search AS regions harboring small RNA target sites. In conclusion, this study provides an unprecedented resource to understand regulatory relationships between miRNAs and AS in both gymnosperms and angiosperms. The readily accessible database and web-based tools are available at http://forestry.fafu.edu.cn/bioinfor/db/ASmiR. Supplementary data are available at Bioinformatics online. © The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


April 21, 2020

Hybrid sequencing-based personal full-length transcriptomic analysis implicates proteostatic stress in metastatic ovarian cancer.

Comprehensive molecular characterization of myriad somatic alterations and aberrant gene expressions at personal level is key to precision cancer therapy; yet, limited by current short-read sequencing technology, an individualized catalog of complete genomic and transcriptomic features has thus far been elusive. Here, we integrated second- and third-generation sequencing platforms to generate a multidimensional dataset on a patient affected by metastatic epithelial ovarian cancer. Whole-genome and hybrid transcriptome dissection captured global genetic and transcriptional variants at previously unparalleled resolution. Particularly, single-molecule mRNA sequencing identified a vast array of unannotated transcripts, novel long noncoding RNAs and gene chimeras, permitting accurate determination of transcription start, splice, polyadenylation and fusion sites. Phylogenetic and enrichment inference of isoform-level measurements implicated early functional divergence and cytosolic proteostatic stress in shaping ovarian tumorigenesis. A complementary imaging-based high-throughput drug screen was performed and subsequently validated, which consistently pinpointed proteasome inhibitors as an effective therapeutic regime by inducing protein aggregates in ovarian cancer cells. Therefore, our study suggests that clinical application of the emerging long-read full-length analysis for improving molecular diagnostics is feasible and informative. An in-depth understanding of the tumor transcriptome complexity allowed by leveraging the hybrid sequencing approach lays the basis to reveal novel and valid therapeutic vulnerabilities in advanced ovarian malignancies.


April 21, 2020

PacBio full-length cDNA sequencing integrated with RNA-seq reads drastically improves the discovery of splicing transcripts in rice.

In eukaryotes, alternative splicing (AS) greatly expands the diversity of transcripts. However, it is challenging to accurately determine full-length splicing isoforms. Recently, more studies have taken advantage of Pacific Biosciences (PacBio) long-read sequencing to identify full-length transcripts. Nevertheless, the high error rate of PacBio reads seriously offsets the advantages of long reads, especially for accurately identifying splicing junctions. To best capitalize on the features of long reads, we used Illumina RNA-seq reads to improve PacBio circular consensus sequence (CCS) quality and to validate splicing patterns in the rice transcriptome. We evaluated the impact of CCS accuracy on the number and the validation rate of splicing isoforms, and integrated a comprehensive pipeline of splicing transcripts analysis by Iso-Seq and RNA-seq (STAIR) to identify the full-length multi-exon isoforms in rice seedling transcriptome (Oryza sativa L. ssp. japonica). STAIR discovered 11 733 full-length multi-exon isoforms, 6599 more than the SMRT Portal RS_IsoSeq pipeline did. Of these splicing isoforms identified, 4453 (37.9%) were missed in assembled transcripts from RNA-seq reads, and 5204 (44.4%), including 268 multi-exon long non-coding RNAs (lncRNAs), were not reported in the MSU_osa1r7 annotation. Some randomly selected unreported splicing junctions were verified by polymerase chain reaction (PCR) amplification. In addition, we investigated alternative polyadenylation (APA) events in transcripts and identified 829 major polyadenylation [poly(A)] site clusters (PACs). The analysis of splicing isoforms and APA events will facilitate the annotation of the rice genome and studies on the expression and polyadenylation of AS genes in different developmental stages or growth conditions of rice. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.

