Skeletal muscle is ideal for passive vaccine administration as it is easily accessible by intramuscular injection. Recombinant adeno-associated virus (rAAV) vectors are under consideration for passive vaccination clinical trials for HIV and influenza. However, greater human skeletal muscle transduction is needed for therapeutic efficacy than is possible with existing serotypes. To bioengineer capsids with therapeutic levels of transduction, we utilized a directed evolution approach to screen libraries of shuffled AAV capsids in pools of surgically resected human skeletal muscle cells from five patients. Six rounds of evolution were performed in various muscle cell types, and evolved variants were validated against existing muscle-tropic serotypes rAAV1, 6, and 8. We found that evolved variants NP22 and NP66 had significantly increased primary human and rhesus skeletal muscle fiber transduction from surgical explants ex vivo and in various primary and immortalized myogenic lines in vitro. Importantly, we demonstrated reduced seroreactivity compared to existing serotypes against normal human serum from 50 adult donors. These capsids represent powerful tools for human skeletal muscle expression and secretion of antibodies from passive vaccines.
Efficient CRISPR/Cas9-mediated editing of trinucleotide repeat expansion in myotonic dystrophy patient-derived iPS and myogenic cells.
CRISPR/Cas9 is an attractive platform to potentially correct dominant genetic diseases by gene editing with unprecedented precision. In the current proof-of-principle study, we explored the use of CRISPR/Cas9 for gene editing in myotonic dystrophy type-1 (DM1), an autosomal-dominant muscle disorder, by excising the CTG-repeat expansion in the 3′-untranslated-region (UTR) of the human myotonic dystrophy protein kinase (DMPK) gene in DM1 patient-specific induced pluripotent stem cells (DM1-iPSC), DM1-iPSC-derived myogenic cells and DM1 patient-specific myoblasts. To eliminate the pathogenic gain-of-function mutant DMPK transcript, we designed a dual guide RNA-based strategy that excises the CTG-repeat expansion with high efficiency, as confirmed by Southern blot and single molecule real-time (SMRT) sequencing. Correction efficiencies up to 90% could be attained in DM1-iPSC as confirmed at the clonal level, following ribonucleoprotein (RNP) transfection of CRISPR/Cas9 components without the need for selective enrichment. Expanded CTG repeat excision resulted in the disappearance of ribonuclear foci, a quintessential cellular phenotype of DM1, in the corrected DM1-iPSC, DM1-iPSC-derived myogenic cells and DM1 myoblasts. Consequently, the normal intracellular localization of the muscleblind-like splicing regulator 1 (MBNL1) was restored, resulting in the normalization of the splicing pattern of SERCA1. This study validates the use of CRISPR/Cas9 for gene editing of repeat expansions.
Bioengineered AAV capsids with combined high human liver transduction in vivo and unique humoral seroreactivity.
Existing recombinant adeno-associated virus (rAAV) serotypes for delivering in vivo gene therapy treatments for human liver diseases have not yielded combined high-level human hepatocyte transduction and favorable humoral neutralization properties in diverse patient groups. Yet, these combined properties are important for therapeutic efficacy. To bioengineer capsids that exhibit both unique seroreactivity profiles and functionally transduce human hepatocytes at therapeutically relevant levels, we performed multiplexed sequential directed evolution screens using diverse capsid libraries in both primary human hepatocytes in vivo and with pooled human sera from thousands of patients. AAV libraries were subjected to five rounds of in vivo selection in xenografted mice with human livers to isolate an enriched human-hepatotropic library that was then used as input for a sequential on-bead screen against pooled human immunoglobulins. Evolved variants were vectorized and validated against existing hepatotropic serotypes. Two of the evolved AAV serotypes, NP40 and NP59, exhibited dramatically improved functional human hepatocyte transduction in vivo in xenografted mice with human livers, along with favorable human seroreactivity profiles, compared with existing serotypes. These novel capsids represent enhanced vector delivery systems for future human liver gene therapy applications. Copyright © 2017. Published by Elsevier Inc.
Here, we present the complete genome sequence of a porcine endogenous retrovirus determined by Pacific Biosciences sequencing. A comparison of the genome of this isolate with those of other strains revealed the operation of a mechanism resulting in the selective accumulation of G and C bases in the viral DNA. Copyright © 2017 Szucs et al.
In this study, we sequenced the first full-length insect transcriptome, that of Erthesina fullo Thunberg, on the PacBio platform. We constructed the first quantitative transcription map of animal mitochondrial genomes and built a straightforward and concise methodology to investigate mitochondrial gene transcription, RNA processing, mRNA maturation and several other related topics. Most of the results were consistent with previous studies, while to the best of our knowledge some findings were reported for the first time in this study. The new findings included the high levels of mitochondrial gene expression, the 3′ polyadenylation and possible 5′ m(7)G caps of rRNAs, the isoform diversity of 12S rRNA, and the polycistronic transcripts and natural antisense transcripts of mitochondrial genes, among others. These findings could challenge and enrich fundamental concepts of mitochondrial gene transcription and RNA processing, particularly of the rRNA primary (sequence) structure. The methodology constructed in this study can also be used to study gene expression or RNA processing of nuclear genomes.
In the last two decades, the field of metagenomics has greatly expanded due to improvements in sequencing technologies that allow for a more comprehensive characterization of microbial communities. The use of these technologies has led to an unprecedented understanding of human, animal, and environmental microbiomes and has shown that the gut microbiota are comparable to an organ that is intrinsically linked with a variety of diseases. Characterization of microbial communities using next-generation sequencing-by-synthesis approaches has revealed important shifts in microbiota associated with debilitating diseases such as Clostridium difficile infection. But due to limitations in sequence read length, primer biases, and the quality of databases, genus- and species-level classification has been difficult. Third-generation technologies, such as Pacific Biosciences’ single molecule, real-time (SMRT) approach, allow for unbiased, more specific identification of species that are likely clinically relevant. Comparison of Illumina next-generation characterization and SMRT sequencing of samples from patients treated for C. difficile infection revealed similarities in community composition at the phylum and family levels, but SMRT sequencing further allowed for species-level characterization, permitting a better understanding of the microbial ecology of this disease. Thus, as sequencing technologies continue to advance, new species-level insights can be gained in the study of complex and clinically relevant microbial communities.
A comprehensive quality evaluation system for complex herbal medicine using PacBio sequencing, PCR-denaturing gradient gel electrophoresis, and several chemical approaches.
Herbal medicine is a major component of complementary and alternative medicine, contributing significantly to the health of many people and communities. Quality control of herbal medicine is crucial to ensure that it is safe and sound for use. Here, we investigated a comprehensive quality evaluation system for a classic herbal medicine, Danggui Buxue Formula, by applying genetic-based and analytical chemistry approaches to authenticate and evaluate the quality of its samples. For authenticity, we successfully applied two novel technologies, third-generation sequencing and PCR-DGGE (denaturing gradient gel electrophoresis), to analyze the ingredient composition of the tested samples. For quality evaluation, we used high performance liquid chromatography assays to determine the content of chemical markers to help estimate the dosage relationship between its two raw materials, plant roots of Huangqi and Danggui. A series of surveys was then conducted for several exogenous contaminants, aiming to further assess the efficacy and safety of the samples. In conclusion, the quality evaluation system demonstrated here can potentially address the authenticity, quality, and safety of herbal medicines, thus providing novel insight for enhancing their overall quality control. Highlight: We established a comprehensive quality evaluation system for herbal medicine by combining two genetic-based approaches, third-generation sequencing and DGGE (denaturing gradient gel electrophoresis), with analytical chemistry approaches to authenticate the samples and evaluate their quality.
The utility of genome assemblies relies not only on the quality of the assembled genome sequence, but also on the quality of the gene annotations. The Pacific Biosciences Iso-Seq technology is a powerful support for accurate eukaryotic gene model annotation as it allows for direct readout of full-length cDNA sequences without the need for noisy short read-based transcript assembly. We propose applying the TeloPrime Full Length cDNA Amplification kit to the Pacific Biosciences Iso-Seq technology in order to enrich for genuine full-length transcripts in the cDNA libraries. We provide evidence that TeloPrime outperforms the commonly used SMARTer PCR cDNA Synthesis Kit in identifying transcription start and end sites in Arabidopsis thaliana. Furthermore, we show that TeloPrime-based Pacific Biosciences Iso-Seq can be successfully applied to the polyploid genome of bread wheat (Triticum aestivum) not only to efficiently annotate gene models, but also to identify novel transcription sites, gene homeologs, splicing isoforms and previously unidentified gene loci.
Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offer a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feedback loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.
Characterization of the Rosellinia necatrix transcriptome and genes related to pathogenesis by single-molecule mRNA sequencing.
White root rot disease, caused by the pathogen Rosellinia necatrix, is one of the world’s most devastating plant fungal diseases and affects several commercially important species of fruit trees and crops. Recent global outbreaks of R. necatrix and advances in molecular techniques have both increased interest in this pathogen. However, the lack of information regarding the genomic structure and transcriptome of R. necatrix has been a barrier to the progress of functional genomic research and the control of this harmful pathogen. Here, we identified 10,616 novel full-length transcripts from the filamentous hyphal tissue of R. necatrix (KACC 40445 strain) using PacBio single-molecule sequencing technology. After annotation of the unigene sets, we selected 14 cell cycle-related genes, which are likely either positively or negatively involved in hyphal growth by cell cycle control. The expression of the selected genes was further compared between two strains that displayed different growth rates on nutritional media. Furthermore, we predicted pathogen-related effector genes and cell wall-degrading enzymes from the annotated gene sets. These results provide the most comprehensive transcriptomic resource for R. necatrix, and could facilitate functional genomics and further analyses of this important phytopathogen.
Long-read sequencing technologies enable high-quality, contiguous genome assemblies. Here we used SMRT sequencing to assemble the genome of a Drosophila simulans strain originating from Madagascar, the ancestral range of the species. We generated 8 Gb of raw data (~50x coverage) with a mean read length of 6,410 bp, an NR50 of 9,125 bp and the longest subread at 49 kb. We benchmarked six different assemblers and merged the best two assemblies from Canu and Falcon. Our final assembly was 127.41 Mb with an N50 of 5.38 Mb and 305 contigs. We anchored more than 4 Mb of novel sequence to the major chromosome arms, and significantly improved the assembly of peri-centromeric and telomeric regions. Finally, we performed full-length transcript sequencing and used these data in conjunction with short-read RNAseq data to annotate 13,422 genes in the genome, improving the annotation in regions with complex, nested gene structures.
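The N50 figure quoted above is a standard contiguity statistic: the contig length at which contigs of that length or longer account for at least half of the total assembly. The following is a minimal Python sketch of that calculation, not the authors' pipeline. The function name `n50` and the example lengths are illustrative assumptions. The same calculation can be applied to read lengths, which is presumably what the NR50 value reports.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (not from the study), for illustration only.
example_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_contigs))
```

Because N50 depends only on the length distribution, two assemblies of the same total size can have very different N50 values, which is why it is usually reported alongside the contig count, as in the abstract.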
Alternative splicing (AS) and fusion transcripts greatly expand transcriptome and proteome diversity. However, the reliability of these events and the extent of epigenetic mechanisms have not been adequately addressed because of uncertainties about the complete structure of mRNA. Here we combined single-molecule real-time sequencing, Illumina RNA-seq and DNA methylation data to characterize the landscape of DNA methylation in AS, fusion isoform formation and lncRNA features, and to further unveil the transcriptome complexity of the pig. Our analysis identified an unprecedented number of high-quality full-length isoforms, including more than 28,127 novel isoforms from 26,881 novel genes. More than 92,000 novel AS events were detected; intron retention was the predominant AS mode, followed by exon skipping. Interestingly, we found that DNA methylation played an important role in generating various AS isoforms by regulating splicing sites, promoter regions and first exons. Furthermore, we identified a large number of fusion transcripts and novel lncRNAs, and found that DNA methylation of the promoter and gene body could regulate lncRNA expression. Our results significantly improved existing pig gene models and revealed that AS and epigenetic modification in the pig are more complex than previously thought.
Herpes simplex virus type-1 (HSV-1) is a human pathogenic member of the Alphaherpesvirinae subfamily of herpesviruses. The HSV-1 genome is a large double-stranded DNA specifying about 85 protein coding genes. The latest surveys have demonstrated that the HSV-1 transcriptome is much more complex than previously thought. Here, we provide a long-read sequencing dataset, which was generated by using the RSII and Sequel systems from Pacific Biosciences (PacBio), as well as the MinION sequencing system from Oxford Nanopore Technologies (ONT). This dataset contains 39,096 reads of inserts (ROIs) mapped to the HSV-1 genome (X14112) in RSII sequencing, while Sequel sequencing yielded 77,851 ROIs. The MinION cDNA sequencing altogether resulted in 158,653 reads, while the direct RNA-seq produced 16,516 reads. This dataset can be utilized for the identification of novel HSV RNAs and transcript isoforms, as well as for the comparison of the quality and length of the sequencing reads derived from the currently available long-read sequencing platforms. The various library preparation approaches can also be compared with each other.
In this study, we used Pacific Biosciences RS II long-read and Illumina HiScanSQ short-read sequencing technologies for the characterization of porcine circovirus type 1 (PCV-1) transcripts. Our aim was to identify novel RNA molecules and transcript isoforms, as well as to determine the exact 5′- and 3′-end sequences of previously described transcripts with single base-pair accuracy. We discovered a novel 3′-UTR length isoform of the Cap transcript, and a non-spliced Cap transcript variant. Additionally, our analysis has revealed a 3′-UTR isoform of Rep and two 5′-UTR isoforms of Rep’ transcripts, and a novel splice variant of the longer Rep’ transcript. We also explored two novel long transcripts, one with a previously identified splice site, and a formerly undetected mRNA of ORF3. Altogether, our methods have identified nine novel RNA molecules, doubling the size of the previously known PCV-1 transcriptome. Additionally, our investigations revealed an intricate pattern of transcript overlapping, which might produce transcriptional interference between the transcriptional machineries of adjacent genes, and thereby may potentially play a role in the regulation of gene expression in circoviruses. Copyright © 2017 Elsevier B.V. All rights reserved.
Long-read isoform sequencing reveals a hidden complexity of the transcriptional landscape of Herpes Simplex Virus Type 1.
In this study, we used the amplified isoform sequencing technique from Pacific Biosciences to characterize the poly(A)(+) fraction of the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). Our analysis detected 34 formerly unidentified protein-coding genes, 10 non-coding RNAs, as well as 17 polycistronic and complex transcripts. This work also led us to identify many transcript isoforms, including 13 splice and 68 transcript end variants, as well as several transcript overlaps. Additionally, we determined previously unascertained transcriptional start and polyadenylation sites. We analyzed the transcriptional activity from the complementary DNA strand in five convergent HSV gene pairs with quantitative RT-PCR and detected antisense RNAs in each gene. This part of the study revealed an inverse correlation between the expressions of convergent partners. Our work adds new insights for understanding the complexity of the pervasive transcriptional overlaps by suggesting that there is a crosstalk between adjacent and distal genes through interaction between their transcription apparatuses. We also identified transcripts overlapping the HSV replication origins, which may indicate an interplay between the transcription and replication machineries. The relative abundance of HSV-1 transcripts has also been established by using a novel method based on the calculation of sequencing reads for the analysis.
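The abstract does not describe how read counts were turned into relative abundances, so the following Python sketch only illustrates the simplest possible read-count approach: each transcript's share of the total mapped reads. It is an assumption for illustration, not the authors' novel method. The function name `relative_abundance` and the transcript labels and counts are hypothetical.

```python
def relative_abundance(read_counts):
    """Convert per-transcript read counts into fractions of the total.

    read_counts: dict mapping a transcript name to its number of mapped reads.
    Returns a dict mapping each name to count / total.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {name: count / total for name, count in read_counts.items()}


# Hypothetical transcript labels and counts, for illustration only.
counts = {"transcript_A": 30, "transcript_B": 10}
print(relative_abundance(counts))
```

A plain fraction like this ignores transcript length and any amplification or size-selection bias in the library, which a real quantification method would likely need to account for.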