April 21, 2020

5’UTR-mediated regulation of Ataxin-1 expression.

Expression of mutant Ataxin-1 with an abnormally expanded polyglutamine domain is necessary for the onset and progression of spinocerebellar ataxia type 1 (SCA1). Understanding how Ataxin-1 expression is regulated in the human brain could inspire novel molecular therapies for this fatal, dominantly inherited neurodegenerative disease. Previous studies have shown that the ATXN1 3’UTR plays a key role in regulating the Ataxin-1 cellular pool via diverse post-transcriptional mechanisms. Here we show that elements within the ATXN1 5’UTR also participate in the regulation of Ataxin-1 expression. PCR and PacBio sequencing analysis of cDNA obtained from control and SCA1 human brain samples revealed the presence of three major, alternatively spliced ATXN1 5’UTR variants. In cell-based assays, fusing these variants upstream of an EGFP reporter construct had significant and differential effects on total EGFP protein output, revealing a genetic rheostat-like function of the ATXN1 5’UTR. We identified ribosomal scanning of upstream AUG codons and increased transcript instability as potential mechanisms of regulation. Importantly, transcript-based analyses revealed significant differences in the expression pattern of ATXN1 5’UTR variants between control and SCA1 cerebellum. Together, these data shed light on a previously unknown role for the ATXN1 5’UTR in the regulation of Ataxin-1 and provide new opportunities for the development of SCA1 therapeutics. Copyright © 2019. Published by Elsevier Inc.
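The abstract names ribosomal scanning of upstream AUG codons (uAUGs) as one candidate mechanism. As a rough illustration only, the sketch below scans a 5’UTR sequence for ATG codons. For each one it reports where the upstream ORF stops, if it stops inside the UTR, and whether that ORF is in frame with the main start codon. It assumes a DNA-alphabet sequence written 5’ to 3’ and that the main AUG begins immediately after the last UTR base. The example sequence is a made-up toy, not an ATXN1 variant.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_upstream_atgs(utr5):
    """Report every ATG in a 5'UTR and where its upstream ORF ends.

    Assumes a DNA-alphabet sequence written 5'->3' and that the main
    start codon begins immediately after the last UTR base.
    """
    seq = utr5.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        stop = None
        # Walk codon by codon in this uAUG's frame until a stop or the UTR end.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                stop = j
                break
        hits.append({
            "start": i,
            "stop": stop,  # None: no stop before the UTR ends, so the uORF continues past it
            "in_frame_with_main": (len(seq) - i) % 3 == 0,
        })
    return hits


# Toy sequence, not an ATXN1 5'UTR variant.
for hit in scan_upstream_atgs("GGATGCCCTAAGCATGAA"):
    print(hit)
```

A uAUG whose ORF has no stop before the UTR ends overlaps the main ORF when it is out of frame, or extends it N-terminally when it is in frame. Comparing such hits across the three UTR variants is one simple way to see why they might differ in output.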


April 21, 2020

Single-molecule sequencing detection of N6-methyladenine in microbial reference materials.

The DNA base modification N6-methyladenine (m6A) is involved in many pathways related to the survival of bacteria and their interactions with hosts. Nanopore sequencing offers a new, portable method to detect base modifications. Here, we show that a neural network using deviations between measured and expected current values as each adenine travels through a pore can improve m6A detection at trained sequence contexts compared with previously published methods. The model, implemented as the mCaller software package, can be extended to detect known methyltransferase target motifs, or confirm suspected ones, based on predictions of methylation at untrained contexts. We use PacBio, Oxford Nanopore, methylated DNA immunoprecipitation sequencing (MeDIP-seq), and whole-genome bisulfite sequencing data to generate and orthogonally validate methylomes for eight microbial reference species. These well-characterized microbial references can serve as controls in the development and evaluation of future methods for the identification of base modifications from single-molecule sequencing data.


April 21, 2020

Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation.

We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.
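Assigning a viral contig to a host from Hi-C linkage can be pictured as counting proximity-ligation read pairs that join the viral contig to contigs in each host bin. The sketch below is a simplified illustration built on assumptions of our own: pre-computed links given as pairs of contig names, a contig-to-bin map, and a hypothetical `min_links` cutoff. It is not the pipeline used in the study.

```python
from collections import Counter, defaultdict


def assign_virus_hosts(links, contig_to_bin, viral_contigs, min_links=2):
    """Return {viral_contig: [host bins]} supported by >= min_links Hi-C pairs.

    links: iterable of (contig_a, contig_b) pairs from proximity ligation.
    contig_to_bin: maps non-viral contigs to host genome bins.
    """
    counts = defaultdict(Counter)
    for a, b in links:
        # A link can be listed in either orientation, so check both.
        for virus, partner in ((a, b), (b, a)):
            if virus in viral_contigs and partner in contig_to_bin:
                counts[virus][contig_to_bin[partner]] += 1
    hosts = {}
    for virus, per_bin in counts.items():
        supported = sorted(b for b, n in per_bin.items() if n >= min_links)
        if supported:
            hosts[virus] = supported
    return hosts


# Toy links and bins, invented for illustration.
links = [("v1", "c1"), ("c1", "v1"), ("v1", "c2"), ("v1", "c3"), ("c3", "v1"), ("v2", "c4")]
contig_to_bin = {"c1": "binA", "c2": "binA", "c3": "binB", "c4": "binC"}
print(assign_virus_hosts(links, contig_to_bin, {"v1", "v2"}))
```

Returning a list instead of a single best bin leaves room for viruses with a broad host range. The cutoff trades sensitivity against spurious links, and a real analysis would also need to account for contig length and link noise.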


October 23, 2019

Vector design Tour de Force: integrating combinatorial and rational approaches to derive novel adeno-associated virus variants.

Methodologies to improve existing adeno-associated virus (AAV) vectors for gene therapy use either rational approaches or directed evolution to derive capsid variants with superior transduction efficiencies in targeted tissues. Here, we integrated both approaches in one unified design strategy of “virtual family shuffling” to derive a combinatorial capsid library in which only variable regions on the surface of the capsid are modified. Individual sublibraries were first assembled to preselect compatible amino acid residues within restricted surface-exposed regions and so minimize the generation of dead-end variants. Subsequently, the successful families were interbred to derive a combined library of ~8 × 10⁵ complexity. Next-generation sequencing of the packaged viral DNA revealed capsid surface areas susceptible to directed evolution, providing guidance for future designs. We demonstrated the utility of the library by deriving an AAV2-based vector with a 20-fold higher transduction efficiency in murine liver, now equivalent to that of AAV8.


October 23, 2019

Gene targeting by the TAL effector PthXo2 reveals cryptic resistance gene for bacterial blight of rice.

Bacterial blight of rice is caused by the γ-proteobacterium Xanthomonas oryzae pv. oryzae, which uses a group of type III TAL (transcription activator-like) effectors to induce host gene expression and condition host susceptibility. Five SWEET genes are functionally redundant in supporting bacterial disease, but only two were experimentally proven targets of natural TAL effectors. Here, we report the identification of the sucrose transporter gene OsSWEET13 as the disease-susceptibility gene for PthXo2 and the existence of cryptic recessive resistance to PthXo2-dependent X. oryzae pv. oryzae due to promoter variations of OsSWEET13 in japonica rice. PthXo2-containing strains induce OsSWEET13 in indica rice IR24 due to the presence of an unpredicted and undescribed effector binding site that is absent from the alleles in japonica rice Nipponbare and Kitaake. The specificity of effector-associated gene induction and disease susceptibility is attributable to a single nucleotide polymorphism (SNP), which is also found in a polymorphic allele of OsSWEET13 known as the recessive resistance gene xa25 from the rice cultivar Minghui 63. Mutation of OsSWEET13 with CRISPR/Cas9 technology further corroborates the requirement of OsSWEET13 expression for PthXo2-dependent disease susceptibility to X. oryzae pv. oryzae. Gene profiling of a collection of 104 strains revealed OsSWEET13 induction by 42 isolates of X. oryzae pv. oryzae. Heterologous expression of OsSWEET13 in Nicotiana benthamiana leaf cells elevates sucrose concentrations in the apoplasm. The results corroborate a model whereby X. oryzae pv. oryzae enhances the release of sucrose from host cells in order to exploit the host resources. © 2015 The Authors. The Plant Journal © 2015 John Wiley & Sons Ltd.


October 23, 2019

Accurate identification and quantification of DNA species by next-generation sequencing in adeno-associated viral vectors produced in insect cells.

Recombinant adeno-associated viral (rAAV) vectors have proven to be excellent tools for the treatment of many genetic and other complex diseases. However, the illegitimate encapsidation of DNA contaminants within viral particles constitutes a major safety concern for rAAV-based therapies. Moreover, the development of rAAV vectors for early-phase clinical trials has revealed the limited accuracy of the analytical tools used to characterize these new and complex drugs. Although most published data concerning residual DNA in rAAV preparations have been generated by quantitative PCR, we have developed a novel single-strand virus sequencing (SSV-Seq) method that uses next-generation sequencing (NGS) to quantify DNA contaminants in AAV vectors produced in mammalian cells. Here, we describe the adaptation of SSV-Seq for the accurate identification and quantification of DNA species in rAAV stocks produced in insect cells. We found that baculoviral DNA was the most abundant contaminant, representing less than 2.1% of NGS reads regardless of serotype (2, 8, or rh10). Sf9 producer cell DNA was detected at low frequency (≤0.03%) in rAAV lots. Advanced computational analyses revealed that (1) baculoviral sequences close to the inverted terminal repeats preferentially underwent illegitimate encapsidation, and (2) single-nucleotide variants were absent from the rAAV genome. The high-throughput sequencing protocol described here enables effective DNA quality control of rAAV vectors produced in insect cells and is adapted to conform with regulatory agency safety requirements.


September 22, 2019

The maize W22 genome provides a foundation for functional genomics and transposon biology.

The maize W22 inbred has served as a platform for maize genetics since the mid-twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using short-read sequencing technologies. We show that significant structural heterogeneity exists in comparison to the B73 reference genome at multiple scales, from transposon composition and copy number variation to single-nucleotide polymorphisms. The generation of this reference genome enables accurate placement of thousands of Mutator (Mu) and Dissociation (Ds) transposable element insertions for reverse and forward genetics studies. Annotation of the genome has been achieved using RNA-seq analysis, differential nuclease sensitivity profiling and bisulfite sequencing to map open reading frames, open chromatin sites and DNA methylation profiles, respectively. Collectively, the resources developed here integrate W22 as a community reference genome for functional genomics and provide a foundation for the maize pan-genome.


September 22, 2019

Metagenomic binning and association of plasmids with bacterial host genomes using DNA methylation.

Shotgun metagenomics methods enable characterization of microbial communities in human microbiome and environmental samples. Assembly of metagenome sequences does not output whole genomes, so computational binning methods have been developed to cluster sequences into genome ‘bins’. These methods exploit sequence composition, species abundance, or chromosome organization but cannot fully distinguish closely related species and strains. We present a binning method that incorporates bacterial DNA methylation signatures, which are detected using single-molecule real-time sequencing. Our method takes advantage of these endogenous epigenetic barcodes to resolve individual reads and assembled contigs into species- and strain-level bins. We validate our method using synthetic and real microbiome sequences. In addition to genome binning, we show that our method links plasmids and other mobile genetic elements to their host species in a real microbiome sample. Incorporation of DNA methylation information into shotgun metagenomics analyses will complement existing methods to enable more accurate sequence binning.
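One way to picture methylation-based binning is to represent each contig as a vector of methylated fractions over a shared list of motifs, then group contigs whose profiles are close. The sketch below uses single-linkage clustering with a Euclidean distance cutoff. The profiles, the `max_dist` threshold and the clustering choice are all assumptions for illustration, not the published method.

```python
import math


def methylation_bins(profiles, max_dist=0.2):
    """Single-linkage binning of contigs by motif methylation profiles.

    profiles: dict contig -> tuple of per-motif methylated fractions (0..1),
    all listed in the same motif order.
    """
    contigs = list(profiles)
    parent = {c: c for c in contigs}  # union-find forest

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Merge any two contigs whose profiles lie within max_dist of each other.
    for i, a in enumerate(contigs):
        for b in contigs[i + 1:]:
            if math.dist(profiles[a], profiles[b]) <= max_dist:
                parent[find(a)] = find(b)

    bins = {}
    for c in contigs:
        bins.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in bins.values())


# Hypothetical profiles over two motifs; names and values are invented.
profiles = {
    "chrA1": (0.95, 0.05),
    "chrA2": (0.90, 0.10),
    "plasmidX": (0.92, 0.02),
    "chrB1": (0.05, 0.90),
}
print(methylation_bins(profiles))
```

In this toy case the plasmid lands in the same bin as the chromosome contigs that share its methylation profile. That mirrors the plasmid-to-host linkage described above, because a host's methyltransferases modify its plasmids as well.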


September 22, 2019

The methylome of the gut microbiome: disparate Dam methylation patterns in intestinal Bacteroides dorei.

Despite great interest in the human microbiome in recent years, there are no reports of bacterial DNA methylation in the microbiome. Here, metagenomic sequencing on the Pacific Biosciences platform allowed rapid identification of the GATC methylation status of a bacterial species in human stool samples. For this work, we chose two stool samples that were dominated by a single species, Bacteroides dorei. Based on 16S rRNA analysis, this species represented over 45% of the bacteria present in these two samples. The B. dorei genome sequence from these samples was determined and the GATC methylation sites were mapped. The B. dorei genome from one subject lacked any GATC methylation and lacked the DNA adenine methyltransferase genes. In contrast, B. dorei from the other subject contained 20,551 methylated GATC sites. Of the 4970 open reading frames identified in the GATC-methylated B. dorei genome, 3184 genes were methylated, and a further 1735 methylated GATC sites lay in intergenic regions. These results suggest that DNA methylation patterns, which may differ between disease states, are important to consider in multi-omic analyses of microbiome samples that seek to discover the diversity of bacterial functions.
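To make the genic versus intergenic tallies concrete, the sketch below locates GATC motifs in a sequence and splits them by overlap with ORF coordinates. It only finds candidate sites. Whether a site is actually methylated would come from PacBio kinetic data, which this sketch does not model. Coordinates are assumed to be 0-based and half-open, and the toy sequence and ORFs are invented.

```python
import re


def gatc_summary(seq, orfs):
    """Count GATC motifs, ORFs containing at least one, and intergenic motifs.

    orfs: list of (start, end) 0-based, half-open coordinates on seq.
    """
    # GATC is palindromic and cannot overlap itself, so one forward scan finds every site.
    sites = [m.start() for m in re.finditer("GATC", seq.upper())]

    def overlaps(site, orf):
        start, end = orf
        return site < end and start < site + 4

    genes_with_site = sum(1 for orf in orfs if any(overlaps(s, orf) for s in sites))
    intergenic = sum(1 for s in sites if not any(overlaps(s, orf) for orf in orfs))
    return len(sites), genes_with_site, intergenic


print(gatc_summary("GATCAAAAGATCTTTTGATC", [(0, 6), (10, 14)]))  # toy input
```

Here a site that only partly overlaps an ORF is counted as genic. That is a design choice of the sketch and may not match how the study assigned boundary sites.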


September 22, 2019

PacBio sequencing of gene families – a case study with wheat gluten genes.

Amino acids in wheat (Triticum aestivum) seeds mainly accumulate in storage proteins called gliadins and glutenins. Gliadins comprise α/β-, γ- and ω-types, whereas glutenins comprise HMW- and LMW-types. Known gliadin and glutenin sequences were largely determined through cloning and capillary electrophoresis sequencing. This time-consuming process prevents intensive study of the variation of each orthologous gene copy among cultivars. The throughput and read length of the Pacific Biosciences RS (PacBio) single-molecule sequencing platform make it feasible to construct contiguous and non-chimeric RNA sequences. We assembled 424 wheat storage protein transcripts from ten wheat cultivars using just one single-molecule real-time (SMRT) cell. The protein genes from wheat cultivar Chinese Spring are comparable to known sequences from NCBI. We demonstrated real-time sequencing of gene families with high throughput and low cost. This method can be applied to studies of gene amplification and copy number variation among species and cultivars. © 2013 Elsevier B.V. All rights reserved.


September 22, 2019

A comprehensive analysis of alternative splicing in paleopolyploid maize.

Identifying and characterizing alternative splicing (AS) advances our understanding of the biological role of transcript isoform diversity. This study describes the use of publicly available RNA-Seq data to identify and characterize the global diversity of AS isoforms in maize using the inbred lines B73 and Mo17, and a related species, sorghum. Identification and characterization of AS within maize tissues revealed that genes expressed in seed exhibit the largest differential AS relative to the other tissues examined. Additionally, differences in AS between the two genotypes B73 and Mo17 are greatest within genes expressed in seed. We demonstrate that changes in the level of alternatively spliced transcripts (intron retention and exon skipping) do not solely reflect differences in total transcript abundance, and we present evidence that intron retention may act to fine-tune gene expression across seed development stages. Furthermore, we have identified temperature-sensitive AS in maize and demonstrate that drought-induced changes in AS involve distinct sets of genes in reproductive and vegetative tissues. Examining our identified AS isoforms within B73 × Mo17 recombinant inbred lines (RILs) identified splicing QTL (sQTL). Of the junctions regulated by cis-sQTL, 43.3% were also identified as alternatively spliced junctions in our analysis, while the 10 Mb windows flanking 48.2% of trans-sQTLs overlap with splicing-related genes. Using sorghum as an out-group enabled direct examination of loss or conservation of AS between homeologous genes representing the two subgenomes of maize. We identify several instances where AS isoforms that are conserved between one maize homeolog and its sorghum ortholog are absent from the second maize homeolog, suggesting that these AS isoforms may have been lost after the maize whole genome duplication event. This comprehensive analysis provides new insights into the complexity of AS in maize.
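The observation that AS levels need not track total transcript abundance can be illustrated with a ratio such as percent spliced in (PSI). The abstract does not state which metric the study used. PSI is used here as an assumed, commonly used stand-in, and the read counts are hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Fraction of informative reads supporting the inclusion form.

    For intron retention, "inclusion" means reads covering the retained
    intron and "exclusion" means reads spanning the spliced junction.
    Returns None when there are no informative reads.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Hypothetical counts for one intron at two seed stages: total reads double,
# yet the retained fraction is unchanged.
early = percent_spliced_in(inclusion_reads=20, exclusion_reads=80)
late = percent_spliced_in(inclusion_reads=40, exclusion_reads=160)
print(early, late)
```

Because PSI is normalized by the reads informative for the event, a change in PSI points to a shift in isoform balance rather than a simple change in how much of the gene is expressed. Production tools usually also correct for the differing lengths of inclusion and exclusion features, which this sketch omits.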


September 22, 2019

Computational analysis of alternative splicing in plant genomes.

Computational analyses play crucial roles in characterizing splicing isoforms in plant genomes. In this review, we provide a survey of computational tools used in recently published, genome-scale splicing analyses in plants. We summarize the commonly used software and pipelines for read mapping, isoform reconstruction, isoform quantification, and differential expression analysis. We also discuss methods for analyzing long reads and strategies for combining long and short reads to identify splicing isoforms. We review several tools for characterizing local splicing events, splicing graphs, and coding potential, and for visualizing splicing isoforms. We further discuss the procedures for identifying conserved splicing isoforms across plant species. Finally, we discuss the outlook for integrating other genomic data with splicing analyses to identify regulatory mechanisms of AS on a genome-wide scale. Copyright © 2018 Elsevier B.V. All rights reserved.


September 22, 2019

Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci.

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.


September 22, 2019

SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification.

High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of the data and of the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive PCR-based evaluation of ToFU PacBio transcripts and find that a substantial number of the novel transcripts are technical artifacts of the sequencing approach, and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact on the accurate quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes. © 2018 Tardaguila et al.; Published by Cold Spring Harbor Laboratory Press.
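SQANTI's structural categories compare a long-read isoform's splice junction chain with the reference annotation. The sketch below is a heavily simplified version of four of those categories: full splice match (FSM), incomplete splice match (ISM), novel in catalog (NIC) and novel not in catalog (NNC). It ignores transcript ends, strand, mono-exonic transcripts and the other categories the tool reports. Representing junctions as (donor, acceptor) coordinate pairs is an assumption for illustration, and this is not SQANTI's code.

```python
def classify_isoform(query_junctions, ref_transcripts):
    """Simplified SQANTI-style structural category for one isoform.

    query_junctions: sequence of (donor, acceptor) pairs in genomic order.
    ref_transcripts: list of such sequences for annotated isoforms of the gene.
    """
    q = tuple(query_junctions)
    if not q:
        return "mono-exonic (not handled in this sketch)"
    refs = [tuple(r) for r in ref_transcripts]
    # FSM: identical junction chain to some reference transcript.
    if any(q == r for r in refs):
        return "FSM"
    # ISM: a consecutive sub-chain of some reference transcript's junctions.
    n = len(q)
    if any(r[i:i + n] == q for r in refs for i in range(len(r) - n + 1)):
        return "ISM"
    # NIC: every donor and acceptor is annotated, but in a new combination.
    donors = {d for r in refs for d, _ in r}
    acceptors = {a for r in refs for _, a in r}
    if all(d in donors and a in acceptors for d, a in q):
        return "NIC"
    # NNC: at least one donor or acceptor is not in the annotation.
    return "NNC"


# Hypothetical reference gene with three introns.
refs = [((100, 200), (300, 400), (500, 600))]
for junctions in (
    [(100, 200), (300, 400), (500, 600)],  # same chain
    [(300, 400), (500, 600)],              # truncated chain
    [(100, 400), (500, 600)],              # skips an exon using known sites
    [(100, 250)],                          # uses an unannotated acceptor
):
    print(junctions, classify_isoform(junctions, refs))
```

The order of checks matters. An isoform is only called novel after failing both match tests, so a truncated but otherwise annotated chain is reported as ISM rather than NIC. This loosely reflects why the abstract describes most novel transcripts as new combinations of existing splice sites.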

