Transcription activator-like effector nucleases (TALENs) have become a powerful tool for genome editing due to the simple code linking the amino acid sequences of their DNA-binding domains to TALEN nucleotide targets. While the initial TALEN-design guidelines are very useful, user-friendly tools defining optimal TALEN designs for robust genome editing need to be developed. Here we evaluated existing guidelines and developed new design guidelines for TALENs based on 205 TALENs tested, and established the scoring algorithm for predicting TALEN activity (SAPTA) as a new online design tool. For any input gene of interest, SAPTA gives a ranked list of potential TALEN target sites, facilitating the selection of optimal TALEN pairs based on predicted activity. SAPTA-based TALEN designs increased the average intracellular TALEN monomer activity by >3-fold, and resulted in an average endogenous gene-modification frequency of 39% for TALENs containing the repeat variable di-residue NK that favors specificity rather than activity. It is expected that SAPTA will become a useful and flexible tool for designing highly active TALENs for genome-editing applications. SAPTA can be accessed via the website at http://baolab.bme.gatech.edu/Research/BioinformaticTools/TAL_targeter.html.
TALENs facilitate targeted genome editing in human cells with high specificity and low cytotoxicity.
Designer nucleases have been successfully employed to modify the genomes of various model organisms and human cell types. While the specificity of zinc-finger nucleases (ZFNs) and RNA-guided endonucleases has been assessed to some extent, little data are available for transcription activator-like effector-based nucleases (TALENs). Here, we have engineered TALEN pairs targeting three human loci (CCR5, AAVS1 and IL2RG) and performed a detailed analysis of their activity, toxicity and specificity. The TALENs showed comparable activity to benchmark ZFNs, with allelic gene disruption frequencies of 15-30% in human cells. Notably, TALEN expression was overall marked by a low cytotoxicity and the absence of cell cycle aberrations. Bioinformatics-based analysis of designer nuclease specificity confirmed partly substantial off-target activity of ZFNs targeting CCR5 and AAVS1 at six known and five novel sites, respectively. In contrast, only marginal off-target cleavage activity was detected at four out of 49 predicted off-target sites for CCR5- and AAVS1-specific TALENs. The rational design of a CCR5-specific TALEN pair decreased off-target activity at the closely related CCR2 locus considerably, consistent with fewer genomic rearrangements between the two loci. In conclusion, our results link nuclease-associated toxicity to off-target cleavage activity and corroborate TALENs as a highly specific platform for future clinical translation. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Well-developed genetic tools for thermophilic microorganisms are scarce, despite their industrial and scientific relevance. Whereas highly efficient CRISPR/Cas9-based genome editing is on the rise in prokaryotes, it has never been employed in a thermophile. Here, we apply Streptococcus pyogenes Cas9 (spCas9)-based genome editing to a moderate thermophile, i.e., Bacillus smithii, including a gene deletion, gene knockout via insertion of premature stop codons, and gene insertion. We show that spCas9 is inactive in vivo above 42 °C, and we employ the wide temperature growth range of B. smithii as an induction system for spCas9 expression. Homologous recombination with plasmid-borne editing templates is performed at 45-55 °C, when spCas9 is inactive. Subsequent transfer to 37 °C allows for counterselection through production of active spCas9, which introduces lethal double-stranded DNA breaks to the nonedited cells. The developed method takes 4 days with 90, 100, and 20% efficiencies for gene deletion, knockout, and insertion, respectively. The major advantage of our system is the limited requirement for genetic parts: only one plasmid, one selectable marker, and a promoter are needed, and the promoter does not need to be inducible or well-characterized. Hence, it can be easily applied for genome editing purposes in both mesophilic and thermophilic nonmodel organisms with a limited genetic toolbox and ability to grow at, or tolerate, temperatures of 37 and at or above 42 °C.
TAL effector nucleases (TALENs) are engineered proteins that can stimulate precise genome editing through specific DNA double-strand breaks. Sickle cell disease and β-thalassemia are common genetic disorders caused by mutations in β-globin, and we engineered a pair of highly active TALENs that induce modification of 54% of human β-globin alleles near the site of the sickle mutation. These TALENs stimulate targeted integration of therapeutic, full-length β-globin cDNA to the endogenous β-globin locus in 19% of cells prior to selection as quantified by single molecule real-time sequencing. We also developed highly active TALENs to human γ-globin, a pharmacologic target in sickle cell disease therapy. Using the β-globin and γ-globin TALENs, we generated cell lines that express GFP under the control of the endogenous β-globin promoter and tdTomato under the control of the endogenous γ-globin promoter. With these fluorescent reporter cell lines, we screened a library of small molecule compounds for their differential effect on the transcriptional activity of the endogenous β- and γ-globin genes and identified several that preferentially upregulate γ-globin expression.
It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes. Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, with incomplete references, and with mixed samples (i.e. metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with corresponding whole genome amplified (WGA, modification free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer’s neighborhood. Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with “neighbor” modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomics contigs and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection. The software is available at https://github.com/alibashir/EMMCKmer.
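The kmer comparison step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which statistical model underlies the log likelihood ratios, so the sketch assumes a Gaussian fit to log-transformed IPDs. The function names (`gaussian_logpdf`, `kmer_llr`, `rank_kmers`) and the `min_obs` and `min_sigma` parameters are hypothetical. The greedy neighborhood correction is not reproduced here.

```python
import math
from statistics import mean, pstdev


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def kmer_llr(native_ipds, wga_ipds, min_sigma=1e-3):
    """Log likelihood ratio of native IPDs under a native fit vs. a WGA fit.

    ASSUMPTION: log(IPD) is modeled as Gaussian; the source does not specify
    the distribution. All IPD values must be positive.
    """
    log_native = [math.log(v) for v in native_ipds]
    log_wga = [math.log(v) for v in wga_ipds]
    mu_n, sd_n = mean(log_native), max(pstdev(log_native), min_sigma)
    mu_w, sd_w = mean(log_wga), max(pstdev(log_wga), min_sigma)
    return sum(
        gaussian_logpdf(x, mu_n, sd_n) - gaussian_logpdf(x, mu_w, sd_w)
        for x in log_native
    )


def rank_kmers(native_table, wga_table, min_obs=3):
    """Rank kmers by LLR, highest first.

    Each table maps a kmer string to a list of IPD observations. Kmers with
    fewer than min_obs observations in either table are skipped.
    """
    scores = {}
    for kmer, native_ipds in native_table.items():
        wga_ipds = wga_table.get(kmer, [])
        if len(native_ipds) >= min_obs and len(wga_ipds) >= min_obs:
            scores[kmer] = kmer_llr(native_ipds, wga_ipds)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: GATC shows slower kinetics in native DNA than in WGA DNA.
    native = {"GATC": [3.0, 3.2, 2.9, 3.1], "ACGT": [1.0, 1.1, 0.9, 1.0]}
    wga = {"GATC": [1.0, 1.1, 0.9, 1.0], "ACGT": [1.0, 1.1, 0.9, 1.0]}
    for kmer, score in rank_kmers(native, wga):
        print(kmer, round(score, 2))
```

A kmer whose native and WGA distributions match scores near zero, while a kmer with shifted native kinetics scores high and rises to the top of the ranking.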
Searching for convergent pathways in autism spectrum disorders: insights from human brain transcriptome studies.
Autism spectrum disorder (ASD) is one of the most heritable neuropsychiatric conditions. The complex genetic landscape of the disorder includes both common and rare variants at hundreds of genetic loci. This marked heterogeneity has thus far hampered efforts to develop genetic diagnostic panels and targeted pharmacological therapies. Here, we give an overview of the current literature on the genetic basis of ASD, and review recent human brain transcriptome studies and their role in identifying convergent pathways downstream of the heterogeneous genetic variants. We also discuss emerging evidence on the involvement of non-coding genomic regions and non-coding RNAs in ASD.
PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence single DNA molecules in real time without amplification. PacBio RS II’s sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is an effective tool not only in the basic biological sciences but also in the medical/clinical setting.
Shotgun metagenomics methods enable characterization of microbial communities in human microbiome and environmental samples. Assembly of metagenome sequences does not output whole genomes, so computational binning methods have been developed to cluster sequences into genome ‘bins’. These methods exploit sequence composition, species abundance, or chromosome organization but cannot fully distinguish closely related species and strains. We present a binning method that incorporates bacterial DNA methylation signatures, which are detected using single-molecule real-time sequencing. Our method takes advantage of these endogenous epigenetic barcodes to resolve individual reads and assembled contigs into species- and strain-level bins. We validate our method using synthetic and real microbiome sequences. In addition to genome binning, we show that our method links plasmids and other mobile genetic elements to their host species in a real microbiome sample. Incorporation of DNA methylation information into shotgun metagenomics analyses will complement existing methods to enable more accurate sequence binning.
The methylome of the gut microbiome: disparate Dam methylation patterns in intestinal Bacteroides dorei
Despite the large interest in the human microbiome in recent years, there are no reports of bacterial DNA methylation in the microbiome. Here metagenomic sequencing using the Pacific Biosciences platform allowed for rapid identification of the GATC methylation status of a bacterial species in human stool samples. For this work, two stool samples were chosen that were dominated by a single species, Bacteroides dorei. Based on 16S rRNA analysis, this species represented over 45% of the bacteria present in these two samples. The B. dorei genome sequence from these samples was determined and the GATC methylation sites mapped. The B. dorei genome from one subject lacked any GATC methylation and lacked the DNA adenine methyltransferase genes. In contrast, B. dorei from another subject contained 20,551 methylated GATC sites. Of the 4970 open reading frames identified in the GATC-methylated B. dorei genome, 3184 contained methylated GATC sites, and a further 1735 methylated GATC sites lay in intergenic regions. These results suggest that DNA methylation patterns are important to consider in multi-omic analyses of microbiome samples seeking to discover the diversity of bacterial functions and may differ between disease states.
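As an illustration of the bookkeeping behind the genic and intergenic counts above, the following sketch locates GATC sites in a sequence and splits them by whether they fall inside an open reading frame. It is a hypothetical example only: it finds motif occurrences, not methylation status, which in the study came from Pacific Biosciences kinetic data. The function names and the half-open `(start, end)` interval convention are assumptions.

```python
def find_motif_sites(seq, motif="GATC"):
    """Return 0-based start positions of every (possibly overlapping) motif hit.

    GATC is its own reverse complement, so scanning one strand covers both.
    """
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append(start)
        start = seq.find(motif, start + 1)
    return sites


def partition_sites(sites, orfs):
    """Split sites into (genic, intergenic) lists.

    ASSUMPTION: orfs is a list of half-open (start, end) intervals, and a site
    counts as genic when its start position lies inside any interval.
    """
    genic, intergenic = [], []
    for site in sites:
        if any(start <= site < end for start, end in orfs):
            genic.append(site)
        else:
            intergenic.append(site)
    return genic, intergenic


if __name__ == "__main__":
    toy_seq = "AAGATCTTTGATCGGGATC"
    toy_orfs = [(0, 8)]  # hypothetical ORF coordinates
    sites = find_motif_sites(toy_seq)
    print(partition_sites(sites, toy_orfs))
```

The linear scan over intervals is adequate for a toy example; for a full genome with thousands of open reading frames, a sorted interval lookup would be the more natural choice.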
Molecular genetic diversity and characterization of conjugation genes in the fish parasite Ichthyophthirius multifiliis.
Ichthyophthirius multifiliis is the etiologic agent of “white spot”, a commercially important disease of freshwater fish. As a parasitic ciliate, I. multifiliis infects numerous host species across a broad geographic range. Although Ichthyophthirius outbreaks are difficult to control, recent sequencing of the I. multifiliis genome has revealed a number of potential metabolic pathways for therapeutic intervention, along with likely vaccine targets for disease prevention. Nonetheless, major gaps exist in our understanding of both the life cycle and population structure of I. multifiliis in the wild. For example, conjugation has never been described in this species, and it is unclear whether I. multifiliis undergoes sexual reproduction, despite the presence of a germline micronucleus. In addition, no good methods exist to distinguish strains, leaving phylogenetic relationships between geographic isolates completely unresolved. Here, we compared nucleotide sequences of SSUrDNA, mitochondrial NADH dehydrogenase subunit I and cox-1 genes, and 14 somatic SNP sites from nine I. multifiliis isolates obtained from four different states in the US since 1995. The mitochondrial sequences effectively distinguished the isolates from one another and divided them into at least two genetically distinct groups. Furthermore, none of the nine isolates shared the same composition of the 14 somatic SNP sites, suggesting that I. multifiliis undergoes sexual reproduction at some point in its life cycle. Finally, compared to the well-studied free-living ciliates Tetrahymena thermophila and Paramecium tetraurelia, I. multifiliis has lost 38% and 29%, respectively, of 16 experimentally confirmed conjugation-related genes, indicating that mechanistic differences in sexual reproduction are likely to exist between I. multifiliis and other ciliate species. Copyright © 2015 Elsevier Inc. All rights reserved.
Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses and provide future perspectives. Copyright © 2018 Elsevier Ltd. All rights reserved.
Viral infection perturbs host cells and can be used to uncover regulatory mechanisms controlling cellular responses and susceptibility to infections. Using cell biological, biochemical, and genetic tools, we reveal that influenza A virus (IAV) infection induces global transcriptional defects at the 3′ ends of active host genes and RNA polymerase II (RNAPII) run-through into extragenic regions. Deregulated RNAPII leads to expression of aberrant RNAs (3′ extensions and host-gene fusions) that ultimately cause global transcriptional downregulation of physiological transcripts, an effect influencing antiviral response and virulence. This phenomenon occurs with multiple strains of IAV, is dependent on influenza NS1 protein, and can be modulated by SUMOylation of an intrinsically disordered region (IDR) of NS1 expressed by the 1918 pandemic IAV strain. Our data identify a strategy used by IAV to suppress host gene expression and indicate that polymorphisms in IDRs of viral proteins can affect the outcome of an infection.
Analysis of RNA base modification and structural rearrangement by single-molecule real-time detection of reverse transcription.
Zero-mode waveguides (ZMWs) are photonic nanostructures that create highly confined optical observation volumes, thereby allowing single-molecule-resolved biophysical studies at relatively high concentrations of fluorescent molecules. This principle has been successfully applied in single-molecule, real-time (SMRT®) DNA sequencing for the detection of DNA sequences and DNA base modifications. In contrast, RNA sequencing methods cannot provide sequence and RNA base modifications concurrently as they rely on complementary DNA (cDNA) synthesis by reverse transcription followed by sequencing of cDNA. Thus, information on RNA modifications is lost during the process of cDNA synthesis. Here we describe an application of SMRT technology to follow the activity of reverse transcriptase enzymes synthesizing cDNA on thousands of single RNA templates simultaneously in real time with single nucleotide turnover resolution using arrays of ZMWs. This method thereby obtains information from the RNA template directly. The analysis of the kinetics of the reverse transcriptase can be used to identify RNA base modifications, shown by example for N6-methyladenine (m6A) in oligonucleotides and in a specific mRNA extracted from total cellular mRNA. Furthermore, the real-time reverse transcriptase dynamics informs about RNA secondary structure and its rearrangements, as demonstrated on a ribosomal RNA and an mRNA template. Our results highlight the feasibility of studying RNA modifications and RNA structural rearrangements in ZMWs in real time. In addition, they suggest that technology can be developed for direct RNA sequencing provided that the reverse transcriptase is optimized to resolve homonucleotide stretches in RNA.
Identifying and characterizing alternative splicing (AS) enables our understanding of the biological role of transcript isoform diversity. This study describes the use of publicly available RNA-Seq data to identify and characterize the global diversity of AS isoforms in maize using the inbred lines B73 and Mo17, and a related species, sorghum. Identification and characterization of AS within maize tissues revealed that genes expressed in seed exhibit the largest differential AS relative to other tissues examined. Additionally, differences in AS between the two genotypes B73 and Mo17 are greatest within genes expressed in seed. We demonstrate that changes in the level of alternatively spliced transcripts (intron retention and exon skipping) do not solely reflect differences in total transcript abundance, and we present evidence that intron retention may act to fine-tune gene expression across seed development stages. Furthermore, we have identified temperature-sensitive AS in maize and demonstrate that drought-induced changes in AS involve distinct sets of genes in reproductive and vegetative tissues. Examining our identified AS isoforms within B73 × Mo17 recombinant inbred lines (RILs) identified splicing QTL (sQTL). Of the cis-sQTL-regulated junctions, 43.3% were also identified as alternatively spliced junctions in our analysis, while 10 Mb windows on each side of 48.2% of trans-sQTLs overlap with splicing-related genes. Using sorghum as an out-group enabled direct examination of loss or conservation of AS between homeologous genes representing the two subgenomes of maize. We identify several instances where AS isoforms that are conserved between one maize homeolog and its sorghum ortholog are absent from the second maize homeolog, suggesting that these AS isoforms may have been lost after the maize whole genome duplication event. This comprehensive analysis provides new insights into the complexity of AS in maize.
Transgenerational attenuation of opioid self-administration as a consequence of adolescent morphine exposure.
The United States is in the midst of an opiate epidemic, with abuse of prescription and illegal opioids increasing steadily over the past decade. While it is clear that there is a genetic component to opioid addiction, there is a significant portion of heritability that cannot be explained by genetics alone. The current study was designed to test the hypothesis that maternal exposure to opioids prior to pregnancy alters abuse liability in subsequent generations. Female adolescent Sprague Dawley rats were administered morphine at increasing doses (5-25 mg/kg, s.c.) or saline for 10 days (P30-39). During adulthood, animals were bred with drug-naïve colony males. Male and female adult offspring (F1 animals) were tested for morphine self-administration acquisition, progressive ratio, extinction, and reinstatement at three doses of morphine (0.25, 0.75, 1.25 mg/kg/infusion). Grand offspring (F2 animals, from the maternal line) were also examined. Additionally, gene expression changes within the nucleus accumbens were examined with RNA deep sequencing (PacBio) and qPCR. There were dose- and sex-dependent effects on all phases of the self-administration paradigm that indicate decreased morphine reinforcement and attenuated relapse-like behavior. Additionally, genes related to synaptic plasticity, as well as myelin basic protein (MBP), were dysregulated. Some, but not all, effects persisted into the subsequent (F2) generation. The results demonstrate that even limited opioid exposure during adolescence can have lasting effects across multiple generations, which has implications for mechanisms of the transmission of drug abuse liability in humans. Copyright © 2016 Elsevier Ltd. All rights reserved.