Richard Kuo from the Roslin Institute gave this PAG 2017 talk about using PacBio Iso-Seq data to generate genome annotations that outperform current gold-standard annotations. Included: findings from a…
In this ASHG 2016 virtual poster, Flora Tassone from UC Davis describes her study of the molecular mechanisms linked to fragile X syndrome and associated disorders, such as FXTAS. She…
This tutorial provides an introduction to SMRT Analysis within SMRT Link. The training includes an overview of the various PacBio analysis applications and an introduction to their use. This tutorial…
The Iso-Seq method enables the sequencing of transcript isoforms from the 5’ end to their poly-A tails, eliminating the need for transcript reconstruction and inference. This webinar provides a comprehensive…
This tutorial provides an overview of the Isoform Sequence (Iso-Seq) analysis application. The Iso-Seq application provides reads that span entire transcript isoforms, from the 5′ end to the 3′ poly…
A comparison of immunoglobulin IGHV, IGHD and IGHJ genes in wild-derived and classical inbred mouse strains.
The genomes of classical inbred mouse strains include genes derived from all three major subspecies of the house mouse, Mus musculus. We recently posited that genetic diversity in the immunoglobulin heavy chain (IGH) gene loci of C57BL/6 and BALB/c mice reflects differences in subspecies origin. To investigate this hypothesis, we conducted high-throughput sequencing of IGH gene rearrangements to document IGH variable (IGHV), joining (IGHJ), and diversity (IGHD) genes in four inbred wild-derived mouse strains (CAST/EiJ, LEWES/EiJ, MSM/MsJ, and PWD/PhJ) and a single disease model strain (NOD/ShiLtJ), collectively representing the genetic backgrounds of several major mouse subspecies. A total of 341 germline IGHV sequences were inferred in the wild-derived strains, including 247 not curated in the International Immunogenetics Information System. In contrast, 83 of the 84 inferred NOD IGHV genes had previously been observed in C57BL/6 mice. Variability among the strains examined was observed for only a single IGHJ gene, involving a novel allele. In contrast, unexpected variation was found in the IGHD gene loci, with four previously unreported IGHD gene sequences documented. Very few IGHV sequences of C57BL/6 and BALB/c mice were shared with strains representing major subspecies, suggesting that their IGH loci may be complex mosaics of genes of disparate origins. This suggests that a similar level of diversity is likely present in the IGH loci of other classical inbred strains. This diversity must now be documented if we are to properly understand inter-strain variation in models of antibody-mediated disease. This article is protected by copyright. All rights reserved.
Cultured Epidermal Autografts from Clinically Revertant Skin as a Potential Wound Treatment for Recessive Dystrophic Epidermolysis Bullosa.
Inherited skin disorders have recently been reported to have sporadic normal-looking areas in which a portion of the keratinocytes have recovered from the causative gene mutations (revertant mosaicism). We observed a case of recessive dystrophic epidermolysis bullosa treated with cultured epidermal autografts (CEAs) in which the CEA-grafted site remained epithelized for 16 years. We demonstrated that both the CEA product and the grafted area included cells with revertant mosaicism. Based on these findings, we conducted an investigator-initiated clinical trial of CEAs from clinically revertant skin for recessive dystrophic epidermolysis bullosa. The donor sites were examined by genetic analysis, immunofluorescence, electron microscopy, and deep-sequencing quantification of the reverted mRNA. The primary endpoint was the ulcer epithelization rate per patient at 4 weeks after the last CEA application. Three patients with recessive dystrophic epidermolysis bullosa, with a total of 8 ulcers, were enrolled. The epithelization rates for the three patients at the primary endpoint were 87.7%, 100%, and 57.0%, respectively. The clinical effects persisted for at least 76 weeks after CEA transplantation. One of the three patients had apparent revertant mosaicism in the donor skin and in the post-transplanted area. CEAs from clinically normal skin are a potentially well-tolerated treatment for recessive dystrophic epidermolysis bullosa. Copyright © 2019 The Authors. Published by Elsevier Inc. All rights reserved.
Transcriptional initiation of a small RNA, not R-loop stability, dictates the frequency of pilin antigenic variation in Neisseria gonorrhoeae.
Neisseria gonorrhoeae, the sole causative agent of gonorrhea, constitutively undergoes diversification of the Type IV pilus. Gene conversion occurs between one of several silent donor copies located in distinct loci and the recipient pilE gene, which encodes the major pilin subunit of the pilus. A guanine quadruplex (G4) DNA structure and a cis-acting sRNA (G4-sRNA) are located upstream of the pilE gene, and both are required for pilin antigenic variation (Av). We show that reduced sRNA transcription lowers pilin Av frequencies. Extended transcriptional elongation is not required for Av, since limiting the transcript to 32 nt allows normal Av frequencies. Using chromatin immunoprecipitation (ChIP) assays, we show that cellular G4s are less abundant when sRNA transcription is lower. In addition, using ChIP, we demonstrate that the G4-sRNA forms a stable RNA:DNA hybrid (R-loop) with its template strand. However, modulating R-loop levels by controlling RNase HI expression does not alter G4 abundance as quantified by ChIP. Since pilin Av frequencies were also unaltered when R-loop levels were modulated in this way, we conclude that transcription of the sRNA is necessary to promote pilin Av, but stable R-loops are not required. © 2019 John Wiley & Sons Ltd.
The ability to prevent blood loss in response to injury is a critical, evolutionarily conserved function of all vertebrates. Prothrombin (F2) contributes to both primary and secondary hemostasis through the activation of platelets and the conversion of soluble fibrinogen to insoluble fibrin, respectively. Complete prothrombin deficiency has never been observed in humans and is incompatible with life in mice, limiting the ability to understand the entirety of prothrombin's in vivo functions. We have previously demonstrated the ability of zebrafish to tolerate loss of both pro- and anticoagulant factors that are embryonic lethal in mammals, making them an ideal model for the study of prothrombin deficiency. Using genome editing with TALENs, we have generated a null allele in zebrafish f2. Homozygous mutant embryos develop normally into early adulthood, but all eventually die, with the majority succumbing to internal hemorrhage by 2 months of age. We show that despite this extended survival, the mutants are unable to form occlusive thrombi in either the venous or the arterial system as early as 3-5 days of life, and we were able to phenocopy this early hemostatic defect using direct oral anticoagulants. When the equivalent mutation was engineered into the homologous residues of human prothrombin, secretion and activation were severely reduced. This suggests a possible role for kringle 1 in thrombin maturation, and the possibility that the F1.2 fragment has a functional role in exerting the procoagulant effects of thrombin. Together, our data demonstrate the conserved function of thrombin in zebrafish, as well as the requirement for kringle 1 for biosynthesis and activation by prothrombinase. Understanding how zebrafish are able to develop normally and survive into early adulthood without prothrombin will provide important insight into its pleiotropic functions as well as the management of patients with bleeding disorders.
Most human protein-coding genes are expressed as multiple isoforms. This in turn greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned to every gene, the majority of alternative isoforms remain uncharacterized experimentally. This is primarily due to two factors: i) vast differences in expression levels between isoforms of the same gene, and ii) the difficulty of obtaining contiguous full-length ORF sequences. Here, we present ORF Capture-Seq (OCS), a flexible and cost-effective method that addresses both challenges for targeted full-length isoform sequencing applications by using collections of cloned ORFs as probes. As proof of concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude compared with an unenriched sample. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will allow mapping of the full set of human isoforms at reasonable cost.
The landscape of SNCA transcripts across synucleinopathies: New insights from long reads sequencing analysis
Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinson's Disease (PD) and Dementia with Lewy bodies (DLB). Previous studies have shown that the alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain in PD and DLB patients. Moreover, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene has also been shown to harbor structural variants that affect transcriptional levels. Here we present the first study to use long-read sequencing with targeted capture of both the gDNA and cDNA of the SNCA gene in brain tissues from PD, DLB, and control samples, using the PacBio Sequel system. The targeted full-length cDNA (Iso-Seq) data confirmed complex usage of known alternative start sites and variable 3′ UTR lengths, as well as novel 5′ starts and 3′ ends not previously described. The targeted gDNA data allowed phasing of up to 81% of the ~114 kb SNCA region, with the longest phased block exceeding 54 kb. We demonstrate that long gDNA and cDNA reads have the potential to reveal long-range information not previously accessible using traditional sequencing methods. This approach could benefit the study of disease risk genes such as SNCA, providing new insights into the genetic etiologies of complex human diseases such as synucleinopathies, including perturbations to the landscape of the gene's transcripts.
Genome-wide association studies (GWAS) have identified many genomic loci associated with risk for schizophrenia, but unambiguously identifying the relationship between disease-associated variants and specific genes, and in particular their effect on risk-conferring transcripts, has proven difficult. To better understand the specific molecular mechanism(s) at the schizophrenia locus in 11q25, we undertook cis expression quantitative trait loci (cis-eQTL) mapping for this 2-megabase genomic region using postmortem human brain samples. To comprehensively assess the effects of genetic risk on local expression, we evaluated multiple transcript features (genes, exons, and exon-exon junctions) in multiple brain regions: dorsolateral prefrontal cortex (DLPFC), hippocampus, and caudate. Genetic risk variants were strongly associated with expression of SNX19 transcript features that tag multiple rare classes of SNX19 transcripts, whereas they only weakly affected expression of an exon-exon junction that tags the majority of abundant transcripts. The most prominent class of SNX19 risk-associated transcripts, which is predicted to be overexpressed, is defined by an exon-exon splice junction between exons 8 and 10 (junc8.10) and is predicted to encode proteins that lack the characteristic nexin C-terminal domain. Risk alleles were also associated with either increased or decreased expression of multiple additional classes of transcripts. Using RACE, molecular cloning, and long-read sequencing, we found a number of novel SNX19 transcripts that further define the set of potential etiological transcripts. We explored epigenetic regulation of SNX19 expression and found that DNA methylation at CpG sites near the primary transcription start site and within exon 2 partially mediates the effects of risk variants on risk-associated expression.
ATAC sequencing revealed that some of the most strongly risk-associated SNPs are located within a region of open chromatin, suggesting a nearby regulatory element is involved. These findings indicate a potentially complex molecular etiology, in which risk alleles for schizophrenia generate epigenetic alterations and dysregulation of multiple classes of SNX19 transcripts.
TIN2 is an important regulator of telomere length, and mutations in TINF2, the gene encoding TIN2, cause short-telomere syndromes. While the genetics underscore the importance of TIN2, the mechanism through which TIN2 regulates telomere length remains unclear. Here, we tested the effects of human TIN2 on telomerase activity. We identified a new isoform in human cells, TIN2M, that is expressed at levels similar to those of previously studied TIN2 isoforms. All three TIN2 isoforms localized to telomeres and maintained telomere integrity in vivo, and localization was not disrupted by telomere syndrome mutations. Using direct telomerase activity assays, we discovered that TIN2 stimulated telomerase processivity in vitro. All of the TIN2 isoforms stimulated telomerase to similar extents. Mutations in the TPP1 TEL patch abrogated this stimulation, suggesting that TIN2 functions with TPP1/POT1 to stimulate telomerase processivity. We conclude from our data and previously published work that TIN2/TPP1/POT1 is a functional shelterin subcomplex. Copyright © 2019 Pike et al.
Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes
As they migrated out of Africa and into Europe and Asia, anatomically modern humans interbred with archaic hominins, such as Neanderthals and Denisovans. The effect of this genetic introgression on the recipient populations has been of considerable interest, especially in cases of selection for specific archaic genetic variants. Hsieh et al. characterized adaptive structural variants and copy number variants that are likely targets of positive selection in Melanesians. Focusing on population-specific regions of the genome that carry duplicated genes and show an excess of amino acid replacements provides evidence for one of the mechanisms by which genetic novelty can arise and result in differentiation between human genomes.
Science, this issue p. eaax2083
INTRODUCTION
Characterizing genetic variants underlying local adaptations in human populations is one of the central goals of evolutionary research. Most studies have focused on adaptive single-nucleotide variants that either arose as new beneficial mutations or were introduced after interbreeding with our now-extinct relatives, including Neanderthals and Denisovans. The adaptive role of copy number variants (CNVs), another well-known form of genomic variation generated through deletions or duplications that affect more base pairs in the genome, is less well understood, despite evidence that such mutations are subject to stronger selective pressures.
RATIONALE
This study focuses on the discovery of introgressed and adaptive CNVs that have become enriched in specific human populations. We combine whole-genome CNV calling and population genetic inference methods to discover CNVs and then assess signals of selection after controlling for demographic history. We examine 266 publicly available modern human genomes from the Simons Genome Diversity Project and the genomes of three ancient hominins: a Denisovan, a Neanderthal from the Altai Mountains in Siberia, and a Neanderthal from Croatia.
We apply long-read sequencing methods to sequence-resolve complex CNVs of interest specifically in the Melanesians, an Oceanian population distributed from Papua New Guinea to as far east as the islands of Fiji and known to harbor some of the greatest amounts of Neanderthal and Denisovan ancestry.
RESULTS
Consistent with the hypothesis of archaic introgression outside Africa, we find a significant excess of CNV sharing between modern non-African populations and archaic hominins (P = 0.039). Among Melanesians, we observe an enrichment of CNVs with potential signals of positive selection (n = 37 CNVs), of which 19 CNVs likely introgressed from archaic hominins. We show that Melanesian-stratified CNVs are significantly associated with signals of positive selection (P = 0.0323). Many map near or within genes associated with metabolism (e.g., ACOT1 and ACOT2), development and cell cycle or signaling (e.g., TNFRSF10D, CDK11A, and CDK11B), or immune response (e.g., IFNLR1). We characterize two of the largest and most complex CNVs, on chromosomes 16p11.2 and 8p21.3, which introgressed from Denisovans and Neanderthals, respectively, and are absent from most other human populations. At chromosome 16p11.2, we sequence-resolve a large duplication of >383 thousand base pairs (kbp) that originated from Denisovans and introgressed into the ancestral Melanesian population 60,000 to 170,000 years ago. This large duplication occurs at high frequency (>79%) in diverse Melanesian groups, shows signatures of positive selection, and maps adjacent to Homo sapiens-specific duplications that predispose to rearrangements associated with autism. On chromosome 8p21.3, we identify a Melanesian haplotype that carries two CNVs (a ~6-kbp deletion and a ~38-kbp duplication); this haplotype is of Neanderthal origin and introgressed into non-Africans 40,000 to 120,000 years ago.
This CNV haplotype occurs at high frequency (44%) and shows signals consistent with a partial selective sweep in Melanesians. Using long-read genomic and transcriptomic sequencing data, we reconstruct the structure and complex evolutionary history of these two CNVs and discover previously undescribed duplicated genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that show an excess of amino acid replacements consistent with the action of positive selection.
CONCLUSION
Our results suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation that is absent from current reference genomes.
Figure: Large adaptive-introgressed CNVs at chromosomes 8p21.3 and 16p11.2 in Melanesians. The magnifying glasses highlight structural differences between the archaic (top) and reference (bottom) genomes. Neanderthal (red) and Denisovan (blue) haplotypes encompassing large CNVs occur at high frequencies in Melanesians (44 and 79%, respectively) but are absent (black) in all non-Melanesians. These CNVs create positively selected genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that are absent from the reference genome.
Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations.
Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.
Targeted Long-Read RNA Sequencing Demonstrates Transcriptional Diversity Driven by Splice-Site Variation in MYBPC3.
To date, clinical sequencing has focused on genomic DNA using targeted panels and exome sequencing. Sequencing of a large hypertrophic cardiomyopathy (HCM) cohort identified a disease-associated variant in only 32% of patients, with an additional 15% receiving inconclusive results. When genome sequencing fails to reveal causative variants, the transcriptome may provide additional diagnostic clarity. A recent study of patients with genetically undiagnosed muscle disorders found that RNA sequencing, used as a complement to exome and whole-genome sequencing, achieved an overall diagnosis rate of 35%.