April 21, 2020

Long-Read Sequencing Emerging in Medical Genetics

The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches pose a limitation for identification of structural variants, sequencing repetitive regions, phasing alleles and distinguishing highly homologous genomic regions. These limitations may significantly contribute to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, like whole exome or even genome sequencing. Now, the emerging long-read sequencing (LRS) technologies may offer improvements in the characterization of genetic variation and regions that are difficult to assess with the currently prevailing NGS approaches. LRS has so far mainly been used to investigate genetic disorders with previously known or strongly suspected disease loci. While these targeted approaches already show the potential of LRS, it remains to be seen whether LRS technologies can soon enable true whole genome sequencing routinely. Ultimately, this could allow the de novo assembly of individual whole genomes used as a generic test for genetic disorders. In this article, we summarize the current LRS-based research on human genetic disorders and discuss the potential of these technologies to facilitate the next major advancements in medical genetics.


April 21, 2020

Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight.

The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads that we call dark by depth, and others that have ambiguous alignment, called camouflaged. We assess how well long-read or linked-read technologies resolve these regions. Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are ≥5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer’s Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer’s disease gene, found in disease cases but not in controls. While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer’s disease due to insufficient sample size, we believe it merits investigation in a larger cohort. There remain thousands of potentially important genomic regions overlooked by short-read sequencing that are largely resolved by long-read technologies.
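
The distinction between "dark by depth" and "camouflaged" positions can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the thresholds (MIN_DEPTH, LOW_MAPQ_FRACTION), the Position record, and the function names are all assumptions introduced here for illustration, and ambiguous alignment is approximated simply as a high fraction of low-mapping-quality reads.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds for illustration; the abstract does not state them.
MIN_DEPTH = 5              # at or below this many reads, call the position dark by depth
LOW_MAPQ_FRACTION = 0.9    # at or above this share of low-MAPQ reads, call it camouflaged


@dataclass
class Position:
    depth: int            # total reads covering the position
    low_mapq_reads: int   # reads whose mapping quality is considered ambiguous


def classify(pos: Position) -> str:
    """Label one position as 'dark_by_depth', 'camouflaged', or 'ok'."""
    if pos.depth <= MIN_DEPTH:
        return "dark_by_depth"
    if pos.low_mapq_reads / pos.depth >= LOW_MAPQ_FRACTION:
        return "camouflaged"
    return "ok"


def dark_fraction(positions: List[Position]) -> float:
    """Fraction of positions in a gene body that are dark by either criterion."""
    if not positions:
        return 0.0
    dark = sum(1 for p in positions if classify(p) != "ok")
    return dark / len(positions)


if __name__ == "__main__":
    gene_body = [Position(2, 0), Position(30, 29), Position(30, 1), Position(40, 0)]
    print([classify(p) for p in gene_body])
    print(dark_fraction(gene_body))
```

Under these assumed thresholds, a gene body would count as "≥5% dark" when dark_fraction returns 0.05 or more, and as "completely dark" when it returns 1.0.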


September 22, 2019

Searching for convergent pathways in autism spectrum disorders: insights from human brain transcriptome studies.

Autism spectrum disorder (ASD) is one of the most heritable neuropsychiatric conditions. The complex genetic landscape of the disorder includes both common and rare variants at hundreds of genetic loci. This marked heterogeneity has thus far hampered efforts to develop genetic diagnostic panels and targeted pharmacological therapies. Here, we give an overview of the current literature on the genetic basis of ASD, and review recent human brain transcriptome studies and their role in identifying convergent pathways downstream of the heterogeneous genetic variants. We also discuss emerging evidence on the involvement of non-coding genomic regions and non-coding RNAs in ASD.


September 22, 2019

The state of long non-coding RNA biology.

Transcriptomic studies have demonstrated that the vast majority of the genomes of mammals and other complex organisms is expressed in highly dynamic and cell-specific patterns to produce large numbers of intergenic, antisense and intronic long non-protein-coding RNAs (lncRNAs). Despite well characterized examples, their scaling with developmental complexity, and many demonstrations of their association with cellular processes, development and diseases, lncRNAs are still to be widely accepted as major players in gene regulation. This may reflect an underappreciation of the extent and precision of the epigenetic control of differentiation and development, where lncRNAs appear to have a central role, likely as organizational and guide molecules: most lncRNAs are nuclear-localized and chromatin-associated, with some involved in the formation of specialized subcellular domains. I suggest that a reassessment of the conceptual framework of genetic information and gene expression in the 4-dimensional ontogeny of spatially organized multicellular organisms is required. Together with this and further studies on their biology, the key challenges now are to determine the structure–function relationships of lncRNAs, which may be aided by emerging evidence of their modular structure, the role of RNA editing and modification in enabling epigenetic plasticity, and the role of RNA signaling in transgenerational inheritance of experience.


September 22, 2019

Elevated expression of a minor isoform of ANK3 is a risk factor for bipolar disorder.

Ankyrin-3 (ANK3) is one of the few genes that have been consistently identified as associated with bipolar disorder by multiple genome-wide association studies. However, the exact molecular basis of the association remains unknown. A rare loss-of-function splice-site SNP (rs41283526*G) in a minor isoform of ANK3 (incorporating exon ENSE00001786716) was recently identified as protective against bipolar disorder and schizophrenia. This suggests that an elevated expression of this isoform may be involved in the etiology of the disorders. In this study, we used novel approaches and data sets to test this hypothesis. First, we strengthen the statistical evidence supporting the allelic association by replicating the protective effect of the minor allele of rs41283526 in three additional large independent samples (meta-analysis p-values: 6.8E-05 for bipolar disorder and 8.2E-04 for schizophrenia). Second, we confirm the hypothesis that both bipolar and schizophrenia patients have a significantly higher expression of this isoform than controls (p-values: 3.3E-05 for schizophrenia and 9.8E-04 for bipolar type I). Third, we determine the transcription start site for this minor isoform by Pacific Biosciences sequencing of full-length cDNA and show that it is primarily expressed in the corpus callosum. Finally, we combine genotype and expression data from a large Norwegian sample of psychiatric patients and controls, and show that the risk alleles in ANK3 identified by bipolar disorder GWAS are located near the transcription start site of this isoform and are significantly associated with its elevated expression. Together, these results point to the likely molecular mechanism underlying ANK3's association with bipolar disorder.


September 22, 2019

Laboratory colonization stabilizes the naturally dynamic microbiome composition of field collected Dermacentor andersoni ticks.

Nearly a quarter of emerging infectious diseases identified in the last century are arthropod-borne. Although ticks and insects can carry pathogenic microorganisms, non-pathogenic microbes make up the majority of their microbial communities. The majority of tick microbiome research has had a focus on discovery and description; very few studies have analyzed the ecological context and functional responses of the bacterial microbiome of ticks. The goal of this analysis was to characterize the stability of the bacterial microbiome of Dermacentor andersoni ticks between generations and two populations within a species. The bacterial microbiome of D. andersoni midguts and salivary glands was analyzed from populations collected at two different ecologically distinct sites by comparing field (F1) and lab-reared populations (F1-F3) over three generations. The microbiome composition of pooled and individual samples was analyzed by sequencing nearly full-length 16S rRNA gene amplicons using a Pacific Biosciences CCS platform that allows identification of bacteria to the species level. In this study, we found that the D. andersoni microbiome was distinct in different geographic populations and was tissue specific, differing between the midgut and the salivary gland, over multiple generations. Additionally, our study showed that the microbiomes of laboratory-reared populations were not necessarily representative of their respective field populations. Furthermore, we demonstrated that the microbiome of a few individual ticks does not represent the microbiome composition at the population level. We demonstrated that the bacterial microbiome of D. andersoni was complex over three generations and specific to tick tissue (midgut vs. salivary glands) as well as geographic location (Burns, Oregon vs. Lake Como, Montana vs. laboratory setting).
These results provide evidence that habitat of the tick population is a vital component of the complexity of the bacterial microbiome of ticks, and that the microbiome of lab colonies may not allow for comparative analyses with field populations. A broader understanding of microbiome variation will be required if we are to employ manipulation of the microbiome as a method for interfering with acquisition and transmission of tick-borne pathogens.


September 22, 2019

Transcriptional fates of human-specific segmental duplications in brain.

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits. © 2018 Dougherty et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Quantitative isoform-profiling of highly diversified recognition molecules.

Complex biological systems rely on cell surface cues that govern cellular self-recognition and selective interactions with appropriate partners. Molecular diversification of cell surface recognition molecules through DNA recombination and complex alternative splicing has emerged as an important principle for encoding such interactions. However, the lack of tools to specifically detect and quantify receptor protein isoforms is a major impediment to functional studies. We here developed a workflow for targeted mass spectrometry by selected reaction monitoring (SRM) that permits quantitative assessment of highly diversified protein families. We apply this workflow to dissecting the molecular diversity of the neuronal neurexin receptors and uncover an alternative splicing-dependent recognition code for synaptic ligands.


September 22, 2019

Human copy number variants are enriched in regions of low mappability.

Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately 5 times more likely to harbor germline CNVs, in stark contrast to the nearly uniform distribution observed for somatic CNVs in 95 cancer genomes. In addition to known enrichments in segmental duplication and near centromeres and telomeres, we also report that CNVs are enriched in specific types of satellite and in some of the most recent families of transposable elements. Finally, using this comprehensive approach, we identify 3455 regions with recurrent CNVs that were missing from existing catalogs. In particular, we identify 347 genes with a novel exonic CNV in low-mappability regions, including 29 genes previously associated with disease.


September 22, 2019

Methylation of the reelin gene promoter in peripheral blood and its relationship with the cognitive function of schizophrenia patients.

There is a decrease in the expression of the reelin gene (RELN) in the brain of schizophrenia patients, which may underlie observed cognitive abnormalities. It is suggested that this decrease is caused by the hypermethylation of the RELN promoter. The aim of the study was to investigate methylation of the RELN promoter in the peripheral blood of schizophrenia patients and its association with their cognitive deficits. A modified SMRT-BS (single-molecule real-time bisulfite sequencing) method was used. We determined the methylation rate of 170 CpG sites within a 1465 bp DNA region containing the entire CpG island in the RELN promoter in 51 schizophrenia patients and 52 healthy controls. All subjects completed a battery of neuropsychological tests. There were no DNA methylation changes associated with schizophrenia. Most CpG sites were unmethylated in both groups. At the same time, the methylation level varied between regions within the promoter. The methylation level in the area from -258 to -151 bp relative to the RELN transcription start site was a significant predictor of the index of patients’ cognitive functioning when sex, age, smoking, education, and polymorphism rs1858815 were taken into account. The positive correlation between the methylation rate in this region and cognitive index suggests that the hypomethylation of the RELN promoter could contribute to the development of cognitive deficits in schizophrenia.


September 22, 2019

Relationship between Alzheimer’s disease-associated SNPs within the CLU gene, local DNA methylation and episodic verbal memory in healthy and schizophrenia subjects.

Genetic variation may affect local DNA methylation patterns. Therefore, information about allele-specific DNA methylation (ASM) within disease-related loci has been proposed to be useful for the interpretation of GWAS results. To explore mechanisms that may underlie associations between CLU, a risk gene for Alzheimer’s disease (AD) and schizophrenia, and verbal memory, one of the most affected cognitive domains in both conditions, we studied DNA methylation in a region between AD-associated SNPs rs9331888 and rs9331896 in 72 healthy individuals and 73 schizophrenia patients. Using single-molecule real-time bisulfite sequencing, we assessed the haplotype-dependent ASM in this region. We then investigated whether its methylation could influence episodic verbal memory measured with the Rey Auditory Verbal Learning Test in these two cohorts. The region showed a complex methylation pattern, which was similar in healthy and schizophrenia individuals and unrelated to haplotypes. The pattern predicted memory scores in controls. The results suggest that epigenetic modifications within the CLU locus may play a role in memory variation, independent of ASM. Copyright © 2018 Elsevier B.V. All rights reserved.

