Many of our major crop species are polyploids, containing more than one genome or set of chromosomes. Polyploid crops present unique challenges, including difficulties in genome assembly, in discriminating between multiple gene and sequence copies, and in genetic mapping, which hinder the use of genomic data in genetics and breeding. Polyploid genomes may also be more prone to structural variation, such as the loss of gene copies or sequences (presence–absence variation) and the presence of genes or sequences in multiple copies (copy-number variation). Although the two main types of genomic structural variation commonly identified are presence–absence variation and copy-number variation, we propose…
We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence-resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison with ape genomes. Many ASSVs map within ChIP-seq-predicted enhancer regions where apes and macaque…
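Assembly contiguity is often summarized as the contig N50: the length L such that contigs of at least L bases together cover half of the assembly. The abstract does not state which statistic underlies the 75-fold figure, so this is only a minimal sketch assuming contig N50. The function name and toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


old_contigs = [100, 80, 50, 30, 20]   # toy "old" assembly (hypothetical)
new_contigs = [400, 400, 200]         # toy "new" assembly (hypothetical)
print(n50(old_contigs), n50(new_contigs))
```

A fold-change in contiguity would then be `n50(new_contigs) / n50(old_contigs)`, which is 5.0 for these toy values.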
The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches limit the identification of structural variants, the sequencing of repetitive regions, the phasing of alleles and the discrimination of highly homologous genomic regions. These limitations may contribute substantially to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, such as whole-exome or even whole-genome sequencing. Emerging long-read sequencing (LRS) technologies may improve the characterization of genetic variation and of regions that are difficult to assess with the currently prevailing NGS approaches. LRS…
The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads, which we call dark by depth, and regions with ambiguous alignments, which we call camouflaged. We assess how well long-read or linked-read technologies resolve these regions. Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark…
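The two kinds of dark region can be illustrated by labelling each position from the mapping qualities (MAPQs) of the reads covering it. This is a minimal sketch: the thresholds (at most 5 reads for dark by depth; at least 90% of reads with MAPQ below 10 for camouflaged) are illustrative assumptions, not necessarily the cut-offs used in the study, and the function names are hypothetical.

```python
from itertools import groupby

# Hypothetical thresholds for illustration only.
MAX_DARK_DEPTH = 5      # positions covered by <= this many reads are "dark by depth"
LOW_MAPQ = 10           # reads below this MAPQ are treated as ambiguously placed
CAMO_FRACTION = 0.9     # minimum share of low-MAPQ reads for a "camouflaged" call


def classify_position(mapqs):
    """Label one position from the MAPQs of the reads covering it."""
    depth = len(mapqs)
    if depth <= MAX_DARK_DEPTH:
        return "dark_by_depth"
    low = sum(1 for q in mapqs if q < LOW_MAPQ)
    if low / depth >= CAMO_FRACTION:
        return "camouflaged"
    return "mappable"


def dark_regions(per_position_mapqs, start=0):
    """Merge runs of identically labelled positions into half-open
    (start, end, label) intervals, keeping only non-mappable runs."""
    regions = []
    pos = start
    for label, run in groupby(classify_position(m) for m in per_position_mapqs):
        length = sum(1 for _ in run)
        if label != "mappable":
            regions.append((pos, pos + length, label))
        pos += length
    return regions


example = [[60] * 20, [60] * 20, [], [], [0] * 20, [60] * 20]
print(dark_regions(example))
```

Keeping the depth test first means a position is only called camouflaged when reads are present but cannot be placed uniquely.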
Autism spectrum disorder (ASD) is one of the most heritable neuropsychiatric conditions. The complex genetic landscape of the disorder includes both common and rare variants at hundreds of genetic loci. This marked heterogeneity has thus far hampered efforts to develop genetic diagnostic panels and targeted pharmacological therapies. Here, we give an overview of the current literature on the genetic basis of ASD, and review recent human brain transcriptome studies and their role in identifying convergent pathways downstream of the heterogeneous genetic variants. We also discuss emerging evidence on the involvement of non-coding genomic regions and non-coding RNAs in ASD.
Transcriptomic studies have demonstrated that the vast majority of the genomes of mammals and other complex organisms is expressed in highly dynamic and cell-specific patterns to produce large numbers of intergenic, antisense and intronic long non-protein-coding RNAs (lncRNAs). Despite well-characterized examples, their scaling with developmental complexity, and many demonstrations of their association with cellular processes, development and diseases, lncRNAs have yet to be widely accepted as major players in gene regulation. This may reflect an underappreciation of the extent and precision of the epigenetic control of differentiation and development, where lncRNAs appear to have a central role, likely as…
Ankyrin-3 (ANK3) is one of the few genes that have been consistently identified as associated with bipolar disorder by multiple genome-wide association studies. However, the exact molecular basis of the association remains unknown. A rare loss-of-function splice-site SNP (rs41283526*G) in a minor isoform of ANK3 (incorporating exon ENSE00001786716) was recently identified as protective against bipolar disorder and schizophrenia. This suggests that elevated expression of this isoform may be involved in the etiology of these disorders. In this study, we used novel approaches and data sets to test this hypothesis. First, we strengthen the statistical evidence supporting the allelic association…
Nearly a quarter of the emerging infectious diseases identified in the last century are arthropod-borne. Although ticks and insects can carry pathogenic microorganisms, non-pathogenic microbes make up the majority of their microbial communities. Most tick microbiome research has focused on discovery and description; very few studies have analyzed the ecological context and functional responses of the bacterial microbiome of ticks. The goal of this analysis was to characterize the stability of the bacterial microbiome of Dermacentor andersoni ticks between generations and between two populations of the same species. The bacterial microbiome of D. andersoni midguts and salivary glands was analyzed…
Despite the importance of duplicate genes for evolutionary adaptation, gene annotation in regions of segmental duplication is often incomplete, incorrect, or lacking altogether. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes (paralogs). We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, allowing us to distinguish between highly identical genes on the basis of multiple paralogous sequence variants (PSVs). We examined 19 gene families as expressed in developing and adult human brain, selected for their…
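Distinguishing nearly identical paralogs by paralogous sequence variants can be illustrated with a simple vote: each PSV a read covers supports whichever paralog carries the observed base. This minimal sketch assumes reads are already aligned to a shared coordinate system and that a PSV table is available; the function name, the minimum of two supporting PSVs and the tie rule are hypothetical choices, not the authors' pipeline.

```python
from collections import Counter


def assign_paralog(read, psvs, min_psvs=2):
    """Assign a read to a paralog by counting the PSVs it matches.

    read: dict mapping aligned position -> observed base (assumed pre-aligned).
    psvs: dict mapping position -> {paralog_name: base at that PSV}.
    Returns the paralog name, or None if support is too weak or tied.
    """
    votes = Counter()
    for pos, alleles in psvs.items():
        base = read.get(pos)
        if base is None:
            continue  # read does not cover this PSV
        for paralog, paralog_base in alleles.items():
            if base == paralog_base:
                votes[paralog] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    best, best_count = ranked[0]
    if best_count < min_psvs:
        return None  # too few informative PSVs
    if len(ranked) > 1 and ranked[1][1] == best_count:
        return None  # ambiguous between two paralogs
    return best


PSVS = {  # hypothetical PSV table for two paralogs
    10: {"GENE_A": "A", "GENE_B": "G"},
    25: {"GENE_A": "C", "GENE_B": "T"},
    40: {"GENE_A": "T", "GENE_B": "A"},
}
print(assign_paralog({10: "A", 25: "C", 40: "T"}, PSVS))
```

Requiring several concordant PSVs is what long, full-length reads make practical: a single short read may span only one PSV, or none.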
Complex biological systems rely on cell surface cues that govern cellular self-recognition and selective interactions with appropriate partners. Molecular diversification of cell surface recognition molecules through DNA recombination and complex alternative splicing has emerged as an important principle for encoding such interactions. However, the lack of tools to specifically detect and quantify receptor protein isoforms is a major impediment to functional studies. Here, we developed a workflow for targeted mass spectrometry by selected reaction monitoring (SRM) that permits quantitative assessment of highly diversified protein families. We apply this workflow to dissecting the molecular diversity of the neuronal neurexin receptors and…
Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately…
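The idea of controlling for technical variation with multiple samples can be sketched as follows. Each genomic bin of a test genome is compared with the same bin across reference genomes, so bin-specific biases (for example, in low-mappability regions) are absorbed into the reference distribution. This is a minimal illustration only. The median normalization, the |z| ≥ 3 cut-off and the function names are assumptions, and PopSV's actual normalization, statistics and segmentation are not reproduced here.

```python
import statistics


def median_normalize(counts):
    """Scale a sample's per-bin read counts by its median bin count."""
    med = statistics.median(counts)
    return [c / med for c in counts]


def bin_zscores(test_counts, reference_counts):
    """Z-score every bin of a test sample against the same bin in reference samples."""
    test = median_normalize(test_counts)
    refs = [median_normalize(r) for r in reference_counts]
    scores = []
    for i, value in enumerate(test):
        column = [r[i] for r in refs]
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # requires at least two reference samples
        scores.append((value - mean) / sd if sd > 0 else 0.0)
    return scores


def call_cnv_bins(scores, threshold=3.0):
    """Return (bin_index, 'gain' or 'loss') for bins whose |z| reaches the threshold."""
    return [(i, "gain" if z > 0 else "loss")
            for i, z in enumerate(scores) if abs(z) >= threshold]


REFS = [  # hypothetical per-bin read counts for five reference genomes
    [100, 102, 98, 100],
    [101, 99, 100, 100],
    [99, 100, 102, 98],
    [100, 101, 99, 102],
    [98, 100, 101, 100],
]
print(call_cnv_bins(bin_zscores([100, 100, 200, 100], REFS)))
```

Because each bin has its own reference mean and spread, a bin that is noisy in every genome yields small z-scores instead of spurious calls.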
Expression of the reelin gene (RELN) is decreased in the brains of patients with schizophrenia, which may underlie the cognitive abnormalities observed in these patients. This decrease has been suggested to result from hypermethylation of the RELN promoter. The aim of this study was to investigate methylation of the RELN promoter in the peripheral blood of patients with schizophrenia and its association with their cognitive deficits. We used a modified version of SMRT-BS (single-molecule real-time bisulfite sequencing). We determined the methylation rate of 170 CpG sites within a 1465 bp DNA region containing the entire CpG island in the RELN promoter in…
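In bisulfite data, the methylation rate at a CpG site is commonly estimated as the fraction of reads that retain a C (protected from conversion, hence methylated) rather than showing a T (converted, hence unmethylated). This is a minimal sketch assuming reads are already aligned without gaps to one strand of the amplicon reference. Real SMRT-BS analyses rely on bisulfite-aware alignment, and the function names here are hypothetical.

```python
def cpg_positions(reference):
    """0-based offsets of the C in each CpG dinucleotide on the given strand."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_rates(reference, reads):
    """Per-CpG methylation rate = C / (C + T) over the reads covering that CpG.

    Assumes reads come from the same bisulfite-converted strand as `reference`
    and are aligned without gaps, so read[i] corresponds to reference[i].
    Bases other than C or T (e.g. N) are ignored.
    """
    rates = {}
    for pos in cpg_positions(reference):
        methylated = unmethylated = 0
        for read in reads:
            if pos >= len(read):
                continue  # read ends before this CpG
            base = read[pos].upper()
            if base == "C":
                methylated += 1    # C retained: methylated
            elif base == "T":
                unmethylated += 1  # C converted to T: unmethylated
        covered = methylated + unmethylated
        rates[pos] = methylated / covered if covered else None
    return rates


reads = ["ACGTTCGA", "ATGTTCGA", "ATGTTTGA", "ACGTT"]
print(methylation_rates("ACGTTCGA", reads))
```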
Genetic variation may affect local DNA methylation patterns. Therefore, information about allele-specific DNA methylation (ASM) within disease-related loci has been proposed to be useful for interpreting GWAS results. To explore mechanisms that may underlie the association between CLU, a risk gene for Alzheimer’s disease (AD) and schizophrenia, and verbal memory, one of the most affected cognitive domains in both conditions, we studied DNA methylation in a region between the AD-associated SNPs rs9331888 and rs9331896 in 72 healthy individuals and 73 patients with schizophrenia. Using single-molecule real-time bisulfite sequencing, we assessed haplotype-dependent ASM in this region. We then investigated whether its methylation…
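Haplotype-dependent ASM can be illustrated by splitting reads according to the allele they carry at a SNP and comparing methylated and unmethylated counts at a CpG between the two groups. The abstract does not name a statistical test. This sketch assumes a two-sided Fisher's exact test on a 2×2 table, and the function names and input format are hypothetical.

```python
from collections import Counter
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one (tolerance for rounding).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def asm_test(observations, allele1, allele2):
    """Compare methylation at one CpG between reads carrying two SNP alleles.

    observations: iterable of (allele, is_methylated) pairs, one per read
    spanning both the SNP and the CpG.
    Returns (rate_allele1, rate_allele2, p_value).
    """
    counts = Counter(observations)
    a, b = counts[(allele1, True)], counts[(allele1, False)]
    c, d = counts[(allele2, True)], counts[(allele2, False)]
    if a + b == 0 or c + d == 0:
        raise ValueError("each allele needs at least one read")
    return a / (a + b), c / (c + d), fisher_exact_two_sided(a, b, c, d)


obs = [("A", True)] * 8 + [("A", False)] * 2 + [("G", True)] * 1 + [("G", False)] * 9
print(asm_test(obs, "A", "G"))
```

When many CpGs are tested in this way, the resulting p-values would also need correction for multiple testing.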