Background: The sequencing and haplotype phasing of entire gene sequences improves the understanding of the genetic basis of disease and drug response. One example is cystic fibrosis (CF). Cystic fibrosis transmembrane conductance regulator (CFTR) modulator therapies have revolutionized CF treatment, but only in a minority of CF subjects. Observed heterogeneity in CFTR modulator efficacy is related to the range of CFTR mutations; revertant mutations can modify the response to CFTR modulators, and other intronic variations in the ~200 kb CFTR gene have been linked to disease severity. Heterogeneity in the CFTR gene may also be linked to differential responses to CFTR modulators. The Targeted Locus Amplification (TLA) technology from Cergentis can be used to selectively amplify, sequence and phase the entire CFTR gene. With PacBio long-read SMRT Sequencing, TLA amplicons are sequenced intact and long-range phasing information of all fragments in entire amplicons is retrieved. Experimental Design and Methods: The TLA process produces amplicons consisting of 5-10 proximity ligated DNA fragments. TLA was performed on cell line and genomic DNA from Coriell GM12878, which has few heterozygous SNVs in CFTR, and the IB3 cell line, with known haplotypes but heterozygous for the delta508 mutation. All sample types were prepared with high and low density TLA primer sets, targeting coverage of >100 kb of the CFTR gene. Conclusion: We have demonstrated the power and utility of TLA with long-read SMRT Sequencing as a valuable research tool in sequencing and phasing across very long regions of the human genome. This process can be done in an efficient manner, multiplexing multiple genes and samples per SMRT Cell in a process amenable to high-throughput sequencing.
Genome-wide association studies (GWAS) have identified many genomic loci associated with risk for schizophrenia, but unambiguous identification of the relationship between disease-associated variants and specific genes, and in particular their effect on risk-conferring transcripts, has proven difficult. To better understand the specific molecular mechanism(s) at the schizophrenia locus in 11q25, we undertook cis expression quantitative trait loci (cis-eQTL) mapping for this 2 megabase genomic region using postmortem human brain samples. To comprehensively assess the effects of genetic risk upon local expression, we evaluated multiple transcript features (genes, exons, and exon-exon junctions) in multiple brain regions: dorsolateral prefrontal cortex (DLPFC), hippocampus, and caudate. Genetic risk variants were strongly associated with expression of SNX19 transcript features that tag multiple rare classes of SNX19 transcripts, whereas they only weakly affected expression of an exon-exon junction that tags the majority of abundant transcripts. The most prominent class of SNX19 risk-associated transcripts is predicted to be overexpressed, is defined by an exon-exon splice junction between exons 8 and 10 (junc8.10), and is predicted to encode proteins that lack the characteristic nexin C-terminal domain. Risk alleles were also associated with either increased or decreased expression of multiple additional classes of transcripts. With RACE, molecular cloning, and long-read sequencing, we found a number of novel SNX19 transcripts that further define the set of potential etiological transcripts. We explored epigenetic regulation of SNX19 expression and found that DNA methylation at CpG sites near the primary transcription start site and within exon 2 partially mediates the effects of risk variants on risk-associated expression. ATAC sequencing revealed that some of the most strongly risk-associated SNPs are located within a region of open chromatin, suggesting a nearby regulatory element is involved. These findings indicate a potentially complex molecular etiology, in which risk alleles for schizophrenia generate epigenetic alterations and dysregulation of multiple classes of SNX19 transcripts.
Enrichment of fetal and maternal long cell-free DNA fragments from maternal plasma following DNA repair.
Cell-free DNA (cfDNA) fragments in maternal plasma contain DNA damage, which may reduce the sensitivity of noninvasive prenatal testing (NIPT). However, some of this DNA damage is potentially reparable. We aimed to recover these damaged cfDNA molecules using PreCR DNA repair mix. cfDNA was extracted from 20 maternal plasma samples, repaired, and sequenced on the Illumina platform. Size profiles and fetal DNA fraction changes of repaired samples were characterized. Targeted sequencing of chromosome Y sequences was used to enrich fetal cfDNA molecules following repair. The single-molecule real-time (SMRT) sequencing platform was employed to characterize long (>250 bp) cfDNA molecules. NIPT of five trisomy 21 samples was performed. Size profiles of repaired libraries were altered, with a significant increase in long (>250 bp) cfDNA molecules. Single nucleotide polymorphism (SNP)-based analyses showed that both fetal- and maternal-derived cfDNA molecules were enriched by the repair. Fetal DNA fractions in maternal plasma showed a small but consistent (4.8%) increase, which was attributable to a larger increase in long fetal cfDNA molecules. z-score values were improved in NIPT of all trisomy 21 samples. Plasma DNA repair recovers and enriches long cfDNA molecules of both fetal and maternal origins in maternal plasma. © 2018 John Wiley & Sons, Ltd.
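As an illustration of two quantities reported in this abstract, the share of long (>250 bp) cfDNA molecules and the NIPT z-score, the following is a minimal Python sketch. The abstract does not give formulas: the z-score here is the commonly used form (chromosome 21 read fraction of the test sample compared with the mean and standard deviation of euploid reference samples), and the function names and numbers are hypothetical, not taken from the study.

```python
from statistics import mean, stdev


def long_fragment_fraction(lengths, threshold=250):
    """Fraction of cfDNA molecules longer than `threshold` bp.

    The 250 bp cutoff follows the abstract; `lengths` is a hypothetical
    list of fragment sizes in bp.
    """
    return sum(1 for n in lengths if n > threshold) / len(lengths)


def chr21_z_score(test_fraction, reference_fractions):
    """Z-score of the chr21 read fraction against euploid references.

    Assumed formula (not stated in the abstract):
    z = (test - mean(reference)) / sd(reference)
    """
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)  # sample standard deviation
    return (test_fraction - mu) / sigma


# Hypothetical values for illustration only.
sizes = [100, 160, 170, 300, 400]
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]

print(long_fragment_fraction(sizes))
print(chr21_z_score(0.0137, reference))
```

Under this assumed formula, a z-score rises either when the chromosome 21 fraction of the test sample increases or when the reference spread narrows, which is one way a higher fetal DNA fraction could translate into improved z-scores.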
Circulating DNA in plasma consists of short DNA fragments. The biological processes generating such fragments are not well understood. DNASE1L3 is a secreted DNASE1-like nuclease capable of digesting DNA in chromatin, and its absence causes anti-DNA responses and autoimmunity in humans and mice. We found that the deletion of Dnase1l3 in mice resulted in aberrations in the fragmentation of plasma DNA. Such aberrations included an increase in short DNA molecules below 120 bp, which was positively correlated with anti-DNA antibody levels. We also observed an increase in long, multinucleosomal DNA molecules and decreased frequencies of the most common end motifs found in plasma DNA. These aberrations were independent of anti-DNA response, suggesting that they represented a primary effect of DNASE1L3 loss. Pregnant Dnase1l3-/- mice carrying Dnase1l3+/- fetuses showed a partial restoration of normal frequencies of plasma DNA end motifs, suggesting that DNASE1L3 from Dnase1l3-proficient fetuses could enter maternal systemic circulation and affect both fetal and maternal DNA fragmentation in a systemic as well as local manner. However, the observed shortening of circulating fetal DNA relative to maternal DNA was not affected by the deletion of Dnase1l3. Collectively, our findings demonstrate that DNASE1L3 plays a role in circulating plasma DNA homeostasis by enhancing fragmentation and influencing end-motif frequencies. These results support a distinct role of DNASE1L3 as a regulator of the physical form and availability of cell-free DNA and may have important implications for the mechanism whereby this enzyme prevents autoimmunity. Copyright © 2019 the Author(s). Published by PNAS.
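The two fragmentation measures discussed in this abstract, the proportion of short (<120 bp) molecules and plasma DNA end-motif frequencies, can be sketched as follows. This is a hedged illustration only: the abstract does not define the motif length or which fragment end is used, so the 4-nucleotide 5' end motif, the function names, and the toy sequences are all assumptions.

```python
from collections import Counter


def short_fraction(lengths, cutoff=120):
    """Fraction of fragments shorter than `cutoff` bp (120 bp per the abstract)."""
    return sum(1 for n in lengths if n < cutoff) / len(lengths)


def end_motif_frequencies(fragments, k=4):
    """Relative frequency of the first k bases of each fragment.

    Assumption: the end motif is the k-mer at the 5' end of the fragment
    sequence (k=4 is a hypothetical choice). Fragments shorter than k
    are skipped.
    """
    counts = Counter(seq[:k].upper() for seq in fragments if len(seq) >= k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: n / total for motif, n in counts.items()}


# Toy data for illustration only.
fragments = ["CCCAGT", "CCCATT", "ccagAA", "AC"]
freqs = end_motif_frequencies(fragments)
for motif, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(motif, round(freq, 3))

print(short_fraction([80, 100, 166, 170]))
```

Comparing such frequency tables between genotypes (for example, wild-type versus Dnase1l3-deficient samples) is one simple way to express a decrease in the most common end motifs.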
A physical and genetic map of Cannabis sativa identifies extensive rearrangements at the THC/CBD acid synthase loci.
Cannabis sativa is widely cultivated for medicinal, food, industrial, and recreational use, but much remains unknown regarding its genetics, including the molecular determinants of cannabinoid content. Here, we describe a combined physical and genetic map derived from a cross between the drug-type strain Purple Kush and the hemp variety “Finola.” The map reveals that cannabinoid biosynthesis genes are generally unlinked but that aromatic prenyltransferase (AP), which produces the substrate for THCA and CBDA synthases (THCAS and CBDAS), is tightly linked to a known marker for total cannabinoid content. We further identify the gene encoding CBCA synthase (CBCAS) and characterize its catalytic activity, providing insight into how cannabinoid diversity arises in cannabis. THCAS and CBDAS (which determine the drug vs. hemp chemotype) are contained within large (>250 kb) retrotransposon-rich regions that are highly nonhomologous between drug- and hemp-type alleles and are furthermore embedded within ~40 Mb of minimally recombining repetitive DNA. The chromosome structures are similar to those in grains such as wheat, with recombination focused in gene-rich, repeat-depleted regions near chromosome ends. The physical and genetic map should facilitate further dissection of genetic and molecular mechanisms in this commercially and medically important plant. © 2019 Laverty et al.; Published by Cold Spring Harbor Laboratory Press.
Copy-number variants in clinical genome sequencing: deployment and interpretation for rare and undiagnosed disease.
Current diagnostic testing for genetic disorders involves serial use of specialized assays spanning multiple technologies. In principle, genome sequencing (GS) can detect all genomic pathogenic variant types on a single platform. Here we evaluate copy-number variant (CNV) calling as part of a clinically accredited GS test. We performed analytical validation of CNV calling on 17 reference samples, compared the sensitivity of GS-based CNV calls with that of a clinical microarray, and set a bound on precision using orthogonal technologies. We developed a protocol for family-based analysis of GS-based CNV calls, and deployed this across a clinical cohort of 79 rare and undiagnosed cases. We found that CNV calls from GS are at least as sensitive as those from microarrays, while only creating a modest increase in the number of variants interpreted (~10 CNVs per case). We identified clinically significant CNVs in 15% of the first 79 cases analyzed, all of which were confirmed by an orthogonal approach. The pipeline also enabled discovery of a uniparental disomy (UPD) and a 50% mosaic trisomy 14. Directed analysis of select CNVs enabled breakpoint-level resolution of genomic rearrangements and phasing of de novo CNVs. Robust identification of CNVs by GS is possible within a clinical testing environment.
The human microbiome includes trillions of bacteria, many of which play a vital role in host physiology. Numerous studies have now detected bacterial DNA in first-pass meconium and amniotic fluid samples, suggesting that the human microbiome may commence in utero. However, these data have remained contentious due to underlying contamination issues. Here, we have used a previously described method for reducing contamination in microbiome workflows to determine if there is a fetal bacterial microbiome beyond the level of background contamination. We recruited 50 women undergoing non-emergency cesarean section deliveries with no evidence of intra-uterine infection and collected first-pass meconium and amniotic fluid samples. Full-length 16S rRNA gene sequencing was performed using PacBio SMRT cell technology, to allow high resolution profiling of the fetal gut and amniotic fluid bacterial microbiomes. Levels of inflammatory cytokines were measured in amniotic fluid, and levels of immunomodulatory short chain fatty acids (SCFAs) were quantified in meconium. All meconium samples and most amniotic fluid samples (36/43) contained bacterial DNA. The meconium microbiome was dominated by reads that mapped to Pelomonas puraquae. Aside from this species, the meconium microbiome was remarkably heterogeneous between patients. The amniotic fluid microbiome was more diverse and contained mainly reads that mapped to typical skin commensals, including Propionibacterium acnes and Staphylococcus spp. All meconium samples contained acetate and propionate, at ratios similar to those previously reported in infants. P. puraquae reads were inversely correlated with meconium propionate levels. Amniotic fluid cytokine levels were associated with the amniotic fluid microbiome. Our results demonstrate that bacterial DNA and SCFAs are present in utero, and have the potential to influence the developing fetal immune system.