April 21, 2020

DNA Methylation at the Schizophrenia and Intelligence GWAS-Implicated MIR137HG Locus May Be Associated with Disease and Cognitive Functions

The largest genome-wide association studies have identified schizophrenia and intelligence associated variants in the MIR137HG locus containing genes encoding microRNA-137 and microRNA-2682. In the present study, we investigated DNA methylation in the MIR137HG intragenic CpG island (CGI) in the peripheral blood of 44 patients with schizophrenia and 50 healthy controls. The CGI included the entire MIR137 gene and the region adjacent to the 5′-end of MIR2682. The aim of the study was to examine the relationship of the CGI methylation with schizophrenia and cognitive functioning. The methylation level of 91 CpGs located in the selected region was established for each participant by means of single-molecule real-time bisulfite sequencing. All subjects completed the battery of neuropsychological tests. We found that the CGI was hypomethylated in both groups, except for one site, CpG (chr1: 98,511,049), which showed significant interindividual variability in methylation. A higher level of methylation of this CpG was seen in male patients and was associated with a decrease in the cognitive index in the combined sample of patients and controls. Our data suggest that further investigation of mechanisms that regulate expression of the MIR137 and MIR2682 genes might help to understand the molecular basis of cognitive deficits in schizophrenia.
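As an illustration of what a per-CpG methylation level means in bisulfite sequencing, the sketch below computes the fraction of reads that retain a C at each CpG position (bisulfite treatment converts unmethylated cytosines so that they read as T, while methylated cytosines still read as C). The input layout (one dict per read, mapping a CpG position to the observed base) and the function name `methylation_levels` are assumptions made for this example and are not taken from the study.

```python
from collections import Counter

def methylation_levels(reads):
    """Return {position: fraction of informative reads showing 'C'}.

    `reads` is a hypothetical input format: an iterable of dicts, each
    mapping a CpG position to the base observed after bisulfite
    conversion ('C' = methylated, 'T' = unmethylated).
    """
    methylated = Counter()
    coverage = Counter()
    for read in reads:
        for position, base in read.items():
            if base not in ("C", "T"):
                continue  # skip uninformative calls such as 'N'
            coverage[position] += 1
            if base == "C":
                methylated[position] += 1
    return {pos: methylated[pos] / coverage[pos] for pos in coverage}

# Toy data with made-up positions
example_reads = [{100: "C", 200: "T"}, {100: "T", 200: "T"}, {100: "C"}]
levels = methylation_levels(example_reads)
```

In this toy input, position 100 is covered by three reads and position 200 by two, so the two sites are scored on different denominators.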


April 21, 2020

Quality control project of NGS HLA genotyping for the 17th International HLA and Immunogenetics Workshop.

The 17th International HLA and Immunogenetics Workshop (IHIW) organizers conducted a Pilot Study (PS) in which 13 laboratories (15 groups) participated to assess the performance of the various sequencing library preparation protocols, NGS platforms and software in use prior to the workshop. The organizers sent 50 cell lines to each of the 15 groups, scored the 15 independently generated sets of NGS HLA genotyping data, and generated “consensus” HLA genotypes for each of the 50 cell lines. Proficiency Testing (PT) was subsequently organized using four sets of 24 cell lines, selected from 48 of 50 PS cell lines, to validate the quality of NGS HLA typing data from the 34 participating IHIW laboratories. Completion of the PT program with a minimum score of 95% concordance at the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 loci satisfied the requirements to submit NGS HLA typing data for the 17th IHIW projects. Together, these PS and PT efforts constituted the 17th IHIW Quality Control project. Overall PT concordance rates for HLA-A, HLA-B, HLA-C, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRB1, HLA-DRB3, HLA-DRB4 and HLA-DRB5 were 98.1%, 97.0%, 98.1%, 99.0%, 98.6%, 98.8%, 97.6%, 96.0%, 99.1%, 90.0% and 91.7%, respectively. Across all loci, the majority of the discordance was due to allele dropout. The high cost of NGS HLA genotyping per experiment likely prevented the retyping of initially failed HLA loci. Despite the high HLA genotype concordance rates of the software, there remains room for improvement in the assembly of more accurate consensus DNA sequences by NGS HLA genotyping software. Copyright © 2019 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
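A minimal sketch of how a per-locus concordance score against consensus genotypes could be computed is given below. The abstract does not describe the workshop's scoring rules, so several things here are assumptions: concordance is counted per allele, the 95% threshold is applied to each required locus separately, and the data layout and the names `concordance` and `passes_proficiency` are hypothetical.

```python
from collections import Counter

# The five loci named in the abstract as required for the 95% threshold
REQUIRED_LOCI = ("HLA-A", "HLA-B", "HLA-C", "HLA-DRB1", "HLA-DQB1")

def concordance(lab, consensus, locus):
    """Fraction of consensus alleles at `locus` that the lab also reported.

    `lab` and `consensus` are assumed to map cell line -> locus -> tuple
    of allele names. Alleles are compared as multisets, so a dropout
    (a heterozygote reported as a homozygote) loses one of two alleles.
    """
    matched = total = 0
    for cell_line, truth in consensus.items():
        expected = Counter(truth.get(locus, ()))
        observed = Counter(lab.get(cell_line, {}).get(locus, ()))
        matched += sum((expected & observed).values())
        total += sum(expected.values())
    return matched / total if total else 0.0

def passes_proficiency(lab, consensus, threshold=0.95):
    """True if every required locus meets the threshold (assumed rule)."""
    return all(concordance(lab, consensus, locus) >= threshold
               for locus in REQUIRED_LOCI)

# Toy data: one allele dropout in cell line c1
consensus = {"c1": {"HLA-A": ("A*01:01", "A*02:01")},
             "c2": {"HLA-A": ("A*03:01", "A*03:01")}}
lab = {"c1": {"HLA-A": ("A*01:01", "A*01:01")},
       "c2": {"HLA-A": ("A*03:01", "A*03:01")}}
score = concordance(lab, consensus, "HLA-A")
```

Counting per allele rather than per genotype means a single dropout costs half a genotype, which is one plausible convention among several.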


April 21, 2020

Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases.

Long-read sequencing technology is now capable of reading single-molecule DNA with an average read length of more than 10 kb, fully enabling the coverage of large structural variations (SVs). This advantage may pave the way for the detection of previously unseen SVs as well as repeat expansions. Previously, pathogenic SVs were analyzed selectively, only in known genes, based on prior knowledge of the target DNA sequence. The unbiased application of long-read whole-genome sequencing (WGS) for the detection of pathogenic SVs has just begun. Here, we apply PacBio SMRT sequencing in a Japanese family with benign adult familial myoclonus epilepsy (BAFME). Our SV selection of low-coverage WGS data (7×) narrowed down the candidates to only six SVs in a 7.16-Mb region of the BAFME1 locus and correctly determined an approximately 4.6-kb SAMD12 intronic repeat insertion, which causes BAFME1. These results indicate that long-read WGS is potentially useful for evaluating all of the known SVs in a genome and identifying new disease-causing SVs in combination with other genetic methods to resolve the genetic causes of currently unexplained diseases.


April 21, 2020

Using Cre-recombinase-driven Polylox barcoding for in vivo fate mapping in mice.

Fate mapping is a powerful genetic tool for linking stem or progenitor cells with their progeny, and hence for defining cell lineages in vivo. The resolution of fate mapping depends on the numbers of distinct markers that are introduced in the beginning into stem or progenitor cells; ideally, numbers should be sufficiently large to allow the tracing of output from individual cells. Highly diverse genetic barcodes can serve this purpose. We recently developed an endogenous genetic barcoding system, termed Polylox. In Polylox, random DNA recombination can be induced by transient activity of Cre recombinase in a 2.1-kb-long artificial recombination substrate that has been introduced into a defined locus in mice (Rosa26Polylox reporter mice). Here, we provide a step-by-step protocol for the use of Polylox, including barcode induction and estimation of induction efficiency, barcode retrieval with single-molecule real-time (SMRT) DNA sequencing followed by computational barcode identification, and the calculation of barcode-generation probabilities, which is key for estimations of single-cell labeling for a given number of stem cells. Thus, Polylox barcoding enables high-resolution fate mapping in essentially all tissues in mice for which inducible Cre driver lines are available. Alternative methods include ex vivo cell barcoding, inducible transposon insertion and CRISPR-Cas9-based barcoding; Polylox currently allows combining non-invasive and cell-type-specific labeling with high label diversity. The execution time of this protocol is ~2-3 weeks for experimental data generation and typically <2 d for computational Polylox decoding and downstream analysis.


April 21, 2020

The Genome of C57BL/6J “Eve”, the Mother of the Laboratory Mouse Genome Reference Strain.

Isogenic laboratory mouse strains enhance reproducibility because individual animals are genetically identical. For the most widely used isogenic strain, C57BL/6, there exists a wealth of genetic, phenotypic, and genomic data, including a high-quality reference genome (GRCm38.p6). Now 20 years after the first release of the mouse reference genome, C57BL/6J mice are at least 26 inbreeding generations removed from GRCm38 and the strain is now maintained with periodic reintroduction of cryorecovered mice derived from a single breeder pair, aptly named Adam and Eve. To provide an update to the mouse reference genome that more accurately represents the genome of today’s C57BL/6J mice, we took advantage of long read, short read, and optical mapping technologies to generate a de novo assembly of the C57BL/6J Eve genome (B6Eve). Using these data, we have addressed recurring variants observed in previous mouse genomic studies. We have also identified structural variations, closed gaps in the mouse reference assembly, and revealed previously unannotated coding sequences. This B6Eve assembly explains discrepant observations that have been associated with GRCm38-based analyses, and will inform a reference genome that is more representative of the C57BL/6J mice that are in use today. Copyright © 2019 Sarsani et al.


April 21, 2020

Characterization of Mauritian cynomolgus macaque FcγR alleles using long-read sequencing.

The FcγRs are immune cell surface proteins that bind IgG and facilitate cytokine production, phagocytosis, and Ab-dependent, cell-mediated cytotoxicity. FcγRs play a critical role in immunity; variation in these genes is implicated in autoimmunity and other diseases. Cynomolgus macaques are an excellent animal model for many human diseases, and Mauritian cynomolgus macaques (MCMs) are particularly useful because of their restricted genetic diversity. Previous studies of MCM immune gene diversity have focused on the MHC and killer cell Ig-like receptor. In this study, we characterize FcγR diversity in 48 MCMs using PacBio long-read sequencing to identify novel alleles of each of the four expressed MCM FcγR genes. We also developed a high-throughput FcγR genotyping assay, which we used to determine allele frequencies and identify FcγR haplotypes in more than 500 additional MCMs. We found three alleles for FcγR1A, seven each for FcγR2A and FcγR2B, and four for FcγR3A; these segregate into eight haplotypes. We also assessed whether different FcγR alleles confer different Ab-binding affinities by surface plasmon resonance and found minimal difference in binding affinities across alleles for a panel of wild type and Fc-engineered human IgG. This work suggests that although MCMs may not fully represent the diversity of FcγR responses in humans, they may offer highly reproducible results for mAb therapy and toxicity studies. Copyright © 2018 by The American Association of Immunologists, Inc.


April 21, 2020

Genome analyses for the Tohoku Medical Megabank Project towards establishment of personalized healthcare.

Personalized healthcare (PHC) based on an individual’s genetic make-up is one of the most advanced, yet feasible, forms of medical care. The Tohoku Medical Megabank (TMM) Project aims to combine population genomics, medical genetics and prospective cohort studies to develop a critical infrastructure for the establishment of PHC. To date, a TMM CommCohort (adult general population) and a TMM BirThree Cohort (birth+three-generation families) have conducted recruitments and baseline surveys. Genome analyses as part of the TMM Project will aid in the development of a high-fidelity whole-genome Japanese reference panel, in designing custom single-nucleotide polymorphism (SNP) arrays specific to Japanese, and in estimation of the biological significance of genetic variations through linked investigations of the cohorts. Whole-genome sequencing from >3,500 unrelated Japanese and establishment of a Japanese reference genome sequence from long-read data have been done. We next aim to obtain genotype data for all TMM cohort participants (>150,000) using our custom SNP arrays. These data will help identify disease-associated genomic signatures in the Japanese population, while genomic data from TMM BirThree Cohort participants will be used to improve the reference genome panel. Follow-up of the cohort participants will allow us to test the genetic markers and, consequently, contribute to the realization of PHC.


April 21, 2020

TSD: A Computational Tool To Study the Complex Structural Variants Using PacBio Targeted Sequencing Data.

PacBio sequencing is a powerful approach for studying long DNA or RNA sequences. It is especially useful in exploring the complex structural variants generated by random integration or multiple rearrangement of endogenous or exogenous sequences. Here, we present a tool, TSD, for complex structural variant discovery using PacBio targeted sequencing data. It allows researchers to identify and visualize the genomic structures of targeted sequences by unlimited splitting, alignment and assembly of long PacBio reads. Application to the sequencing data derived from an HBV integrated human cell line (PLC/PRF/5) indicated that TSD could recover the full profile of HBV integration events, especially for the regions with complex human-HBV genome integrations and multiple HBV rearrangements. Compared to other long read analysis tools, TSD showed better performance in detecting complex genomic structural variants. TSD is publicly available at: https://github.com/menggf/tsd. Copyright © 2019 Meng et al.


April 21, 2020

NCF1 (p47phox)-deficient chronic granulomatous disease: comprehensive genetic and flow cytometric analysis.

Mutations in NCF1 (p47phox) cause autosomal recessive chronic granulomatous disease (CGD) with abnormal dihydrorhodamine (DHR) assay and absent p47phox protein. Genetic identification of NCF1 mutations is complicated by adjacent highly conserved (>98%) pseudogenes (NCF1B and NCF1C). NCF1 has GTGT at the start of exon 2, whereas the pseudogenes each delete 1 GT (ΔGT). In p47phox CGD, the most common mutation is ΔGT in NCF1 (c.75_76delGT; p.Tyr26fsX26). Sequence homology between NCF1 and its pseudogenes precludes reliable use of standard Sanger sequencing for NCF1 mutations and for confirming carrier status. We first established by flow cytometry that neutrophils from p47phox CGD patients had negligible p47phox expression, whereas those from p47phox CGD carriers had ~60% of normal p47phox expression, independent of the specific mutation in NCF1. We developed a droplet digital polymerase chain reaction (ddPCR) with 2 distinct probes, recognizing either the wild-type GTGT sequence or the ΔGT sequence. A second ddPCR established copy number by comparison with the single-copy telomerase reverse transcriptase gene, TERT. We showed that 84% of p47phox CGD patients were homozygous for ΔGT NCF1. The ddPCR assay also enabled determination of carrier status of relatives. Furthermore, only 79.2% of normal volunteers had 2 copies of GTGT per 6 total (NCF1/NCF1B/NCF1C) copies, designated 2/6; 14.7% had 3/6, and 1.6% had 4/6 GTGT copies. In summary, flow cytometry for p47phox expression quickly identifies patients and carriers of p47phox CGD, and genomic ddPCR identifies patients and carriers of ΔGT NCF1, the most common mutation in p47phox CGD.


April 21, 2020

Construction of full-length Japanese reference panel of class I HLA genes with single-molecule, real-time sequencing.

Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, importance in organ and blood stem cell transplantation, and associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, Primer-Separation Assembly and Refinement Pipeline (PSARP), in which the SMRT sequencing and additional short-read data were used. The panel consisted of 139 alleles, which were all extended from known IPD-IMGT/HLA sequences, contained 40 with novel variants, and captured more than 96.5% of allelic diversity in 1KJPN. These newly available sequences would be important resources for research and clinical applications including high-resolution HLA typing, genetic association studies, and analyses of cis-regulatory elements.


April 21, 2020

Megabase Length Hypermutation Accompanies Human Structural Variation at 17p11.2.

DNA rearrangements resulting in human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. Evidence for an increased rate of clustered single-nucleotide variant (SNV) mutation in cis with non-recurrent rearrangements was found. Indel and SNV formation are associated with both copy-number gains and losses of 17p11.2, occur up to ~1 Mb away from the breakpoint junctions, and favor C > G transversion substitutions; results suggest that single-stranded DNA is formed during the genesis of the SV and provide compelling support for a microhomology-mediated break-induced replication (MMBIR) mechanism for SV formation. Our data show an additional mutational burden of MMBIR consisting of hypermutation confined to the locus and manifesting as SNVs and indels predominantly within genes. Copyright © 2019 Elsevier Inc. All rights reserved.


April 21, 2020

A survey and evaluations of histogram-based statistics in alignment-free sequence comparison.

Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover’s distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover’s distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. The source code of the benchmarking tool is available as Supplementary Materials. © The Author 2017. Published by Oxford University Press.
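To make the idea of a statistic computed on two k-mer histograms concrete, here is a minimal sketch. It builds overlapping k-mer counts for two sequences, compares them with the Manhattan distance (used here only as one simple histogram statistic), and forms a multiplicative combination with the sequence length difference. The function names and the specific pairing in `paired_statistic` are illustrative assumptions, not the benchmarking tool from the paper.

```python
from collections import Counter

def kmer_histogram(seq, k):
    """Count every overlapping k-mer in `seq` (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def manhattan(h1, h2):
    """Sum of absolute count differences over k-mers in either histogram."""
    return sum(abs(h1[word] - h2[word]) for word in h1.keys() | h2.keys())

def paired_statistic(seq_a, seq_b, k=3):
    """Hypothetical multiplicative combination: Manhattan distance times
    (1 + length difference). Lower values suggest more similar sequences."""
    distance = manhattan(kmer_histogram(seq_a, k), kmer_histogram(seq_b, k))
    return distance * (1 + abs(len(seq_a) - len(seq_b)))

hist = kmer_histogram("ACGTAC", 2)
dist = manhattan(kmer_histogram("ACGT", 2), kmer_histogram("ACGA", 2))
```

Both functions run in time linear in the sequence lengths, which is the property that lets such statistics serve as a fast pre-filter before any quadratic alignment.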


April 21, 2020

Noncoding CGG repeat expansions in neuronal intranuclear inclusion disease, oculopharyngodistal myopathy and an overlapping disease.

Noncoding repeat expansions cause various neuromuscular diseases, including myotonic dystrophies, fragile X tremor/ataxia syndrome, some spinocerebellar ataxias, amyotrophic lateral sclerosis and benign adult familial myoclonic epilepsies. Inspired by the striking similarities in the clinical and neuroimaging findings between neuronal intranuclear inclusion disease (NIID) and fragile X tremor/ataxia syndrome caused by noncoding CGG repeat expansions in FMR1, we directly searched for repeat expansion mutations and identified noncoding CGG repeat expansions in NBPF19 (NOTCH2NLC) as the causative mutations for NIID. Further prompted by the similarities in the clinical and neuroimaging findings with NIID, we identified similar noncoding CGG repeat expansions in two other diseases: oculopharyngeal myopathy with leukoencephalopathy and oculopharyngodistal myopathy, in LOC642361/NUTM2B-AS1 and LRP12, respectively. These findings expand our knowledge of the clinical spectra of diseases caused by expansions of the same repeat motif, and further highlight how directly searching for expanded repeats can help identify mutations underlying diseases.


April 21, 2020

Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease.

Neuronal intranuclear inclusion disease (NIID) is a progressive neurodegenerative disease that is characterized by eosinophilic hyaline intranuclear inclusions in neuronal and somatic cells. The wide range of clinical manifestations in NIID makes ante-mortem diagnosis difficult [1-8], but skin biopsy enables its ante-mortem diagnosis [9-12]. The average onset age is 59.7 years among approximately 140 NIID cases consisting of mostly sporadic and several familial cases. By linkage mapping of a large NIID family with several affected members (Family 1), we identified a 58.1 Mb linked region at 1p22.1-q21.3 with a maximum logarithm of the odds score of 4.21. By long-read sequencing, we identified a GGC repeat expansion in the 5′ region of NOTCH2NLC (Notch 2 N-terminal like C) in all affected family members. Furthermore, we found similar expansions in 8 unrelated families with NIID and 40 sporadic NIID cases. We observed abnormal anti-sense transcripts in fibroblasts specifically from patients but not unaffected individuals. This work shows that repeat expansion in human-specific NOTCH2NLC, a gene that evolved by segmental duplication, causes a human disease.


April 21, 2020

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
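For readers unfamiliar with the contig N50 metric quoted above, the following sketch shows the standard definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative sum first reaches at least
    half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one contig")

# Toy assembly: total length 20, so the halfway point is 10
toy_n50 = n50([2, 3, 4, 5, 6])
```

Because N50 is weighted by length, a few very long contigs dominate it, which is why it is usually read alongside an accuracy measure such as the concordance figure in the abstract.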

