June 1, 2021

Enrichment of unamplified DNA and long-read SMRT Sequencing to unlock repeat expansion disorders

Nucleotide repeat expansions are a major cause of neurological and neuromuscular disease in humans; however, the nature of these genomic regions makes characterizing them extremely challenging. Accurate DNA sequencing of repeat expansions using short-read sequencing technologies is difficult, as short-read technologies often cannot read through regions of low sequence complexity. Additionally, these short reads do not span the entire region of interest, and therefore sequence assembly is required. Lastly, most target enrichment methods are reliant upon amplification, which adds the caveat of PCR bias. We have developed a novel, amplification-free enrichment technique that employs the CRISPR/Cas9 system for specific targeting of individual human genes. This method, in conjunction with PacBio’s long reads and uniform coverage, enables sequencing of complex genomic regions that cannot be investigated with other technologies. Using human genomic DNA samples and this strategy, we have successfully targeted the loci of Huntington’s Disease (HTT; CAG repeat), Fragile X (FMR1; CGG repeat), ALS (C9orf72; GGGGCC repeat), and Spinocerebellar ataxia type 10 (SCA10; variable ATTCT repeat) for examination. With these data, we demonstrate the ability to isolate hundreds of individual on-target molecules in a single SMRT Cell and accurately sequence through long repeat stretches, regardless of extreme GC content. The method is compatible with multiplexing of multiple targets and multiple samples in a single reaction. This technique also captures native DNA molecules for sequencing, allowing for the possibility of direct detection and characterization of epigenetic signatures.


June 1, 2021

Targeted SMRT Sequencing of difficult regions of the genome using a Cas9, non-amplification based method

Targeted sequencing has proven to be an economical means of obtaining sequence information for one or more defined regions of a larger genome. However, most target enrichment methods are reliant upon some form of amplification. Amplification removes the epigenetic marks present in native DNA, and some genomic regions, such as those with extreme GC content and repetitive sequences, are recalcitrant to faithful amplification. Yet, a large number of genetic disorders are caused by expansions of repeat sequences. Furthermore, for some disorders, methylation status has been shown to be a key factor in the mechanism of disease. We have developed a novel, amplification-free enrichment technique that employs the CRISPR/Cas9 system for specific targeting of individual human genes. This method, in conjunction with SMRT Sequencing’s long reads, high consensus accuracy, and uniform coverage, allows the sequencing of complex genomic regions that cannot be investigated with other technologies.


June 1, 2021

Screening for causative structural variants in neurological disorders using long-read sequencing

Over the past decades, neurological disorders have been extensively studied, producing a large number of candidate genomic regions and candidate genes. The SNPs identified in these studies rarely represent the true disease-related functional variants. However, more recently a shift in focus from SNPs to larger structural variants has yielded breakthroughs in our understanding of neurological disorders. Here we have developed candidate gene screening methods that combine enrichment of long DNA fragments with long-read sequencing that is optimized for structural variation discovery. We have also developed a novel, amplification-free enrichment technique using the CRISPR/Cas9 system to target genomic regions. We sequenced gDNA and full-length cDNA extracted from the temporal lobe of two Alzheimer’s patients for 35 GWAS candidate genes. The multi-kilobase long reads allowed for phasing across the genes and detection of a broad range of genomic variants, from SNPs to multi-kilobase insertions, deletions, and inversions. In the full-length cDNA data we detected differential allelic isoform complexity, novel exons, and transcript isoforms. Combining the gDNA data with full-length isoform characterization allows us to build a more comprehensive view of the underlying biological disease mechanisms in Alzheimer’s disease. Using the novel PCR-free CRISPR-Cas9 enrichment method, we screened several genes, including C9ORF72, whose hexanucleotide repeat expansion is associated with 40% of familial ALS cases. This method excludes any PCR bias or errors from an otherwise hard-to-amplify region and preserves base modifications at the single-molecule level, which makes it possible to capture mosaicism present in the sample.


June 1, 2021

Targeted enrichment without amplification and SMRT Sequencing of repeat-expansion disease causative genomic regions

Targeted sequencing has proven to be an economical means of obtaining sequence information for one or more defined regions of a larger genome. However, most target enrichment methods are reliant upon some form of amplification. Amplification removes the epigenetic marks present in native DNA, and some genomic regions, such as those with extreme GC content and repetitive sequences, are recalcitrant to faithful amplification. Yet, a large number of genetic disorders are caused by expansions of repeat sequences. Furthermore, for some disorders, methylation status has been shown to be a key factor in the mechanism of disease. We have developed a novel, amplification-free enrichment technique that employs the CRISPR/Cas9 system for specific targeting of individual human genes. This method, in conjunction with SMRT Sequencing’s long reads, high consensus accuracy, and uniform coverage, allows the sequencing of complex genomic regions that cannot be investigated with other technologies. Using human genomic DNA samples and this strategy, we have successfully targeted the loci of a number of repeat expansion disorders (HTT, FMR1, ATXN10, C9orf72). With these data, we demonstrate the ability to isolate hundreds of individual on-target molecules and accurately sequence through long repeat stretches, regardless of extreme GC content, on a single PacBio RS II SMRT Cell or Sequel SMRT Cell 1M. The method is compatible with multiplexing of multiple targets and multiple samples in a single reaction. Furthermore, this technique preserves native DNA molecules for sequencing, allowing for the possibility of direct detection and characterization of epigenetic signatures. We demonstrate detection of 5-mC in human promoter sequences and CpG islands.


June 1, 2021

Amplification-free protocol for targeted enrichment of repeat expansion genomic regions and SMRT Sequencing

Many genetic disorders are associated with repeat sequence expansions. Obtaining accurate DNA sequence information from these regions will help researchers further establish the relationship between these genetic disorders and underlying disease mechanisms. Moreover, repeat interruptions have also been shown to act as phenotypic modifiers in some disorders. Targeted sequencing is an economical way to obtain sequence information from one or more defined regions in a genome. However, most targeted enrichment and sequencing methods require some form of DNA amplification. Amplifying large regions with extreme GC content, as seen in repeat expansion disorders, is challenging and prone to introducing sequence artifacts. DNA amplification also removes any epigenetic signatures present in native DNA. The amplification-free protocol preserves native DNA molecules, allowing for the possibility of direct characterization of epigenetic signatures.


April 21, 2020

Long-read sequencing for rare human genetic diseases.

During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.


April 21, 2020

Profiling the genome-wide landscape of tandem repeat expansions.

Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington’s Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and expanded TRs. GangSTR extracts information from paired-end reads into a unified model to estimate maximum likelihood TR lengths. We validate GangSTR on real and simulated data and show that GangSTR outperforms alternative methods in both accuracy and speed. We apply GangSTR to a deeply sequenced trio to profile the landscape of TR expansions in a healthy family and validate novel expansions using orthogonal technologies. Our analysis reveals that healthy individuals harbor dozens of long TR alleles not captured by current genome-wide methods. GangSTR will likely enable discovery of novel disease-associated variants not currently accessible from NGS. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020

Single-Molecule Sequencing: Towards Clinical Applications.

In the past several years, single-molecule sequencing platforms, such as those by Pacific Biosciences and Oxford Nanopore Technologies, have become available to researchers and are currently being tested for clinical applications. They offer exceptionally long reads that permit direct sequencing through regions of the genome inaccessible or difficult to analyze by short-read platforms. This includes disease-causing long repetitive elements, extreme GC content regions, and complex gene loci. Similarly, these platforms enable structural variation characterization at previously unparalleled resolution and direct detection of epigenetic marks in native DNA. Here, we review how these technologies are opening up new clinical avenues that are being applied to pathogenic microorganisms and viruses, constitutional disorders, pharmacogenomics, cancer, and more. Copyright © 2018 Elsevier Ltd. All rights reserved.


April 21, 2020

Long-read sequencing identified intronic repeat expansions in SAMD12 from Chinese pedigrees affected with familial cortical myoclonic tremor with epilepsy.

The locus for familial cortical myoclonic tremor with epilepsy (FCMTE) has long been mapped to 8q24 in linkage studies, but the causative mutations remain unclear. Recently, expansions of intronic TTTCA and TTTTA repeat motifs within SAMD12 were found to be involved in the pathogenesis of FCMTE in Japanese pedigrees. We aim to identify the causative mutations of FCMTE in Chinese pedigrees. We performed genetic linkage analysis by microsatellite markers in a five-generation Chinese pedigree with 55 members. We also used array-comparative genomic hybridisation (CGH) and next-generation sequencing (NGS) technologies (whole-exome sequencing, capture region deep sequencing and whole-genome sequencing) to identify the causative mutations in the disease locus. Recently, we used low-coverage (~10×) long-read genome sequencing (LRS) on the PacBio Sequel and Oxford Nanopore platforms to identify the causative mutations, and used repeat-primed PCR for validation of the repeat expansions. Linkage analysis mapped the disease locus to 8q23.3-24.23. Array-CGH and NGS failed to identify causative mutations in this locus. LRS identified the intronic TTTCA and TTTTA repeat expansions in SAMD12 as the causative mutations, thus corroborating the recently published results in Japanese pedigrees. We identified the pentanucleotide repeat expansion in SAMD12 as the causative mutation in Chinese FCMTE pedigrees. Our study also suggested that LRS is an effective tool for molecular diagnosis of genetic disorders, especially for neurological diseases that cannot be positively diagnosed by conventional clinical microarray and NGS technologies. © Author(s) (or their employer(s)) 2019. No commercial re-use. See rights and permissions. Published by BMJ.


April 21, 2020

Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases.

Long-read sequencing technology is now capable of reading single-molecule DNA with an average read length of more than 10 kb, fully enabling the coverage of large structural variations (SVs). This advantage may pave the way for the detection of unprecedented SVs as well as repeat expansions. Previously, pathogenic SVs were analyzed selectively, only in known genes and based on prior knowledge of the target DNA sequence. The unbiased application of long-read whole-genome sequencing (WGS) for the detection of pathogenic SVs has just begun. Here, we apply PacBio SMRT sequencing in a Japanese family with benign adult familial myoclonus epilepsy (BAFME). Our SV selection of low-coverage WGS data (7×) narrowed down the candidates to only six SVs in a 7.16-Mb region of the BAFME1 locus and correctly determined an approximately 4.6-kb SAMD12 intronic repeat insertion, which is causal for BAFME1. These results indicate that long-read WGS is potentially useful for evaluating all of the known SVs in a genome and identifying new disease-causing SVs in combination with other genetic methods to resolve the genetic causes of currently unexplained diseases.


April 21, 2020

Noncoding CGG repeat expansions in neuronal intranuclear inclusion disease, oculopharyngodistal myopathy and an overlapping disease.

Noncoding repeat expansions cause various neuromuscular diseases, including myotonic dystrophies, fragile X tremor/ataxia syndrome, some spinocerebellar ataxias, amyotrophic lateral sclerosis and benign adult familial myoclonic epilepsies. Inspired by the striking similarities in the clinical and neuroimaging findings between neuronal intranuclear inclusion disease (NIID) and fragile X tremor/ataxia syndrome caused by noncoding CGG repeat expansions in FMR1, we directly searched for repeat expansion mutations and identified noncoding CGG repeat expansions in NBPF19 (NOTCH2NLC) as the causative mutations for NIID. Further prompted by the similarities in the clinical and neuroimaging findings with NIID, we identified similar noncoding CGG repeat expansions in two other diseases: oculopharyngeal myopathy with leukoencephalopathy and oculopharyngodistal myopathy, in LOC642361/NUTM2B-AS1 and LRP12, respectively. These findings expand our knowledge of the clinical spectra of diseases caused by expansions of the same repeat motif, and further highlight how directly searching for expanded repeats can help identify mutations underlying diseases.

