June 1, 2021  |  

Multiplex target enrichment using barcoded multi-kilobase fragments and probe-based capture technologies

Target enrichment capture methods allow scientists to rapidly interrogate important genomic regions of interest for variant discovery, including SNPs, gene isoforms, and structural variation. Custom targeted sequencing panels are important for characterizing heterogeneous, complex diseases and uncovering the genetic basis of inherited traits with more uniform coverage when compared to PCR-based strategies. With the increasing availability of high-quality reference genomes, customized gene panels are readily designed with high specificity to capture genomic regions of interest, thus enabling scientists to expand their research scope from a single individual to larger cohort studies or population-wide investigations. Coupled with PacBio® long-read sequencing, these technologies can capture 5 kb fragments of genomic DNA (gDNA), which are useful for interrogating intronic, exonic, and regulatory regions, characterizing complex structural variations, distinguishing between gene duplications and pseudogenes, and interpreting variant haplotyes. In addition, SMRT® Sequencing offers the lowest GC-bias and can sequence through repetitive regions. We demonstrate the additional insights possible by using in-depth long read capture sequencing for key immunology, drug metabolizing, and disease causing genes such as HLA, filaggrin, and cancer associated genes.

April 21, 2020  |  

The landscape of SNCA transcripts across synucleinopathies: New insights from long reads sequencing analysis

Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinsontextquoterights Disease (PD) and Dementia with Lewy bodies (DLB). Previous studies have shown that the alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain for PD and DLB patients. Similarly, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene was also shown to harbor structural variants that affect transcriptional levels. Here we apply the first study of using long read sequencing with targeted capture of both the gDNA and cDNA of the SNCA gene in brain tissues of PD, DLB, and control samples using the PacBio Sequel system. The targeted full-length cDNA (Iso-Seq) data confirmed complex usage of known alternative start sites and variable 3textquoteright UTR lengths, as well as novel 5textquoteright starts and 3textquoteright ends not previously described. The targeted gDNA data allowed phasing of up to 81% of the ~114kb SNCA region, with the longest phased block excedding 54 kb. We demonstrate that long gDNA and cDNA reads have the potential to reveal long-range information not previously accessible using traditional sequencing methods. This approach has a potential impact in studying disease risk genes such as SNCA, providing new insights into the genetic etiologies, including perturbations to the landscape the gene transcripts, of human complex diseases such as synucleinopathies.

April 21, 2020  |  

The replication-competent HIV-1 latent reservoir is primarily established near the time of therapy initiation.

Although antiretroviral therapy (ART) is highly effective at suppressing HIV-1 replication, the virus persists as a latent reservoir in resting CD4+ T cells during therapy. This reservoir forms even when ART is initiated early after infection, but the dynamics of its formation are largely unknown. The viral reservoirs of individuals who initiate ART during chronic infection are generally larger and genetically more diverse than those of individuals who initiate therapy during acute infection, consistent with the hypothesis that the reservoir is formed continuously throughout untreated infection. To determine when viruses enter the latent reservoir, we compared sequences of replication-competent viruses from resting peripheral CD4+ T cells from nine HIV-positive women on therapy to viral sequences circulating in blood collected longitudinally before therapy. We found that, on average, 71% of the unique viruses induced from the post-therapy latent reservoir were most genetically similar to viruses replicating just before ART initiation. This proportion is far greater than would be expected if the reservoir formed continuously and was always long lived. We conclude that ART alters the host environment in a way that allows the formation or stabilization of most of the long-lived latent HIV-1 reservoir, which points to new strategies targeted at limiting the formation of the reservoir around the time of therapy initiation.Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

April 21, 2020  |  

Antarctic blackfin icefish genome reveals adaptations to extreme environments.

Icefishes (suborder Notothenioidei; family Channichthyidae) are the only vertebrates that lack functional haemoglobin genes and red blood cells. Here, we report a high-quality genome assembly and linkage map for the Antarctic blackfin icefish Chaenocephalus aceratus, highlighting evolved genomic features for its unique physiology. Phylogenomic analysis revealed that Antarctic fish of the teleost suborder Notothenioidei, including icefishes, diverged from the stickleback lineage about 77 million years ago and subsequently evolved cold-adapted phenotypes as the Southern Ocean cooled to sub-zero temperatures. Our results show that genes involved in protection from ice damage, including genes encoding antifreeze glycoprotein and zona pellucida proteins, are highly expanded in the icefish genome. Furthermore, genes that encode enzymes that help to control cellular redox state, including members of the sod3 and nqo1 gene families, are expanded, probably as evolutionary adaptations to the relatively high concentration of oxygen dissolved in cold Antarctic waters. In contrast, some crucial regulators of circadian homeostasis (cry and per genes) are absent from the icefish genome, suggesting compromised control of biological rhythms in the polar light environment. The availability of the icefish genome sequence will accelerate our understanding of adaptation to extreme Antarctic environments.

April 21, 2020  |  

Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease.

Neuronal intranuclear inclusion disease (NIID) is a progressive neurodegenerative disease that is characterized by eosinophilic hyaline intranuclear inclusions in neuronal and somatic cells. The wide range of clinical manifestations in NIID makes ante-mortem diagnosis difficult1-8, but skin biopsy enables its ante-mortem diagnosis9-12. The average onset age is 59.7 years among approximately 140 NIID cases consisting of mostly sporadic and several familial cases. By linkage mapping of a large NIID family with several affected members (Family 1), we identified a 58.1 Mb linked region at 1p22.1-q21.3 with a maximum logarithm of the odds score of 4.21. By long-read sequencing, we identified a GGC repeat expansion in the 5′ region of NOTCH2NLC (Notch 2 N-terminal like C) in all affected family members. Furthermore, we found similar expansions in 8 unrelated families with NIID and 40 sporadic NIID cases. We observed abnormal anti-sense transcripts in fibroblasts specifically from patients but not unaffected individuals. This work shows that repeat expansion in human-specific NOTCH2NLC, a gene that evolved by segmental duplication, causes a human disease.

April 21, 2020  |  

Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight.

The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads that we call dark by depth, and others that have ambiguous alignment, called camouflaged. We assess how well long-read or linked-read technologies resolve these regions.Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are =?5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer’s Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer’s disease gene, found in disease cases but not in controls.While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer’s disease due to insufficient sample-size, we believe it merits investigating in a larger cohort. There remain thousands of potentially important genomic regions overlooked by short-read sequencing that are largely resolved by long-read technologies.

Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.