June 1, 2021  |  

Toward comprehensive genomics analysis with de novo assembly.

Whole genome sequencing can provide comprehensive information important for determining the biochemical and genetic nature of all elements inside a genome. The high-quality genome references produced from past genome projects and advances in short-read sequencing technologies have enabled quick and cheap analysis for simple variants. However even with the focus on genome-wide resequencing for SNPs, the heritability of more than 50% of human diseases remains elusive. For non-human organisms, high-contiguity references are deficient, limiting the analysis of genomic features. The long and unbiased reads from single molecule, real-time (SMRT) Sequencing and new de novo assembly approaches have demonstrated the ability to detect more complicated variants and chromosome-level phasing. Moreover, with the recent advance of bioinformatics algorithms and tools, the computation tasks for completing high-quality de novo assembly of large genomes becomes feasible with commodity hardware. Ongoing development in sequencing technologies and bioinformatics will likely lead to routine generation of high-quality reference assemblies in the future. We discuss the current state of art and the challenges in bioinformatics toward such a goal. More specifically, explicit examples of pragmatic computational requirements for assembling mammalian-size genomes and algorithms suitable for processing diploid genomes are discussed.

June 1, 2021  |  

Screening and characterization of causative structural variants for bipolar disorder in a significantly linked chromosomal region onXq24-q27 in an extended pedigree from a genetic isolate

Bipolar disorder (BD) is a phenotypically and genetically complex and debilitating neurological disorder that affects 1% of the worldwide population. There is compelling evidence from family, twin and adoption studies supporting the involvement of a genetic predisposition in BD with estimated heritability up to ~ 80%. The risk in first-degree relatives is ten times higher than in the general population. Linkage and association studies have implicated multiple putative chromosomal loci for BP susceptibility, however no disease genes have been identified to date.

June 1, 2021  |  

Structural variant in the RNA Binding Motif Protein, X-Linked 2 (RBMX2) gene found to be linked to bipolar disorder

Bipolar disorder (BD) is a phenotypically and genetically complex neurological disorder that affects 1% of the worldwide population. There is compelling evidence from family, twin and adoption studies supporting the involvement of a genetic predisposition with estimated heritability up to ~ 80%. The risk in first-degree relatives is ten times higher than in the general population. Linkage and association studies have implicated multiple putative chromosomal loci for BD susceptibility, however no disease genes have yet to be identified. Here, we have fully characterized a ~12 Mb significantly linked (lod score=3.54) genomic region on chromosome Xq24-q27 in an extended family from a genetic isolate that was using long-read single molecule, real-time (SMRT) sequencing. The family segregates BD in at least 4 generations with 16 individuals out of 61 affected. Thus, this family portrays a highly elevated reoccurrence risk compared to the general population. It is expected that the genetic complexity would be reduced in isolated populations, even in genetically complex disorders such as BD, as in the case of this extended family. We selected 16 key individuals from the X-chromosomally linked family to be sequenced. These selected individuals either carried the disease haplotype, were non-carriers of the disease haplotype, or served as married-in controls. We designed a Nimblegen capture array enriching for 5-9 kb fragments spanning the entire 12 Mb region that were then sequenced using long-read SMRT sequencing to screen for causative structural variants (SVs) explaining the increased risk for BD in this extended family. Altogether, 192 SVs were detected in the critically linked region however most of these represented common variants that could be seen across many of the family members regardless of the disease status. One SV stood out that showed perfect segregation among all affected individuals that were carriers of the disease haplotype. This was a 330bp Alu deletion in intron 4 of the RNA Binding Motif Protein, X-Linked 2 (RBMX2) gene that has previously been shown to play a central role in brain development and function. Moreover, Alu elements in general have also previously been associated with at least 37 neurological and neurodegenerative disorders. In order to validate the finding and the functionality of the identified SV further studies like isoform characterization are warranted.

February 5, 2021  |  

AGBT PacBio Workshop: Full workshop recording

PacBio customers and thought leaders discuss the role SMRT sequencing is playing in comprehensive genomics: past, present, and future. Featuring J. Craig Venter, Gene Myers, Deanna Church, Jeong-Sun Seo and…

April 21, 2020  |  

The Chinese chestnut genome: a reference for species restoration

Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.

April 21, 2020  |  

Uncovering Missing Heritability in Rare Diseases.

The problem of ‘missing heritability’ affects both common and rare diseases hindering: discovery, diagnosis, and patient care. The ‘missing heritability’ concept has been mainly associated with common and complex diseases where promising modern technological advances, like genome-wide association studies (GWAS), were unable to uncover the complete genetic mechanism of the disease/trait. Although rare diseases (RDs) have low prevalence individually, collectively they are common. Furthermore, multi-level genetic and phenotypic complexity when combined with the individual rarity of these conditions poses an important challenge in the quest to identify causative genetic changes in RD patients. In recent years, high throughput sequencing has accelerated discovery and diagnosis in RDs. However, despite the several-fold increase (from ~10% using traditional to ~40% using genome-wide genetic testing) in finding genetic causes of these diseases in RD patients, as is the case in common diseases-the majority of RDs are also facing the ‘missing heritability’ problem. This review outlines the key role of high throughput sequencing in uncovering genetics behind RDs, with a particular focus on genome sequencing. We review current advances and challenges of sequencing technologies, bioinformatics approaches, and resources.

April 21, 2020  |  

The comparative genomics and complex population history of Papio baboons.

Recent studies suggest that closely related species can accumulate substantial genetic and phenotypic differences despite ongoing gene flow, thus challenging traditional ideas regarding the genetics of speciation. Baboons (genus Papio) are Old World monkeys consisting of six readily distinguishable species. Baboon species hybridize in the wild, and prior data imply a complex history of differentiation and introgression. We produced a reference genome assembly for the olive baboon (Papio anubis) and whole-genome sequence data for all six extant species. We document multiple episodes of admixture and introgression during the radiation of Papio baboons, thus demonstrating their value as a model of complex evolutionary divergence, hybridization, and reticulation. These results help inform our understanding of similar cases, including modern humans, Neanderthals, Denisovans, and other ancient hominins.

April 21, 2020  |  

Profiling the genome-wide landscape of tandem repeat expansions.

Tandem repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington’s Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets and are not profiled by existing genome-wide tools. We present GangSTR, a novel algorithm for genome-wide genotyping of both short and expanded TRs. GangSTR extracts information from paired-end reads into a unified model to estimate maximum likelihood TR lengths. We validate GangSTR on real and simulated data and show that GangSTR outperforms alternative methods in both accuracy and speed. We apply GangSTR to a deeply sequenced trio to profile the landscape of TR expansions in a healthy family and validate novel expansions using orthogonal technologies. Our analysis reveals that healthy individuals harbor dozens of long TR alleles not captured by current genome-wide methods. GangSTR will likely enable discovery of novel disease-associated variants not currently accessible from NGS. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

April 21, 2020  |  

Genetic Variation, Comparative Genomics, and the Diagnosis of Disease.

The discovery of mutations associated with human genetic dis- ease is an exercise in comparative genomics (see Glossary). Although there are many different strategies and approaches, the central premise is that affected persons harbor a significant excess of pathogenic DNA variants as com- pared with a group of unaffected persons (controls) that is either clinically defined1 or established by surveying large swaths of the general population.2 The more exclu- sive the variant is to the disease, the greater its penetrance, the larger its effect size, and the more relevant it becomes to both disease diagnosis and future therapeutic investigation. The most popular approach used by researchers in human genetics is the case–control design, but there are others that can be used to track variants and disease in a family context or that consider the probability of different classes of mutations based on evolutionary patterns of divergence or de novo mutational change.3,4 Although the approaches may be straightforward, the discovery of patho- genic variation and its mechanism of action often is less trivial, and decades of research can be required in order to identify the variants underlying both mendelian and complex genetic traits.

April 21, 2020  |  

An Annotated Genome for Haliotis rufescens (Red Abalone) and Resequenced Green, Pink, Pinto, Black, and White Abalone Species.

Abalone are one of the few marine taxa where aquaculture production dominates the global market as a result of increasing demand and declining natural stocks from overexploitation and disease. To better understand abalone biology, aid in conservation efforts for endangered abalone species, and gain insight into sustainable aquaculture, we created a draft genome of the red abalone (Haliotis rufescens). The approach to this genome draft included initial assembly using raw Illumina and PacBio sequencing data with MaSuRCA, before scaffolding using sequencing data generated from Chicago library preparations with HiRise2. This assembly approach resulted in 8,371 scaffolds and total length of 1.498?Gb; the N50 was 1.895?Mb, and the longest scaffold was 13.2?Mb. Gene models were predicted, using MAKER2, from RNA-Seq data and all related expressed sequence tags and proteins from NCBI; this resulted in 57,785 genes with an average length of 8,255?bp. In addition, single nucleotide polymorphisms were called on Illumina short-sequencing reads from five other eastern Pacific abalone species: the green (H. fulgens), pink (H. corrugata), pinto (H. kamtschatkana), black (H. cracherodii), and white (H. sorenseni) abalone. Phylogenetic relationships largely follow patterns detected by previous studies based on 1,784,991 high-quality single nucleotide polymorphisms. Among the six abalone species examined, the endangered white abalone appears to harbor the lowest levels of heterozygosity. This draft genome assembly and the sequencing data provide a foundation for genome-enabled aquaculture improvement for red abalone, and for genome-guided conservation efforts for the other five species and, in particular, for the endangered white and black abalone.

April 21, 2020  |  

Ancestral Admixture Is the Main Determinant of Global Biodiversity in Fission Yeast.

Mutation and recombination are key evolutionary processes governing phenotypic variation and reproductive isolation. We here demonstrate that biodiversity within all globally known strains of Schizosaccharomyces pombe arose through admixture between two divergent ancestral lineages. Initial hybridization was inferred to have occurred ~20-60 sexual outcrossing generations ago consistent with recent, human-induced migration at the onset of intensified transcontinental trade. Species-wide heritable phenotypic variation was explained near-exclusively by strain-specific arrangements of alternating ancestry components with evidence for transgressive segregation. Reproductive compatibility between strains was likewise predicted by the degree of shared ancestry. To assess the genetic determinants of ancestry block distribution across the genome, we characterized the type, frequency, and position of structural genomic variation using nanopore and single-molecule real-time sequencing. Despite being associated with double-strand break initiation points, over 800 segregating structural variants exerted overall little influence on the introgression landscape or on reproductive compatibility between strains. In contrast, we found strong ancestry disequilibrium consistent with negative epistatic selection shaping genomic ancestry combinations during the course of hybridization. This study provides a detailed, experimentally tractable example that genomes of natural populations are mosaics reflecting different evolutionary histories. Exploiting genome-wide heterogeneity in the history of ancestral recombination and lineage-specific mutations sheds new light on the population history of S. pombe and highlights the importance of hybridization as a creative force in generating biodiversity. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

April 21, 2020  |  

Multi-platform discovery of haplotype-resolved structural variation in human genomes.

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50?bp) and 27,622 SVs (=50?bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.

April 21, 2020  |  

Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight.

The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads that we call dark by depth, and others that have ambiguous alignment, called camouflaged. We assess how well long-read or linked-read technologies resolve these regions.Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are =?5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer’s Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer’s disease gene, found in disease cases but not in controls.While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer’s disease due to insufficient sample-size, we believe it merits investigating in a larger cohort. There remain thousands of potentially important genomic regions overlooked by short-read sequencing that are largely resolved by long-read technologies.

Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.