Menu
July 19, 2019

Amplification-free, CRISPR-Cas9 targeted enrichment and SMRT Sequencing of repeat-expansion disease causative genomic regions

Targeted sequencing has proven to be an economical means of obtaining sequence information for one or more defined regions of a larger genome. However, most target enrichment methods require amplification. Some genomic regions, such as those with extreme GC content and repetitive sequences, are recalcitrant to faithful amplification. Yet, many human genetic disorders are caused by repeat expansions, including difficult to sequence tandem repeats. We have developed a novel, amplification-free enrichment technique that employs the CRISPR-Cas9 system for specific targeting multiple genomic loci. This method, in conjunction with long reads generated through Single Molecule, Real-Time (SMRT) sequencing and unbiased coverage, enables enrichment and sequencing of complex genomic regions that cannot be investigated with other technologies. Using human genomic DNA samples, we demonstrate successful targeting of causative loci for Huntingtontextquoterights disease (HTT; CAG repeat), Fragile X syndrome (FMR1; CGG repeat), amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (C9orf72; GGGGCC repeat), and spinocerebellar ataxia type 10 (SCA10) (ATXN10; variable ATTCT repeat). The method, amenable to multiplexing across multiple genomic loci, uses an amplification-free approach that facilitates the isolation of hundreds of individual on-target molecules in a single SMRT Cell and accurate sequencing through long repeat stretches, regardless of extreme GC percent or sequence complexity content. Our novel targeted sequencing method opens new doors to genomic analyses independent of PCR amplification that will facilitate the study of repeat expansion disorders.


July 19, 2019

De novo assembly of genomes from long sequence reads reveals uncharted territories of Propionibacterium freudenreichii.

Propionibacterium freudenreichii is an industrially important bacterium granted the Generally Recognized as Safe (the GRAS) status, due to its long safe use in food bioprocesses. Despite the recognized role in the food industry and in the production of vitamin B12, as well as its documented health-promoting potential, P. freudenreichii remained poorly characterised at the genomic level. At present, only three complete genome sequences are available for the species.We used the PacBio RS II sequencing platform to generate complete genomes of 20 P. freudenreichii strains and compared them in detail. Comparative analyses revealed both sequence conservation and genome organisational diversity among the strains. Assembly from long reads resulted in the discovery of additional circular elements: two putative conjugative plasmids and three active, lysogenic bacteriophages. It also permitted characterisation of the CRISPR-Cas systems. The use of the PacBio sequencing platform allowed identification of DNA modifications, which in turn allowed characterisation of the restriction-modification systems together with their recognition motifs. The observed genomic differences suggested strain variation in surface piliation and specific mucus binding, which were validated by experimental studies. The phenotypic characterisation displayed large diversity between the strains in ability to utilise a range of carbohydrates, to grow at unfavourable conditions and to form a biofilm.The complete genome sequencing allowed detailed characterisation of the industrially important species, P. freudenreichii by facilitating the discovery of previously unknown features. The results presented here lay a solid foundation for future genetic and functional genomic investigations of this actinobacterial species.


July 19, 2019

A comparative study on the characterization of hepatitis B virus quasispecies by clone-based sequencing and third-generation sequencing.

Hepatitis B virus (HBV) has a high mutation rate due to the extremely high replication rate and the proofreading deficiency during reverse transcription. The generated variants with genetic heterogeneity are described as viral quasispecies (QS). Clone-based sequencing (CBS) is thought to be the ‘gold standard’ for assessing QS complexity and diversity of HBV, but an important issue about CBS is cost-effectiveness and laborious. In this study, we investigated the utility of the third-generation sequencing (TGS) DNA sequencing to characterize genetic heterogeneity of HBV QS and assessed the possible contribution of TGS technology in HBV QS studies. Parallel experiments including 3 control samples, which consisted of HBV full gene genotype B and genotype C plasmids, and 10 patients samples were performed by using CBS and TGS to analyze HBV whole-genome QS. Characterization of QS heterogeneity was conducted by using comprehensive statistical analysis. The results showed that TGS had a high consistency with CBS when measuring the complexity and diversity of QS. In addition, to detect rare variants, there were strong advantages conferred by TGS. In summary, TGS was considered to be practicable in HBV QS studies and it might have a relevant role in the clinical management of HBV infection in the future.


July 19, 2019

Single-molecule sequencing reveals the chromosome-scale genomic architecture of the nematode model organism Pristionchus pacificus.

The nematode Pristionchus pacificus is an established model for integrative evolutionary biology and comparative studies with Caenorhabditis elegans. While an existing genome draft facilitated the identification of several genes controlling various developmental processes, its high degree of fragmentation complicated virtually all genomic analyses. Here, we present a de novo genome assembly from single-molecule, long-read sequencing data consisting of 135 P. pacificus contigs. When combined with a genetic linkage map, 99% of the assembly could be ordered and oriented into six chromosomes. This allowed us to robustly characterize chromosomal patterns of gene density, repeat content, nucleotide diversity, linkage disequilibrium, and macrosynteny in P. pacificus. Despite widespread conservation of synteny between P. pacificus and C. elegans, we identified one major translocation from an autosome to the sex chromosome in the lineage leading to C. elegans. This highlights the potential of the chromosome-scale assembly for future genomic studies of P. pacificus. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.


July 19, 2019

A high-throughput approach for identification of nontuberculous mycobacteria in drinking water reveals relationship between water age and Mycobacterium avium.

Nontuberculous mycobacteria (NTM) frequently detected in drinking water (DW) include species associated with human infections, as well as species rarely linked to disease. Methods for improved the recovery of NTM DNA and high-throughput identification of NTM are needed for risk assessment of NTM infection through DW exposure. In this study, different methods of recovering bacterial DNA from DW were compared, revealing that a phenol-chloroform DNA extraction method yielded two to four times as much total DNA and eight times as much NTM DNA as two commercial DNA extraction kits. This method, combined with high-throughput, single-molecule real-time sequencing of NTMrpoBgenes, allowed the identification of NTM to the species, subspecies, and (in some cases) strain levels. This approach was applied to DW samples collected from 15 households serviced by a chloraminated distribution system, with homes located in areas representing short (<24 h) and long (>24 h) distribution system residence times. Multivariate statistical analysis revealed that greater water age (i.e., combined distribution system residence time and home plumbing stagnation time) was associated with a greater relative abundance ofMycobacterium aviumsubsp.avium, one of the most prevalent NTM causing infections in humans. DW from homes closer to the treatment plant (with a shorter water age) contained more diverse NTM species, includingMycobacterium abscessusandMycobacterium chelonaeOverall, our approach allows NTM identification to the species and subspecies levels and can be used in future studies to assess the risk of waterborne infection by providing insight into the similarity between environmental and infection-associated NTM.IMPORTANCEAn extraction method for improved recovery of DNA from nontuberculous mycobacteria (NTM), combined with single-molecule real-time sequencing (PacBio) of NTMrpoBgenes, was used for high-throughput characterization of NTM species and in some cases strains in drinking water (DW). The extraction procedure recovered, on average, eight times as much NTM DNA and three times as much total DNA from DW as two widely used commercial DNA extraction kits. The combined DNA extraction and sequencing approach allowed high-throughput screening of DW samples to identify NTM, revealing that the relative abundance ofMycobacterium aviumsubsp.aviumincreased with water age. Furthermore, the two-step barcoding approach developed as part of the PacBio sequencing method makes this procedure highly adaptable, allowing it to be used for other target genes and species. Copyright © 2018 Haig et al.


July 19, 2019

Dissecting the causal mechanism of X-linked Dystonia-Parkinsonism by integrating genome and transcriptome assembly.

X-linked Dystonia-Parkinsonism (XDP) is a Mendelian neurodegenerative disease that is endemic to the Philippines and is associated with a founder haplotype. We integrated multiple genome and transcriptome assembly technologies to narrow the causal mutation to the TAF1 locus, which included a SINE-VNTR-Alu (SVA) retrotransposition into intron 32 of the gene. Transcriptome analyses identified decreased expression of the canonical cTAF1 transcript among XDP probands, and de novo assembly across multiple pluripotent stem-cell-derived neuronal lineages discovered aberrant TAF1 transcription that involved alternative splicing and intron retention (IR) in proximity to the SVA that was anti-correlated with overall TAF1 expression. CRISPR/Cas9 excision of the SVA rescued this XDP-specific transcriptional signature and normalized TAF1 expression in probands. These data suggest an SVA-mediated aberrant transcriptional mechanism associated with XDP and may provide a roadmap for layered technologies and integrated assembly-based analyses for other unsolved Mendelian disorders. Copyright © 2018 Elsevier Inc. All rights reserved.


July 19, 2019

Piercing the dark matter: bioinformatics of long-range sequencing and mapping.

Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.


July 19, 2019

Accurate detection of complex structural variations using single-molecule sequencing.

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr ) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles ) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.


July 19, 2019

Genome sequence of the progenitor of wheat A subgenome Triticum urartu.

Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat1,2. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes. Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing 3 , linked reads and optical mapping4,5. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Comparative analyses with genomes of other grasses showed gene loss and amplification in the numbers of transposable elements in the T. urartu genome. Population genomics analysis of 147 T. urartu accessions from across the Fertile Crescent showed clustering of three groups, with differences in altitude and biostress, such as powdery mildew disease. The T. urartu genome assembly provides a valuable resource for studying genetic variation in wheat and related grasses, and promises to facilitate the discovery of genes that could be useful for wheat improvement.


July 19, 2019

Genomic variation in 3,010 diverse accessions of Asian cultivated rice.

Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously recognized, but also suggest several unreported subpopulations that correlate with geographic location. We identified 29 million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a high number of presence-absence variations. The complex patterns of introgression observed in domestication genes are consistent with multiple independent rice domestication events. The public availability of data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding.


July 19, 2019

High-quality genome assemblies reveal long non-coding RNAs expressed in ant brains.

Ants are an emerging model system for neuroepigenetics, as embryos with virtually identical genomes develop into different adult castes that display diverse physiology, morphology, and behavior. Although a number of ant genomes have been sequenced to date, their draft quality is an obstacle to sophisticated analyses of epigenetic gene regulation. We reassembled de novo high-quality genomes for two ant species, Camponotus floridanus and Harpegnathos saltator. Using long reads enabled us to span large repetitive regions and improve genome contiguity, leading to comprehensive and accurate protein-coding annotations that facilitated the identification of a Gp-9-like gene as differentially expressed in Harpegnathos castes. The new assemblies also enabled us to annotate long non-coding RNAs in ants, revealing caste-, brain-, and developmental-stage-specific long non-coding RNAs (lncRNAs) in Harpegnathos. These upgraded genomes, along with the new gene annotations, will aid future efforts to identify epigenetic mechanisms of phenotypic and behavioral plasticity in ants. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.


July 19, 2019

From short reads to chromosome-scale genome assemblies.

A high-quality, annotated genome assembly is the foundation for many downstream studies. However, obtaining such an assembly is a complex, reiterative process that requires the assimilation of high-quality data and combines different approaches and data types. While some software packages incorporating multiple steps of genome assembly are commercially available, they may not be flexible enough to be routinely applied to all organisms, particularly to nonmodel species such as pathogenic oomycetes and fungi. If researchers understand and apply the most appropriate, currently available tools for each step, it is possible to customize parameters and optimize results for their organism of study. Based on our experience of de novo assembly and annotation of several oomycete species, this chapter provides a modular workflow from processing of raw reads, to initial assembly generation, through optimization, chromosome-scale scaffolding and annotation, outlining input and output data as well as examples and alternative software used for each step. The accompanying Notes provide background information for each step as well as alternative options. The final result of this workflow could be an annotated, high-quality, validated, chromosome-scale assembly or a draft assembly of sufficient quality to meet specific needs of a project.


July 19, 2019

Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L.

Modern sugarcanes are polyploid interspecific hybrids, combining high sugar content from Saccharum officinarum with hardiness, disease resistance and ratooning of Saccharum spontaneum. Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined. The reduction of basic chromosome number from 10 to 8 in S. spontaneum was caused by fissions of 2 ancestral chromosomes followed by translocations to 4 chromosomes. Surprisingly, 80% of nucleotide binding site-encoding genes associated with disease resistance are located in 4 rearranged chromosomes and 51% of those in rearranged regions. Resequencing of 64 S. spontaneum genomes identified balancing selection in rearranged regions, maintaining their diversity. Introgressed S. spontaneum chromosomes in modern sugarcanes are randomly distributed in AP85-441 genome, indicating random recombination among homologs in different S. spontaneum accessions. The allele-defined Saccharum genome offers new knowledge and resources to accelerate sugarcane improvement.


July 19, 2019

Prediction of smoking by multiplex bisulfite PCR with long amplicons considering allele-specific effects on DNA methylation.

Methylation of DNA is associated with a variety of biological processes. With whole-genome studies of DNA methylation, it became possible to determine a set of genomic sites where DNA methylation is associated with a specific phenotype. A method is needed that allows detailed follow-up studies of the sites, including taking into account genetic information. Bisulfite PCR is a natural choice for this kind of task, but multiplexing is one of the most important problems impeding its implementation. To address this task, we took advantage of a recently published method based on Pacbio sequencing of long bisulfite PCR products (single-molecule real-time bisulfite sequencing, SMRT-BS) and tested the validity of the improved methodology with a smoking phenotype.Herein, we describe the “panhandle” modification of the method, which permits a more robust PCR with multiple targets. We applied this technique to determine smoking by DNA methylation in 71 healthy people and 83 schizophrenia patients (n?=?50 smokers and n?=?104 non-smokers, Russians of the Moscow region). We used five targets known to be influenced by smoking (regions of genes AHRR, ALPPL2, IER3, GNG12, and GFI1). We discovered significant allele-specific methylation effects in the AHRR and IER3 regions and assessed how this information could be exploited to improve the prediction of smoking based on the collected DNA methylation data. We found no significant difference in the methylation profiles of selected targets in relation to schizophrenia suggesting that smoking affects methylation at the studied genomic sites in healthy people and schizophrenia patients in a similar way.We determined that SMRT-BS with “panhandle” modification performs well in the described setting. Additional information regarding methylation and allele-specific effects could improve the predictive accuracy of DNA methylation-based models, which could be valuable for both basic research and clinical applications.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.