September 22, 2019

Genomics: Next regeneration sequencing for reference genomes.

Various species have remarkable abilities to regenerate body parts or entire organisms after injury, but a comprehensive understanding of the molecular basis of regeneration mechanisms will require detailed genomic resources. Two new studies report high-quality reference genomes for two classic regeneration model organisms with contrasting genome sizes: the axolotl salamander Ambystoma mexicanum and the planarian flatworm Schmidtea mediterranea.


September 22, 2019

Jointly aligning a group of DNA reads improves accuracy of identifying large deletions.

Performing sequence alignment to identify structural variants, such as large deletions, from genome sequencing data is a fundamental task, but current methods are far from perfect. The current practice is to independently align each DNA read to a reference genome. We show that the propensity of genomic rearrangements to accumulate in repeat-rich regions imposes severe ambiguities in these alignments, and consequently on the variant calls; with current read lengths, this affects more than one third of known large deletions in the C. Venter genome. We present a method to jointly align reads to a genome, whereby alignment ambiguity of one read can be disambiguated by other reads. We show this leads to a significant improvement in the accuracy of identifying large deletions (≥20 bases), while imposing minimal computational overhead and maintaining an overall running time that is on par with current tools. A software implementation is available as an open-source Python program called JRA at https://bitbucket.org/jointreadalignment/jra-src.


September 22, 2019

A survey of localized sequence rearrangements in human DNA.

Genomes mutate and evolve in ways simple (substitution or deletion of bases) and complex (e.g. chromosome shattering). We do not fully understand what types of complex mutation occur, and we cannot routinely characterize arbitrarily-complex mutations in a high-throughput, genome-wide manner. Long-read DNA sequencing methods (e.g. PacBio, nanopore) are promising for this task, because one read may encompass a whole complex mutation. We describe an analysis pipeline to characterize arbitrarily-complex ‘local’ mutations, i.e. intrachromosomal mutations encompassed by one DNA read. We apply it to nanopore and PacBio reads from one human cell line (NA12878), and survey sequence rearrangements, both real and artifactual. Almost all the real rearrangements belong to recurring patterns or motifs: the most common is tandem multiplication (e.g. heptuplication), but there are also complex patterns such as localized shattering, which resembles DNA damage by radiation. Gene conversions are identified, including one between hemoglobin gamma genes. This study demonstrates a way to find intricate rearrangements with any number of duplications, deletions, and repositionings. It demonstrates a probability-based method to resolve ambiguous rearrangements involving highly similar sequences, as occurs in gene conversion. We present a catalog of local rearrangements in one human cell line, and show which rearrangement patterns occur.


September 22, 2019

Blood CXCR3+CD4 T cells are enriched in inducible replication competent HIV in aviremic antiretroviral therapy-treated individuals.

We recently demonstrated that lymph nodes (LNs) PD-1+/T follicular helper (Tfh) cells from antiretroviral therapy (ART)-treated HIV-infected individuals were enriched in cells containing replication competent virus. However, the distribution of cells containing inducible replication competent virus has been only partially elucidated in blood memory CD4 T-cell populations including the Tfh cell counterpart circulating in blood (cTfh). In this context, we have investigated the distribution of (1) total HIV-infected cells and (2) cells containing replication competent and infectious virus within various blood and LN memory CD4 T-cell populations of conventional antiretroviral therapy (cART)-treated HIV-infected individuals. In the present study, we show that blood CXCR3-expressing memory CD4 T cells are enriched in cells containing inducible replication competent virus and contributed the most to the total pool of cells containing replication competent and infectious virus in blood. Interestingly, subsequent proviral sequence analysis did not indicate virus compartmentalization between blood and LN CD4 T-cell populations, suggesting dynamic interchanges between the two compartments. We then investigated whether the composition of blood HIV reservoir may reflect the polarization of LN CD4 T cells at the time of reservoir seeding and showed that LN PD-1+CD4 T cells of viremic untreated HIV-infected individuals expressed significantly higher levels of CXCR3 as compared to CCR4 and/or CCR6, suggesting that blood CXCR3-expressing CD4 T cells may originate from LN PD-1+CD4 T cells. Taken together, these results indicate that blood CXCR3-expressing CD4 T cells represent the major blood compartment containing inducible replication competent virus in treated aviremic HIV-infected individuals.


September 22, 2019

Conventional and single-molecule targeted sequencing method for specific variant detection in IKBKG while bypassing the IKBKGP1 pseudogene.

In addition to Sanger sequencing, next-generation sequencing of gene panels and exomes has emerged as a standard diagnostic tool in many laboratories. However, these captures can miss regions, have poor efficiency, or capture pseudogenes, which hamper proper diagnoses. One such example is the primary immunodeficiency-associated gene IKBKG. Its pseudogene IKBKGP1 makes traditional capture methods aspecific. We therefore developed a long-range PCR method to efficiently target IKBKG, as well as two associated genes (IRAK4 and MYD88), while bypassing the IKBKGP1 pseudogene. Sequencing accuracy was evaluated using both conventional short-read technology and a newer long-read, single-molecule sequencer. Different mapping and variant calling options were evaluated in their capability to bypass the pseudogene using both sequencing platforms. Based on these evaluations, we determined a robust diagnostic application for unambiguous sequencing and variant calling in IKBKG, IRAK4, and MYD88. This method allows rapid identification of selected primary immunodeficiency diseases in patients suffering from life-threatening invasive pyogenic bacterial infections. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.


September 22, 2019

Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials

Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating performance of sequencing and bioinformatics methods. Here, we develop a reproducible, cloud-based pipeline to integrate multiple sequencing datasets and form benchmark calls, enabling application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and 4 additional broadly-consented genomes from the Personal Genome Project that are available as NIST Reference Materials. These new genomes’ broad, open consent with few restrictions on availability of samples and data is enabling a uniquely diverse array of applications. Our new methods produce 17% more high-confidence SNPs, 176% more indels, and 12% larger regions than our previously published calls. To demonstrate that these calls can be used for accurate benchmarking, we compare other high-quality callsets to ours (e.g., Illumina Platinum Genomes), and we demonstrate that the majority of discordant calls are errors in the other callsets. We also highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. We show that benchmarking tools from the Global Alliance for Genomics and Health can be used with our calls to stratify performance metrics by variant type and genome context and elucidate strengths and weaknesses of a method.


September 22, 2019

Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy.

Epilepsy is a common neurological disorder, and mutations in genes encoding ion channels or neurotransmitter receptors are frequent causes of monogenic forms of epilepsy. Here we show that abnormal expansions of TTTCA and TTTTA repeats in intron 4 of SAMD12 cause benign adult familial myoclonic epilepsy (BAFME). Single-molecule, real-time sequencing of BAC clones and nanopore sequencing of genomic DNA identified two repeat configurations in SAMD12. Intriguingly, in two families with a clinical diagnosis of BAFME in which no repeat expansions in SAMD12 were observed, we identified similar expansions of TTTCA and TTTTA repeats in introns of TNRC6A and RAPGEF2, indicating that expansions of the same repeat motifs are involved in the pathogenesis of BAFME regardless of the genes in which the expanded repeats are located. This discovery that expansions of noncoding repeats lead to neuronal dysfunction responsible for myoclonic tremor and epilepsy extends the understanding of diseases with such repeat expansion.


September 22, 2019

Autologous cell therapy approach for Duchenne muscular dystrophy using PiggyBac transposons and mesoangioblasts.

Duchenne muscular dystrophy (DMD) is a lethal muscle-wasting disease currently without cure. We investigated the use of the PiggyBac transposon for full-length dystrophin expression in murine mesoangioblast (MABs) progenitor cells. DMD murine MABs were transfected with transposable expression vectors for full-length dystrophin and transplanted intramuscularly or intra-arterially into mdx/SCID mice. Intra-arterial delivery indicated that the MABs could migrate to regenerating muscles to mediate dystrophin expression. Intramuscular transplantation yielded dystrophin expression in 11%-44% of myofibers in murine muscles, which remained stable for the assessed period of 5 months. The satellite cells isolated from transplanted muscles comprised a fraction of MAB-derived cells, indicating that the transfected MABs may colonize the satellite stem cell niche. Transposon integration site mapping by whole-genome sequencing indicated that 70% of the integrations were intergenic, while none was observed in an exon. Muscle resistance assessment by atomic force microscopy indicated that 80% of fibers showed elasticity properties restored to those of wild-type muscles. As measured in vivo, transplanted muscles became more resistant to fatigue. This study thus provides a proof-of-principle that PiggyBac transposon vectors may mediate full-length dystrophin expression as well as functional amelioration of the dystrophic muscles within a potential autologous cell-based therapeutic approach of DMD. Copyright © 2018 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.


September 22, 2019

Comparison of phasing strategies for whole human genomes.

Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome), often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages (1) population-based phasing using the SHAPEIT software suite, (2) either genome-wide sequencing read data or parental genotypes, and (3) a large reference panel of variant and haplotype frequencies provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data, e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
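As an illustration of one of the metrics named above, the following is a minimal sketch of how a switch error rate could be computed. It assumes each haplotype is encoded as a string of 0/1 alleles over the same ordered heterozygous sites; the function name and the encoding are illustrative assumptions, not the evaluation code used in the study.

```python
def switch_error_rate(truth: str, predicted: str) -> float:
    """Fraction of adjacent site pairs where the predicted phase flips
    relative to the truth haplotype.

    Both arguments are strings of '0'/'1' alleles for one haplotype,
    covering the same heterozygous sites in the same order (an assumed,
    simplified encoding).
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    if len(truth) < 2:
        return 0.0  # no adjacent pairs, so no switch is possible
    # True where the predicted allele matches the truth at that site.
    agree = [t == p for t, p in zip(truth, predicted)]
    # A switch is a change in agreement between consecutive sites.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(truth) - 1)


# Toy haplotypes (invented for illustration): one switch after the fourth site.
print(switch_error_rate("0101100", "0101011"))
```

Because only changes in agreement are counted, a prediction that is the exact complement of the truth scores zero switches, which reflects that the labelling of the two haplotypes is arbitrary.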


September 22, 2019

Epigenetic landscape influences the liver cancer genome architecture.

The accumulation of different types of genetic alterations, such as nucleotide substitutions, structural rearrangements and viral genome integrations, together with epigenetic alterations, contributes to carcinogenesis. Here, we report a correlation between the occurrence of epigenetic features and genetic aberrations by whole-genome bisulfite, whole-genome shotgun, long-read, and virus capture sequencing of 373 liver cancers. Somatic substitutions and rearrangement breakpoints are enriched in tumor-specific hypo-methylated regions with inactive chromatin marks and actively transcribed highly methylated regions in the cancer genome. Individual mutation signatures depend on chromatin status; in particular, signatures with a higher transcriptional strand bias occur within active chromatin areas. Hepatitis B virus (HBV) integration sites are frequently detected within inactive chromatin regions in cancer cells, as a consequence of negative selection for integrations in active chromatin regions. Ultra-high structural instability and preserved unmethylation of integrated HBV genomes are observed. We conclude that both precancerous and somatic epigenetic features contribute to the cancer genome architecture.


September 22, 2019

IMSindel: An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis.

Insertions and deletions (indels) have been implicated in dozens of human diseases through the radical alteration of gene function by short frameshift indels as well as long indels. However, the accurate detection of these indels from next-generation sequencing data is still challenging. This is particularly true for intermediate-size indels (≥50 bp), due to the short DNA sequencing reads. Here, we developed a new method that predicts intermediate-size indels using BWA soft-clipped fragments (unmatched fragments in partially mapped reads) and unmapped reads. We report the performance comparison of our method, GATK, PINDEL and ScanIndel, using whole exome sequencing data from the same samples. False positive and false negative counts were determined through Sanger sequencing of all predicted indels across these four methods. The harmonic mean of the recall and precision, F-measure, was used to measure the performance of each method. Our method achieved the highest F-measure of 0.84 in one sample, compared to 0.56 for GATK, 0.52 for PINDEL and 0.46 for ScanIndel. Similar results were obtained in additional samples, demonstrating that our method was superior to the other methods for detecting intermediate-size indels. We believe that this methodology will contribute to the discovery of intermediate-size indels associated with human disease.
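For readers unfamiliar with the metric, the F-measure described above can be sketched as follows. The function name and the counts in the usage line are invented for illustration and are not taken from the study.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for a set of indel calls."""
    if true_positives == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation counts: 8 confirmed calls, 2 false calls, 2 missed indels.
print(round(f_measure(8, 2, 2), 2))  # prints 0.8
```

Because it is a harmonic mean, the score is pulled toward the lower of precision and recall, so a caller cannot score well by maximising one at the expense of the other.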


September 22, 2019

Evaluation of WGS based approaches for investigating a food-borne outbreak caused by Salmonella enterica serovar Derby in Germany.

In Germany, salmonellosis still represents the second most common bacterial foodborne disease. The majority of infections are caused by Salmonella (S.) Typhimurium and S. Enteritidis followed by a variety of other broad host-range serovars. Salmonella Derby is one of the five top-ranked serovars isolated from humans and it represents one of the most prevalent serovars in pigs, thus bearing the potential risk for transmission to humans upon consumption of pig meat and products thereof. From November 2013 to January 2014, S. Derby caused a large outbreak that affected 145 primarily elderly people. Epidemiological investigations identified raw pork sausage as the probable source of infection, which was confirmed by microbiological evidence. During the outbreak, isolates from patients, food specimens and asymptomatic carriers were investigated by conventional typing methods. However, the quantity and quality of available microbiological and epidemiological data made this outbreak highly suitable for retrospective investigation by Whole Genome Sequencing (WGS) and subsequent evaluation of different bioinformatics approaches for cluster definition. Overall, the WGS-based methods confirmed the results of the conventional typing but were of significantly higher discriminatory power. That was particularly beneficial for strains with incomplete epidemiological data. For our data set, both single nucleotide polymorphism (SNP)- and core genome multilocus sequence typing (cgMLST)-based methods proved to be appropriate tools for cluster definition. Copyright © 2017 Elsevier Ltd. All rights reserved.
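To make the idea of SNP-based cluster definition concrete, here is a minimal sketch that groups isolates whose pairwise SNP distance falls at or below a threshold. The single-linkage rule, the threshold value, the isolate names and the toy sequences are all assumptions made for illustration; the abstract does not state which clustering rule or cutoff the study applied.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions in two aligned sequences, skipping ambiguous 'N' calls."""
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def cluster_isolates(alignment: dict[str, str], max_distance: int) -> list[set[str]]:
    """Single-linkage clustering: isolates within max_distance SNPs share a cluster."""
    parent = {name: name for name in alignment}

    def find(name: str) -> str:
        # Walk up to the cluster representative, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= max_distance:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in alignment:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Toy core-genome alignment (invented sequences, hypothetical 1-SNP threshold).
toy = {
    "patient_1": "ACGTACGTAC",
    "patient_2": "ACGTACGTAT",
    "sausage_1": "ACGTACGAAT",
    "unrelated": "TCGAACCTGC",
}
print(cluster_isolates(toy, max_distance=1))
```

With single linkage, `patient_1` and `sausage_1` end up in the same cluster even though they differ by two SNPs, because each is within one SNP of `patient_2`. Whether that chaining behaviour is desirable depends on the outbreak setting.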


September 22, 2019

NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data.

Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers. In this study, we developed NextSV, a meta-caller to perform SV calling from low coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, NextSV stringent call set had higher precision and balanced accuracy (F1 score) while NextSV sensitive call set had a higher recall. At 10X coverage, the recall of NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between the sequencing costs and the recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset. Our results provide useful guidelines for SV detection from low coverage whole-genome PacBio data and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.

