July 7, 2019  |  

FusorSV: an algorithm for optimally combining data from multiple structural variation detection methods.

Comprehensive and accurate identification of structural variations (SVs) from next generation sequencing data remains a major challenge. We develop FusorSV, which uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. It includes a fusion model built using analysis of 27 deep-coverage human genomes from the 1000 Genomes Project. We identify 843 novel SV calls that were not reported by the 1000 Genomes Project for these 27 samples. Experimental validation of a subset of these calls yields a validation rate of 86.7%. FusorSV is available at https://github.com/TheJacksonLaboratory/SVE .
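To make the idea of merging callsets from several SV callers concrete, here is a minimal sketch of a plain voting scheme based on reciprocal overlap. It is not the FusorSV fusion model, which the abstract describes only as a data mining approach. The caller names, the 50% overlap threshold, and the two-caller minimum are all assumptions chosen for illustration, and the sketch ignores chromosome and SV type.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two half-open intervals given as (start, end)."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    # min(inter/len_a, inter/len_b) equals inter divided by the longer length.
    return inter / max(a[1] - a[0], b[1] - b[0])


def supported_calls(callsets, min_overlap=0.5, min_callers=2):
    """Keep each call that is matched by enough callers (its own included).

    `callsets` maps a hypothetical caller name to a list of (start, end) calls.
    """
    kept = []
    for name, calls in callsets.items():
        for call in calls:
            support = {name}
            for other, other_calls in callsets.items():
                if other == name:
                    continue
                if any(reciprocal_overlap(call, c) >= min_overlap for c in other_calls):
                    support.add(other)
            if len(support) >= min_callers:
                kept.append((call, sorted(support)))
    return kept


# Hypothetical callsets: two callers agree on one deletion, a third reports another.
example = {
    "caller_a": [(100, 200)],
    "caller_b": [(110, 210)],
    "caller_c": [(1000, 1100)],
}
merged = supported_calls(example)
```

In this toy input the two overlapping calls support each other and the isolated call is dropped. A real ensemble would also weigh each caller by its measured performance rather than counting votes equally.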


July 7, 2019  |  

RIFRAF: a frame-resolving consensus algorithm.

Protein coding genes can be studied using long-read next generation sequencing. However, high rates of indel sequencing errors are problematic, corrupting the reading frame. Even the consensus of multiple independent sequence reads retains indel errors. To solve this problem, we introduce Reference-Informed Frame-Resolving multiple-Alignment Free template inference algorithm (RIFRAF), a sequence consensus algorithm that takes a set of error-prone reads and a reference sequence and infers an accurate in-frame consensus. RIFRAF uses a novel structure, analogous to a two-layer hidden Markov model: the consensus is optimized to maximize alignment scores with both the set of noisy reads and with a reference. The template-to-reads component of the model encodes the preponderance of indels, and is sensitive to the per-base quality scores, giving greater weight to more accurate bases. The reference-to-template component of the model penalizes frame-destroying indels. A local search algorithm proceeds in stages to find the best consensus sequence for both objectives. Using Pacific Biosciences SMRT sequences from an HIV-1 env clone, NL4-3, we compare our approach to other consensus and frame correction methods. RIFRAF consistently finds a consensus sequence that is more accurate and in-frame, especially with small numbers of reads. It was able to perfectly reconstruct over 80% of consensus sequences from as few as three reads, whereas the best alternative required twice as many. RIFRAF is able to achieve these results and keep the consensus in-frame even with a distantly related reference sequence. Moreover, unlike other frame correction methods, RIFRAF can detect and keep true indels while removing erroneous ones. RIFRAF is implemented in Julia, and source code is publicly available at https://github.com/MurrellGroup/Rifraf.jl. Supplementary data are available at Bioinformatics online.
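The notion of a frame-destroying indel can be illustrated with a short sketch: in a gapped pairwise alignment of a consensus against a coding reference, a gap run whose length is not a multiple of three shifts the reading frame. This is only an illustration of that definition, not the RIFRAF scoring model or its local search. The function names and the gapped-string input format are assumptions made for the example.

```python
import re


def indel_runs(aligned_ref, aligned_seq):
    """Return (kind, length) for every gap run in a gapped pairwise alignment.

    Gaps in the sequence are deletions relative to the reference;
    gaps in the reference are insertions.
    """
    if len(aligned_ref) != len(aligned_seq):
        raise ValueError("aligned strings must have equal length")
    runs = [("deletion", len(m.group())) for m in re.finditer(r"-+", aligned_seq)]
    runs += [("insertion", len(m.group())) for m in re.finditer(r"-+", aligned_ref)]
    return runs


def frame_destroying(runs):
    """Keep only the indels whose length is not a multiple of three."""
    return [run for run in runs if run[1] % 3 != 0]


# Hypothetical alignment: one whole-codon deletion and one single-base deletion.
runs = indel_runs("ATGAAACCCGGG", "ATG---CCC-GG")
shifts = frame_destroying(runs)
```

Here the three-base deletion keeps the frame intact, so it could be a true indel, while the single-base deletion is the kind of event a frame-aware consensus method would penalize.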


July 7, 2019  |  

GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing.

Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68 and 83% for capture sequence data and 200X WGS data respectively, improving to 87 and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25, 14, 12 and 8% of its midpoint copy number value for 30X, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS studies of complex diseases at the population level. Further improvements in accuracy can be obtained by improving accuracy of the reference dataset.
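For readers unfamiliar with HPD intervals, the sketch below shows one common way to compute one from posterior samples (the narrowest window that covers the requested share of the sorted samples), along with the interval's half-width relative to its midpoint, which is the quantity the resolution figures above refer to. It does not reproduce the GtTR model. The function names and the sample values are assumptions for illustration only.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval that contains `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best_lo, best_hi = xs[0], xs[k - 1]
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


def relative_half_width(lo, hi):
    """Half-width of an interval expressed as a fraction of its midpoint."""
    mid = (lo + hi) / 2
    return (hi - lo) / 2 / mid


# Hypothetical copy-number samples; a 50% interval keeps the arithmetic easy to follow.
interval = hpd_interval([1, 2, 2.5, 10], mass=0.5)
```

With real posterior draws, `hpd_interval(draws)` gives the 95% interval, and `relative_half_width(*interval)` gives a value comparable to the "within 25, 14, 12 and 8% of its midpoint" figures quoted in the abstract.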


July 7, 2019  |  

Modular traits of the Rhizobiales root microbiota and their evolutionary relationship with symbiotic Rhizobia.

Rhizobia are a paraphyletic group of soil-borne bacteria that induce nodule organogenesis in legume roots and fix atmospheric nitrogen for plant growth. In non-leguminous plants, species from the Rhizobiales order define a core lineage of the plant microbiota, suggesting additional functional interactions with plant hosts. In this work, genome analyses of 1,314 Rhizobiales isolates along with amplicon studies of the root microbiota reveal the evolutionary history of nitrogen-fixing symbiosis in this bacterial order. Key symbiosis genes were acquired multiple times, and the most recent common ancestor could colonize roots of a broad host range. In addition, root growth promotion is a characteristic trait of Rhizobiales in Arabidopsis thaliana, whereas interference with plant immunity constitutes a separate, strain-specific phenotype of root commensal Alphaproteobacteria. Additional studies with a tripartite gnotobiotic plant system reveal that these traits operate in a modular fashion and thus might be relevant to microbial homeostasis in healthy roots. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.


July 7, 2019  |  

New reference genome sequences for 17 bacterial strains of the honey bee gut microbiota.

We sequenced the genomes of 17 strains isolated from the gut of honey bees, including strains representing the genera Lactobacillus, Bifidobacterium, Gilliamella, Snodgrassella, Frischella, and Commensalibacter. These genome sequences represent an important step forward in the development of a comprehensive reference database to aid future analysis of this emerging gut microbiota model.


July 7, 2019  |  

Low-level antimicrobials in the medicinal leech select for resistant pathogens that spread to patients.

Fluoroquinolones (FQs) and ciprofloxacin (Cp) are important antimicrobials that pollute the environment in trace amounts. Although Cp has been recommended as prophylaxis for patients undergoing leech therapy to prevent infections by the leech gut symbiont Aeromonas, a puzzling rise in Cp-resistant (Cpr) Aeromonas infections has been reported. We report on the effects of subtherapeutic FQ concentrations on bacteria in an environmental reservoir, the medicinal leech, and describe the presence of multiple antibiotic resistance mutations and a gain-of-function resistance gene. We link the rise of Cpr Aeromonas isolates to exposure of the leech microbiota to very low levels of Cp (0.01 to 0.04 µg/ml), <1/100 of the clinical resistance breakpoint for Aeromonas. Using competition experiments and comparative genomics of 37 strains, we determined the mechanisms of resistance in clinical and leech-derived Aeromonas isolates, traced their origin, and determined that the presence of merely 0.01 µg/ml Cp provides a strong competitive advantage for Cpr strains. Deep-sequencing the Cpr-conferring region of gyrA enabled tracing of the mutation-harboring Aeromonas population in archived gut samples, and an increase in the frequency of the Cpr-conferring mutation in 2011 coincides with the initial reports of Cpr Aeromonas infections in patients receiving leech therapy.
IMPORTANCE The role of subtherapeutic antimicrobial contamination in selecting for resistant strains has received increasing attention and is an important clinical matter. This study describes the relationship of resistant bacteria from the medicinal leech, Hirudo verbana, with patient infections following leech therapy. While our results highlight the need for alternative antibiotic therapies, the rise of Cpr bacteria demonstrates the importance of restricting the exposure of animals to antibiotics approved for veterinary use. The shift to a more resistant community and the dispersion of Cpr-conferring mechanisms via mobile elements occurred in a natural setting due to the presence of very low levels of fluoroquinolones, revealing the challenges of controlling the spread of antibiotic-resistant bacteria and highlighting the importance of a holistic approach in the management of antibiotic use. Copyright © 2018 Beka et al.


July 7, 2019  |  

Immunoglobulin gene analysis as a tool for investigating human immune responses.

The human immunoglobulin repertoire is a hugely diverse set of sequences that are formed by processes of gene rearrangement, heavy and light chain gene assortment, class switching and somatic hypermutation. Early B cell development produces diverse IgM and IgD B cell receptors on the B cell surface, resulting in a repertoire that can bind many foreign antigens but which has had self-reactive B cells removed. Later antigen-dependent development processes adjust the antigen affinity of the receptor by somatic hypermutation. The effector mechanism of the antibody is also adjusted, by switching the class of the antibody from IgM to one of seven other classes depending on the required function. There are many instances in human biology where positive and negative selection forces can act to shape the immunoglobulin repertoire and therefore repertoire analysis can provide useful information on infection control, vaccination efficacy, autoimmune diseases, and cancer. It can also be used to identify antigen-specific sequences that may be of use in therapeutics. The juxtaposition of lymphocyte development and numerical evaluation of immune repertoires has resulted in the growth of a new sub-speciality in immunology where immunologists and computer scientists/physicists collaborate to assess immune repertoires and develop models of immune action. © 2018 The Authors. Immunological Reviews Published by John Wiley & Sons Ltd.


July 7, 2019  |  

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries.

Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (comprised of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.


July 7, 2019  |  

Implementation of pharmacogenomics in everyday clinical settings.

Currently, germline pharmacogenomics (PGx) is successfully implemented within certain specialties in clinical care. Integrating PGx into pharmacotherapy involves multiple stakeholders, which are identified in this chapter. Clinically relevant pharmacogenes with their related PGx tests are discussed, along with diagnostic test criteria to guide clinicians and policy makers in PGx test selection. The chapter further reviews the similarities and the differences between the guidelines of the Dutch Pharmacogenetics Working Group and the Clinical Pharmacogenetics Implementation Consortium, which both support healthcare professionals in understanding PGx test results and help guide pharmacotherapy by providing evidence-based dosing recommendations. Finally, clinical studies which provide scientific evidence and information on cost-effectiveness supporting clinical implementation of PGx in clinical care are discussed along with the remaining barriers for adoption of PGx testing by healthcare professionals. © 2018 Elsevier Inc. All rights reserved.


July 7, 2019  |  

Recent advances on detection and characterization of fruit tree viruses using high-throughput sequencing technologies.

Perennial crops, such as fruit trees, are infected by many viruses, which are transmitted through vegetative propagation and grafting of infected plant material. Some of these pathogens cause severe crop losses and often reduce the productive life of the orchards. Detection and characterization of these agents in fruit trees is challenging; however, in recent years, the wide application of high-throughput sequencing (HTS) technologies has significantly facilitated this task. In this review, we present recent advances in the discovery, detection, and characterization of fruit tree viruses and virus-like agents accomplished by HTS approaches. A high number of new viruses have been described in the last 5 years, some of them exhibiting novel genomic features that have led to the proposal of the creation of new genera, and the revision of the current virus taxonomy status. Interestingly, several of the newly identified viruses belong to virus genera previously unknown to infect fruit tree species (e.g., Fabavirus, Luteovirus), a fact that challenges our perspective of plant viruses in general. Finally, applied methodologies, including the use of different molecules as templates, as well as advantages and disadvantages and future directions of HTS in fruit tree virology are discussed.


July 7, 2019  |  

Genome-wide analysis of the invertase gene family from maize.

The recent release of the maize genome (AGPv4) contains annotation errors of invertase genes and therefore the enzymes are best curated manually at the protein level in a comprehensible fashion. The synthesis, transport and degradation of sucrose are determining factors for biomass allocation and yield of crop plants. Invertase (INV) is a key enzyme of carbon metabolism in both source and sink tissues. Current releases of the maize genome correctly annotate only two vacuolar invertases (ivr1 and ivr2) and four cell wall invertases (incw1, incw2 (mn1), incw3, and incw4). Our comprehensive survey identified 21 INV isogenes for which we propose a standard nomenclature grouped phylogenetically by amino acid similarity: three vacuolar (INVVR), eight cell wall (INVCW), and ten alkaline/neutral (INVAN) isogenes which form separate dendrogram branches due to distinct molecular features. The acidic enzymes were curated for the presence of the DPN tripeptide which is coded by one of the smallest exons reported in plants. Particular attention was placed on the molecular role of INV in vascular tissues such as the nodes, internodes, leaf sheath, husk leaves and roots. We report the expression profile of most members of the maize INV family in nine tissues in two developmental stages, R1 and R3. INVCW7, INVVR2, INVAN8, INVAN9, INVAN10, and INVAN3 displayed the highest absolute expressions in most tissues. INVVR3, INVCW5, INVCW8, and INVAN1 showed low mRNA levels. Expressions of most INVs were repressed from stage R1 to R3, except for INVCW7 which increased significantly in all tissues after flowering. The mRNA levels of INVCW7 in the vegetative stem correlated with a higher transport rate of assimilates from leaves to the cob which led to starch accumulation and growth of the female reproductive organs.


July 7, 2019  |  

Chromosomal Sil system contributes to silver resistance in E. coli ATCC 8739.

The rise of antibiotic resistance in pathogenic bacteria is endangering the efficacy of antibiotics, which consequently results in greater use of silver as a biocide. Chromosomal mapping of the Cus system or plasmid encoded Sil system and their relationship with silver resistance was studied for several gram-negative bacteria. However, only a few reports investigated silver detoxification mediated by the Sil system integrated in the Escherichia coli chromosome. Accordingly, this work aimed to study the Sil system in E. coli ATCC 8739 and to produce evidence for its role in silver resistance development. Silver resistance was induced in E. coli ATCC 8739 by stepwise passage in culture media containing increasing concentrations of AgNO3. The published genome of E. coli ATCC 8739 contains a region showing strong homology to the Sil system genes. The role of this region in E. coli ATCC 8739 was assessed by monitoring the expression of silC upon silver stress, which resulted in a 350-fold increased expression. De novo sequencing of the whole genome of a silver resistant strain derived from E. coli ATCC 8739 revealed mutations in putative ORFs for SilR and CusR. The silver resistant strain (E. coli AgNO3R) showed constitutive expression of silC, which imposed a fitness cost resulting in retarded growth. Furthermore, E. coli AgNO3R exhibited cross-resistance to ciprofloxacin and a slightly increased tolerance to ampicillin. This study demonstrates that E. coli is able to develop resistance to silver, which may pose a threat to the effective use of silver compounds as antiseptics.


July 7, 2019  |  

Picky comprehensively detects high-resolution structural variants in nanopore long reads.

Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky ( https://github.com/TheJacksonLaboratory/Picky ), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution uncovered micro-insertions as the common structural features associated with SVs. Breakpoint density across the genome is associated with the propensity for interchromosomal connectivity and was found to be enriched in promoters and transcribed regions of the genome. Furthermore, we observed an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data.


July 7, 2019  |  

Allele-level KIR genotyping of more than a million samples: Workflow, algorithm, and observations.

The killer-cell immunoglobulin-like receptor (KIR) genes regulate natural killer cell activity, influencing predisposition to immune mediated disease, and affecting hematopoietic stem cell transplantation (HSCT) outcome. Owing to the complexity of the KIR locus, with extensive gene copy number variation (CNV) and allelic diversity, high-resolution characterization of KIR has so far been applied only to relatively small cohorts. Here, we present a comprehensive high-throughput KIR genotyping approach based on next generation sequencing. Through PCR amplification of specific exons, our approach delivers both copy numbers of the individual genes and allelic information for every KIR gene. Ten-fold replicate analysis of a set of 190 samples revealed a precision of 99.9%. Genotyping of an independent set of 360 samples resulted in an accuracy of more than 99% taking into account consistent copy number prediction. We applied the workflow to genotype 1.8 million stem cell donor registry samples. We report on the observed KIR allele diversity and relative abundance of alleles based on a subset of more than 300,000 samples. Furthermore, we identified more than 2,000 previously unreported KIR variants repeatedly in independent samples, underscoring the large diversity of the KIR region that awaits discovery. This cost-efficient high-resolution KIR genotyping approach is now applied to samples of volunteers registering as potential donors for HSCT. This will facilitate the utilization of KIR as an additional selection criterion to improve unrelated donor stem cell transplantation outcome. In addition, the approach may serve studies requiring high-resolution KIR genotyping, such as population genetics and disease association studies.


July 7, 2019  |  

Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA.

The application of genomic data and bioinformatics for the identification of restricted or illegally-sourced natural products is urgently needed. The taxonomic identity and geographic provenance of raw and processed materials have implications in sustainable-use commercial practices, and relevance to the enforcement of laws that regulate or restrict illegally harvested materials, such as timber. Improvements in genomics make it possible to capture and sequence partial-to-complete genomes from challenging tissues, such as wood and wood products. In this paper, we report the success of an alignment-free genome comparison method, [Formula: see text], that differentiates geographic sources of white oak (Quercus) species with a high level of accuracy using a very small amount of genomic data. The method is robust to sequencing errors, different sequencing laboratories and sequencing platforms. This method offers an approach based on genome-scale data, rather than panels of pre-selected markers for specific taxa. The method provides a generalizable platform for the identification and sourcing of materials using a unified next generation sequencing and analysis framework.
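As a rough illustration of what "alignment-free" comparison means, the sketch below compares two sequences by their k-mer count profiles using cosine distance. This is a generic stand-in and not the statistic named in the abstract, which is elided in the source. The function names, the choice of cosine distance, and the value of k are assumptions for the example.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count k-mers in a sequence, skipping any k-mer with a non-ACGT character."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(kmer for kmer in kmers if set(kmer) <= set("ACGT"))


def cosine_distance(a, b):
    """1 minus the cosine similarity of two k-mer count profiles."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("both profiles must contain at least one k-mer")
    return 1 - dot / (norm_a * norm_b)


# Hypothetical toy sequences with no shared 2-mers.
profile_a = kmer_counts("AAAA", 2)
profile_c = kmer_counts("CCCC", 2)
distance = cosine_distance(profile_a, profile_c)
```

No alignment is computed at any point: each sample is reduced to a vector of k-mer counts, and samples are compared through those vectors. This is why such methods can work with small, unassembled amounts of sequence data.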

