July 7, 2019

Whole genome sequence and comparative genomics of the novel Lyme borreliosis causing pathogen, Borrelia mayonii.

Borrelia mayonii, a Borrelia burgdorferi sensu lato (Bbsl) genospecies, was recently identified as a cause of Lyme borreliosis (LB) among patients from the upper midwestern United States. By microscopy and PCR, spirochete/genome loads in infected patients were estimated at 10^5 to 10^6 per milliliter of blood. Here, we present the full chromosome and plasmid sequences of two B. mayonii isolates, MN14-1420 and MN14-1539, cultured from blood of two of these patients. Whole genome sequencing and assembly were conducted using PacBio long read sequencing (Pacific Biosciences RSII instrument) followed by the hierarchical genome-assembly process (HGAP). The B. mayonii genome is ~1.31 Mbp in size (26.9% average GC content) and is comprised of a linear chromosome, 8 linear and 7 circular plasmids. Consistent with its taxonomic designation as a new Bbsl genospecies, the B. mayonii linear chromosome shares only 93.83% average nucleotide identity with other genospecies. Both B. mayonii genomes contain plasmids similar to B. burgdorferi sensu stricto lp54, lp36, lp28-3, lp28-4, lp25, lp17, lp5, 5 cp32s, cp26, and cp9. The vls locus present on lp28-10 of B. mayonii MN14-1420 is remarkably long, being comprised of 24 silent vls cassettes. Genetic differences between the two B. mayonii genomes are limited and include 15 single nucleotide variations as well as 7 fewer silent vls cassettes and a lack of the lp5 plasmid in MN14-1539. Notably, 68 homologs to proteins present in B. burgdorferi sensu stricto appear to be lacking from the B. mayonii genomes. These include the complement inhibitor, CspZ (BB_H06), the fibronectin binding protein, BB_K32, as well as multiple lipoproteins and proteins of unknown function. This study shows the utility of long read sequencing for full genome assembly of Bbsl genomes, identifies putative genome regions of B. mayonii that may be linked to clinical manifestation or tissue tropism, and provides a valuable resource for pathogenicity, diagnostic and vaccine studies.


July 7, 2019

Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads

With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools.


July 7, 2019

Improve homology search sensitivity of PacBio data by correcting frameshifts.

Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data has a high sequencing error rate and most of the errors are insertion or deletion errors. During alignment-based homology search, insertion or deletion errors in genes will cause frameshifts and may only lead to marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with a much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors than a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/ (contact: yannisun@msu.edu). © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
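To make the frameshift problem concrete, the following is a minimal Python sketch, not Frame-Pro's method. It translates a short toy coding sequence before and after a single inserted base. The sequences and the `translate` helper are illustrative assumptions; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Toy reference gene (hypothetical) and a read with one extra base
# inserted after the fourth nucleotide, as an insertion error would do.
reference = "ATGGCCAAAGGTTGA"
noisy_read = reference[:4] + "A" + reference[4:]

print(translate(reference))   # protein encoded by the error-free sequence
print(translate(noisy_read))  # every residue after the insertion changes
```

A single inserted base shifts every downstream codon, so a protein-level alignment of the noisy read against the true protein would match only the residues before the error. This is why indel-rich reads tend to yield short, low-scoring alignments.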


July 7, 2019

SRinversion: a tool for detecting short inversions by splitting and re-aligning poorly mapped and unmapped sequencing reads.

Rapid development in sequencing technologies has dramatically improved our ability to detect genetic variants in the human genome. However, current methods have variable sensitivities in detecting different types of genetic variants. One type of genetic variant that is especially hard to detect is inversions. Analysis of public databases showed that few short inversions have been reported so far. Unlike reads that contain small insertions or deletions, which will be considered through gap alignment, reads carrying short inversions often have poor mapping quality or are unmapped, and thus are often not further considered. As a result, the majority of short inversions might have been overlooked and require special algorithms for their detection. Here, we introduce SRinversion, a framework to analyze poorly mapped or unmapped reads by splitting and re-aligning them for the purpose of inversion detection. SRinversion is very sensitive to small inversions and can detect those less than 10 bp in size. We applied SRinversion to both simulated data and high-coverage sequencing data from the 1000 Genomes Project and compared the results with those from Pindel, BreakDancer, DELLY, Gustaf and MID. A better performance of SRinversion was achieved for both datasets for the detection of small inversions. SRinversion is implemented in Perl and is publicly available at http://paed.hku.hk/genome/software/SRinversion/index.html. Contact: yangwl@hku.hk. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
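The following toy Python sketch illustrates why a read carrying a short inversion aligns poorly as a whole but matches perfectly once the inverted segment is reverse-complemented. It is not the SRinversion algorithm: the brute-force search, the function names, and the sequences are all assumptions made for illustration, and the sketch assumes the read and reference window have equal length.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_inversion(ref, read):
    """Return the shortest (start, end) such that reverse-complementing
    ref[start:end] reproduces the read, or None if no interval does."""
    if len(ref) != len(read) or ref == read:
        return None
    n = len(ref)
    # Try intervals from shortest to longest so the tightest fit wins.
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            end = start + length
            candidate = ref[:start] + reverse_complement(ref[start:end]) + ref[end:]
            if candidate == read:
                return (start, end)
    return None


# Hypothetical reference window and a read with bases 4..9 inverted.
ref = "ACGTTGCAAGGCTA"
read = ref[:4] + reverse_complement(ref[4:10]) + ref[10:]

print(find_inversion(ref, read))
```

The flanks on either side of the inverted interval match the reference directly, which mirrors the idea of splitting a read and re-aligning its pieces separately. A real tool would also have to handle sequencing errors, unknown read positions, and scoring.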


July 7, 2019

Genomic sequencing-based mutational enrichment analysis identifies motility genes in a genetically intractable gut microbe.

A major roadblock to understanding how microbes in the gastrointestinal tract colonize and influence the physiology of their hosts is our inability to genetically manipulate new bacterial species and experimentally assess the function of their genes. We describe the application of population-based genomic sequencing after chemical mutagenesis to map bacterial genes responsible for motility in Exiguobacterium acetylicum, a representative intestinal Firmicutes bacterium that is intractable to molecular genetic manipulation. We derived strong associations between mutations in 57 E. acetylicum genes and impaired motility. Surprisingly, less than half of these genes were annotated as motility-related based on sequence homologies. We confirmed the genetic link between individual mutations and loss of motility for several of these genes by performing a large-scale analysis of spontaneous suppressor mutations. In the process, we reannotated genes belonging to a broad family of diguanylate cyclases and phosphodiesterases to highlight their specific role in motility and assigned functions to uncharacterized genes. Furthermore, we generated isogenic strains that allowed us to establish that Exiguobacterium motility is important for the colonization of its vertebrate host. These results indicate that genetic dissection of a complex trait, functional annotation of new genes, and the generation of mutant strains to define the role of genes in complex environments can be accomplished in bacteria without the development of species-specific molecular genetic tools.


July 7, 2019

Complete genome sequence and transcriptome regulation of the pentose utilizing yeast Sugiyamaella lignohabitans.

Efficient conversion of hexoses and pentoses into value-added chemicals represents one core step for establishing economically feasible biorefineries from lignocellulosic material. While extensive research efforts have recently provided advances in the overall process performance, the quest for new microbial cell factories and novel enzyme sources is still open. As demonstrated recently, the yeast Sugiyamaella lignohabitans (formerly Candida lignohabitans) represents a promising microbial cell factory for the production of organic acids from lignocellulosic hydrolysates. We report here the de novo genome assembly of S. lignohabitans using the Single Molecule Real-Time platform, with gene prediction refined by using RNA-seq. The sequencing revealed a 15.98 Mb genome, subdivided into four chromosomes. By phylogenetic analysis, Blastobotrys (Arxula) adeninivorans and Yarrowia lipolytica were found to be close relatives of S. lignohabitans. Differential gene expression was evaluated in typical growth conditions on glucose and xylose and allowed a first insight into the transcriptional response of S. lignohabitans to different carbon sources and different oxygenation conditions. Novel sequences for enzymes and transporters involved in the central carbon metabolism, and therefore of potential biotechnological interest, were identified. These data open the way for a better understanding of the metabolism of S. lignohabitans and provide resources for further metabolic engineering. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Spontaneous chloroplast mutants mostly occur by replication slippage and show a biased pattern in the plastome of Oenothera.

Spontaneous plastome mutants have been used as a research tool since the beginning of genetics. However, technical restrictions have severely limited their contributions to research in physiology and molecular biology. Here, we used full plastome sequencing to systematically characterize a collection of 51 spontaneous chloroplast mutants in Oenothera (evening primrose). Most mutants carry only a single mutation. Unexpectedly, the vast majority of mutations do not represent single nucleotide polymorphisms but are insertions/deletions originating from DNA replication slippage events. Only very few mutations appear to be caused by imprecise double-strand break repair, nucleotide misincorporation during replication, or incorrect nucleotide excision repair following oxidative damage. U-turn inversions were not detected. Replication slippage is induced at repetitive sequences that can be very small and tend to have high A/T content. Interestingly, the mutations are not distributed randomly in the genome. The underrepresentation of mutations caused by faulty double-strand break repair might explain the high structural conservation of seed plant plastomes throughout evolution. In addition to providing a fully characterized mutant collection for future research on plastid genetics, gene expression, and photosynthesis, our work identified the spectrum of spontaneous mutations in plastids and reveals that this spectrum is very different from that in the nucleus. © 2016 American Society of Plant Biologists. All rights reserved.


July 7, 2019

Origins of the current seventh cholera pandemic.

Vibrio cholerae has caused seven cholera pandemics since 1817, imposing terror on much of the world, but bacterial strains are currently only available for the sixth and seventh pandemics. The El Tor biotype seventh pandemic began in 1961 in Indonesia, but did not originate directly from the classical biotype sixth-pandemic strain. Previous studies focused mainly on the spread of the seventh pandemic after 1970. Here, we analyze in unprecedented detail the origin, evolution, and transition to pandemicity of the seventh-pandemic strain. We used high-resolution comparative genomic analysis of strains collected from 1930 to 1964, covering the evolution from the first available El Tor biotype strain to the start of the seventh pandemic. We define six stages leading to the pandemic strain and reveal all key events. The seventh pandemic originated from a nonpathogenic strain in the Middle East, first observed in 1897. It subsequently underwent explosive diversification, including the spawning of the pandemic lineage. This rapid diversification suggests that, when first observed, the strain had only recently arrived in the Middle East, possibly from the Asian homeland of cholera. The lineage migrated to Makassar, Indonesia, where it gained the important virulence-associated elements Vibrio seventh pandemic island I (VSP-I), VSP-II, and El Tor type cholera toxin prophage by 1954, and it then became pandemic in 1961 after only 12 additional mutations. Our data indicate that specific niches in the Middle East and Makassar were important in generating the pandemic strain by providing gene sources and the driving forces for genetic events.


July 7, 2019

svclassify: a method to establish benchmark structural variant calls.

The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.


July 7, 2019

Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes.

Buckwheat (Fagopyrum esculentum Moench; 2n = 2x = 16) is a nutritionally dense annual crop widely grown in temperate zones. To accelerate molecular breeding programmes of this important crop, we generated a draft assembly of the buckwheat genome using short reads obtained by next-generation sequencing (NGS), and constructed the Buckwheat Genome DataBase. After assembling short reads, we determined 387,594 scaffolds as the draft genome sequence (FES_r1.0). The total length of FES_r1.0 was 1,177,687,305 bp, and the N50 of the scaffolds was 25,109 bp. Gene prediction analysis revealed 286,768 coding sequences (CDSs; FES_r1.0_cds) including those related to transposable elements. The total length of FES_r1.0_cds was 212,917,911 bp, and the N50 was 1,101 bp. Of these, the functions of 35,816 CDSs excluding those for transposable elements were annotated by BLAST analysis. To demonstrate the utility of the database, we conducted several test analyses using BLAST and keyword searches. Furthermore, we used the draft genome as a reference sequence for NGS-based markers, and successfully identified novel candidate genes controlling heteromorphic self-incompatibility of buckwheat. The database and draft genome sequence provide a valuable resource that can be used in efforts to develop buckwheat cultivars with superior agronomic traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
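For readers unfamiliar with the N50 statistic quoted above, here is a minimal Python sketch of the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. The function name and the toy lengths are illustrative assumptions, not data from the buckwheat assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy scaffold lengths in bp (hypothetical).
scaffolds = [2, 3, 4, 5, 6]
print(n50(scaffolds))
```

With these toy values the total is 20 bp; the two longest scaffolds (6 and 5 bp) together reach 11 bp, which passes the halfway mark, so the N50 is 5.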


July 7, 2019

Exploiting next-generation sequencing to solve the haplotyping puzzle in polyploids: a simulation study.

Haplotypes are the units of inheritance in an organism, and many genetic analyses depend on their precise determination. Methods for haplotyping single individuals use the phasing information available in next-generation sequencing reads, by matching overlapping single-nucleotide polymorphisms while penalizing post hoc nucleotide corrections made. Haplotyping diploids is relatively easy, but the complexity of the problem increases drastically for polyploid genomes, which are found in both model organisms and in economically relevant plant and animal species. Although a number of tools are available for haplotyping polyploids, the effects of the genomic makeup and the sequencing strategy followed on the accuracy of these methods have hitherto not been thoroughly evaluated. We developed the simulation pipeline haplosim to evaluate the performance of three haplotype estimation algorithms for polyploids: HapCompass, HapTree and SDhaP, in settings varying in sequencing approach, ploidy levels and genomic diversity, using tetraploid potato as the model. Our results show that sequencing depth is the major determinant of haplotype estimation quality, that 1 kb PacBio circular consensus sequencing reads and Illumina reads with large insert-sizes are competitive and that all methods fail to produce good haplotypes when ploidy levels increase. Comparing the three methods, HapTree produces the most accurate estimates, but also consumes the most resources. There is clearly room for improvement in polyploid haplotyping algorithms.


July 7, 2019

Collection and storage of HLA NGS genotyping data for the 17th International HLA and Immunogenetics Workshop.

For over 50 years, the International HLA and Immunogenetics Workshops (IHIW) have advanced the fields of histocompatibility and immunogenetics (H&I) via community sharing of technology, experience and reagents, and the establishment of ongoing collaborative projects. Held in the fall of 2017, the 17th IHIW focused on the application of next generation sequencing (NGS) technologies for clinical and research goals in the H&I fields. NGS technologies have the potential to allow dramatic insights and advances in these fields, but the scope and sheer quantity of data associated with NGS raise challenges for their analysis, collection, exchange and storage. The 17th IHIW adopted a centralized approach to these issues, and we developed the tools, services and systems to create an effective system for capturing and managing these NGS data. We worked with NGS platform and software developers to define a set of distinct but equivalent NGS typing reports that record NGS data in a uniform fashion. The 17th IHIW database applied our standards, tools and services to collect, validate and store those structured, multi-platform data in an automated fashion. We have created community resources to enable exploration of the vast store of curated sequence and allele-name data in the IPD-IMGT/HLA Database, with the goal of creating a long-term community resource that integrates these curated data with new NGS sequence and polymorphism data, for advanced analyses and applications. Copyright © 2017 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.


July 7, 2019

Microbial sequence typing in the genomic era.

Next-generation sequencing (NGS), also known as high-throughput sequencing, is changing the field of microbial genomics research. NGS allows for a more comprehensive analysis of the diversity, structure and composition of microbial genes and genomes compared to the traditional automated Sanger capillary sequencing at a lower cost. NGS strategies have expanded the versatility of standard and widely used typing approaches based on nucleotide variation in several hundred DNA sequences and a few gene fragments (MLST, MLVA, rMLST and cgMLST). NGS can now accommodate variation in thousands or millions of sequences from selected amplicons to full genomes (WGS, NGMLST and HiMLST). To extract signals from high-dimensional NGS data and make valid statistical inferences, novel analytic and statistical techniques are needed. In this review, we describe standard and new approaches for microbial sequence typing at gene and genome levels and guidelines for subsequent analysis, including methods and computational frameworks. We also present several applications of these approaches to some disciplines, namely genotyping, phylogenetics and molecular epidemiology. Copyright © 2017 Elsevier B.V. All rights reserved.


July 7, 2019

FDA-CDC antimicrobial resistance isolate bank: A publicly-available resource to support research, development and regulatory requirements.

The FDA-CDC Antimicrobial Resistance Isolate Bank was created in July 2015 as a publicly available resource to combat antimicrobial resistance. It is a curated repository of bacterial isolates with an assortment of clinically-important resistance mechanisms that have been phenotypically and genotypically characterized. In the first two years of operation, the Bank offered 14 panels comprising 496 unique isolates and had filled 486 orders from 394 institutions throughout the United States. New panels are being added. Copyright © 2017 American Society for Microbiology.


July 7, 2019

De novo mutations resolve disease transmission pathways in clonal malaria

Detecting de novo mutations in viral and bacterial pathogens enables researchers to reconstruct detailed networks of disease transmission and is a key technique in genomic epidemiology. However, these techniques have not yet been applied to the malaria parasite, Plasmodium falciparum, in which a larger genome, slower generation times, and a complex life cycle make them difficult to implement. Here, we demonstrate the viability of de novo mutation studies in P. falciparum for the first time. Using a combination of sequencing, library preparation, and genotyping methods that have been optimized for accuracy in low-complexity genomic regions, we have detected de novo mutations that distinguish nominally identical parasites from clonal lineages. Despite its slower evolutionary rate compared with bacterial or viral species, de novo mutation can be detected in P. falciparum across timescales of just 1-2 years and evolutionary rates in low-complexity regions of the genome can be up to twice that detected in the rest of the genome. The increased mutation rate allows the identification of separate clade expansions that cannot be found using previous genomic epidemiology approaches and could be a crucial tool for mapping residual transmission patterns in disease elimination campaigns and reintroduction scenarios.

