July 7, 2019  |  

Complete genome sequence of Pseudomonas aeruginosa K34-7, a carbapenem-resistant isolate of the high-risk sequence type 233.

Carbapenem-resistant Pseudomonas aeruginosa is defined as a “critical” priority pathogen for the development of new antibiotics. Here we report the complete genome sequence of an extensively drug-resistant, Verona integron-encoded metallo-β-lactamase-expressing isolate belonging to the high-risk sequence type 233.


July 7, 2019  |  

Complete genome sequence of Aeromonas rivipollensis KN-Mc-11N1, isolated from a wild nutria (Myocastor coypus) in South Korea.

We report here the complete genome sequence of Aeromonas rivipollensis KN-Mc-11N1, which was isolated from a wild nutria (Myocastor coypus) in South Korea. Genomic analysis indicated that A. rivipollensis may have zoonotic potential similar to that of other aeromonads, and nutria could be one of the sources of transmission of zoonotic pathogens to humans.


July 7, 2019  |  

Complete genome sequence of a Staphylococcus aureus sequence type 612 isolate from an Australian horse.

Staphylococcus aureus is a serious pathogen of humans and animals. Multilocus sequence type 612 is dominant and highly virulent in South African hospitals but relatively uncommon elsewhere. We present the complete genome sequence of methicillin-resistant Staphylococcus aureus strain SVH7513, isolated from a horse at a veterinary clinic in New South Wales, Australia.


July 7, 2019  |  

An improved approach for reconstructing consensus repeats from short sequence reads

Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high-quality reference genomes or on existing repeat libraries. Thus, repeat analysis remains challenging for species with highly repetitive or complex genomes, which often lack good reference genomes or annotated repeat libraries. Recently we developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads and outperforms an existing tool called RepARK. One major issue with REPdenovo is that it doesn’t perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.
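
To make the idea of repeat-related k-mers concrete, here is a minimal sketch of selecting k-mers that occur unusually often in a read set. It is not the REPdenovo implementation: the function names, the toy reads, and the `min_count` cutoff are assumptions for illustration, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(reads, k, min_count):
    """Keep k-mers seen at least min_count times (a rough proxy for repeat-related k-mers)."""
    counts = count_kmers(reads, k)
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy reads (hypothetical data) sharing the segment ACGTAC.
reads = ["ACGTACGTAC", "TTACGTACGG", "GGGACGTACA"]
repeat_like = frequent_kmers(reads, k=5, min_count=3)
```

In a real data set the cutoff would be chosen relative to the sequencing coverage, since single-copy k-mers also appear many times at high depth.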


July 7, 2019  |  

TriPoly: haplotype estimation for polyploids using sequencing data of related individuals.

Knowledge of haplotypes, i.e. phased and ordered marker alleles on a chromosome, is essential to answer many questions in genetics and genomics. By generating short pieces of DNA sequence, high-throughput modern sequencing technologies make estimation of haplotypes possible for single individuals. In polyploids, however, haplotype estimation methods usually require deep coverage to achieve sufficient accuracy. This often renders sequencing-based approaches too costly to be applied to large populations needed in studies of Quantitative Trait Loci. We propose a novel haplotype estimation method for polyploids, TriPoly, that combines sequencing data with Mendelian inheritance rules to infer haplotypes in parent-offspring trios. Using realistic simulations of both short and long-read sequencing data for banana (Musa acuminata) and potato (Solanum tuberosum) trios, we show that TriPoly yields more accurate progeny haplotypes at low coverages compared to existing methods that work on single individuals. We also apply TriPoly to phase Single Nucleotide Polymorphisms on chromosome 5 for a family of tetraploid potato with 2 parents and 37 offspring sequenced with an RNA capture approach. We show that TriPoly haplotype estimates differ from those of the other methods mainly in regions with imperfect sequencing or mapping difficulties, as it does not rely solely on sequence reads and aims to avoid phasings that are not likely to have been passed from the parents to the offspring. TriPoly has been implemented in Python 3.5.2 (also compatible with Python 2.7.3 and higher) and can be freely downloaded at https://github.com/EhsanMotazedi/TriPoly. Supplementary data are available at Bioinformatics online.
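
The kind of constraint that Mendelian inheritance places on a trio can be illustrated with a much simpler problem than haplotype phasing: checking whether an offspring's allele dosage at one biallelic SNP is compatible with its parents' dosages. The sketch below is not taken from TriPoly; the function names are hypothetical, and it assumes an even ploidy, gametes carrying half the parental alleles, and no double reduction.

```python
from itertools import product


def gamete_dosages(parent_dosage, ploidy=4):
    """Alternative-allele counts a parent can transmit in a gamete of ploidy // 2 alleles."""
    half = ploidy // 2
    # A gamete cannot carry more reference alleles than the parent has.
    low = max(0, parent_dosage - (ploidy - half))
    # A gamete cannot carry more alternative alleles than the parent has.
    high = min(half, parent_dosage)
    return set(range(low, high + 1))


def is_mendelian_consistent(mother, father, child, ploidy=4):
    """True if the child's dosage is the sum of one possible gamete from each parent."""
    possible = {
        m + f
        for m, f in product(gamete_dosages(mother, ploidy), gamete_dosages(father, ploidy))
    }
    return child in possible


# Tetraploid example: a simplex mother (dosage 1) and a nulliplex father (dosage 0).
ok = is_mendelian_consistent(mother=1, father=0, child=1)
impossible = is_mendelian_consistent(mother=1, father=0, child=2)
```

A phasing method that uses trio information can discard or penalize candidate offspring haplotypes whose implied dosages fail a check of this kind.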


July 7, 2019  |  

Fast-SG: an alignment-free algorithm for hybrid assembly.

Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
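
As a rough illustration of how linking information can be gathered without alignment, the sketch below looks up exact k-mers from each mate of a read pair in an index of contig-specific k-mers and counts pairs whose mates land on different contigs. This is an assumption-laden toy, not the Fast-SG algorithm or its data structures: all names and sequences are hypothetical, and orientation, insert size, and reverse complements are ignored.

```python
from collections import Counter, defaultdict


def build_unique_kmer_index(contigs, k):
    """Map each k-mer found in exactly one contig to that contig's name."""
    owners = defaultdict(set)
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)
    return {kmer: next(iter(names)) for kmer, names in owners.items() if len(names) == 1}


def assign_contig(read, index, k):
    """Return the contig supported by the most indexed k-mers in the read, or None."""
    votes = Counter(
        index[read[i:i + k]]
        for i in range(len(read) - k + 1)
        if read[i:i + k] in index
    )
    return votes.most_common(1)[0][0] if votes else None


def count_links(read_pairs, index, k):
    """Count read pairs whose mates hit different contigs (candidate scaffold edges)."""
    links = Counter()
    for left, right in read_pairs:
        a = assign_contig(left, index, k)
        b = assign_contig(right, index, k)
        if a is not None and b is not None and a != b:
            links[tuple(sorted((a, b)))] += 1
    return links


# Toy contigs and read pairs (hypothetical data).
contigs = {"c1": "ACGTTGCAAGGC", "c2": "TTTCCCGGGAAA"}
pairs = [
    ("ACGTTGCA", "CCCGGGAA"),  # mates on c1 and c2
    ("GTTGCAAG", "TTTCCCGG"),  # mates on c1 and c2
    ("ACGTTGCA", "TTGCAAGG"),  # both mates on c1, so no link
]
index = build_unique_kmer_index(contigs, k=4)
links = count_links(pairs, index, k=4)
```

The same lookup could be applied to synthetic pairs cut from a long read, which is one way linking information from long reads can feed a short-read scaffolder.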


July 7, 2019  |  

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries.

Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (comprised of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts.
For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.


July 7, 2019  |  

BMScan: using whole genome similarity to rapidly and accurately identify bacterial meningitis causing species.

Bacterial meningitis is a life-threatening infection that remains a public health concern. Bacterial meningitis is commonly caused by the following species: Neisseria meningitidis, Streptococcus pneumoniae, Listeria monocytogenes, Haemophilus influenzae and Escherichia coli. Here, we describe BMScan (Bacterial Meningitis Scan), a whole-genome analysis tool for the species identification of bacterial meningitis-causing and closely-related pathogens, an essential step for case management and disease surveillance. BMScan relies on a reference collection that contains genomes for 17 focal species to scan against to identify a given species. We established this reference collection by supplementing publicly available genomes from RefSeq with genomes from the isolate collections of the Centers for Disease Control Bacterial Meningitis Laboratory and the Minnesota Department of Health Public Health Laboratory, and then filtered them down to a representative set of genomes which capture the diversity for each species. Using this reference collection, we evaluated two genomic comparison algorithms, Mash and Average Nucleotide Identity (ANI), for their ability to accurately and rapidly identify our focal species. We found that the results of Mash were strongly correlated with the results of ANI for species identification, while providing a significant reduction in run-time. This drastic difference in run-time enabled the rapid scanning of large reference genome collections, which, when combined with species-specific threshold values, facilitated the development of BMScan. Using a validation set of 15,503 genomes of our species of interest, BMScan accurately identified 99.97% of the species within 16 min 47 s. Identification of the bacterial meningitis pathogenic species is a critical step for case confirmation and further strain characterization.
BMScan employs species-specific thresholds for previously-validated, genome-wide similarity statistics compiled from a curated reference genome collection to rapidly and accurately identify the species of uncharacterized bacterial meningitis pathogens and closely related pathogens. BMScan will facilitate the transition in public health laboratories from traditional phenotypic detection methods to whole genome sequencing based methods for species identification.
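
The combination of a genome-wide similarity statistic with species-specific thresholds can be sketched as follows. The Mash distance formula, D = -(1/k) ln(2j / (1 + j)) for a Jaccard estimate j and k-mer size k, is the published definition; everything else here is an assumption for illustration, including the function names, the example distances, and the threshold values, none of which are BMScan's actual settings.

```python
import math


def mash_distance(jaccard, k):
    """Mash distance from a Jaccard estimate j: D = -(1/k) * ln(2j / (1 + j))."""
    if jaccard <= 0:
        return 1.0  # no shared k-mers: maximal distance
    return min(1.0, -math.log(2 * jaccard / (1 + jaccard)) / k)


def identify_species(hits, thresholds):
    """Return the species of the closest reference if it falls within that species' threshold.

    hits: list of (species, distance) pairs, one per reference genome.
    thresholds: maximum accepted distance per species (hypothetical values below).
    """
    if not hits:
        return None
    species, distance = min(hits, key=lambda hit: hit[1])
    return species if distance <= thresholds.get(species, 0.0) else None


# Hypothetical thresholds and distances, not values used by BMScan.
thresholds = {"Neisseria meningitidis": 0.03, "Streptococcus pneumoniae": 0.04}
hits = [("Neisseria meningitidis", 0.01), ("Streptococcus pneumoniae", 0.25)]
call = identify_species(hits, thresholds)
no_call = identify_species([("Streptococcus pneumoniae", 0.2)], thresholds)
```

Returning no call when the best hit exceeds its threshold is one way to avoid forcing a query from outside the reference collection onto the nearest species.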


July 7, 2019  |  

STRetch: detecting and discovering pathogenic short tandem repeat expansions.

Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Most existing tools for detecting STR variation with short reads do so within the read length and so are unable to detect the majority of pathogenic expansions. Here we present STRetch, a new genome-wide method to scan for STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting STR expansions using short-read whole-genome sequencing data at known pathogenic loci as well as novel STR loci. STRetch is open source software, available from github.com/Oshlack/STRetch.


July 7, 2019  |  

Near-complete genome sequences of Streptomyces sp. strains AC1-42T and AC1-42W, isolated from bat guano from Cabalyorisa Cave, Mabini, Pangasinan, Philippines.

Streptomyces sp. strains AC1-42T and AC1-42W, isolated from bat guano from Cabalyorisa Cave, Mabini, Pangasinan, Philippines, are active against Bacillus subtilis subsp. subtilis KCTC 3135T. The near-complete genome sequences reported here represent a possible source of ribosomally synthesized, posttranslationally modified peptides, such as lantipeptides, bacteriocins, linaridin, and a lasso peptide.


July 7, 2019  |  

Genome sequence of Halomonas hydrothermalis Y2, an efficient ectoine-producer isolated from pulp mill wastewater.

Halophilic microorganisms have great potential for biotechnological applications. Halomonas hydrothermalis Y2 is a halotolerant and alkaliphilic strain that was isolated from Na+-rich pulp mill wastewater. The strain is dominant in the bacterial community of pulp mill wastewater and exhibits metabolic diversity in utilizing various substrates. Here we present the genome sequence of this strain, which comprises a circular chromosome 3,933,432 bp in size with a GC content of 60.2%. Diverse genes encoding proteins for compatible solute synthesis and transport were identified in the genome. With a complete pathway for ectoine synthesis, the strain can produce ectoine from monosodium glutamate and partially secrete it into the medium. In addition, ectoine production increased by around 20% when the ectoine hydroxylase (EctD) was deleted. The genome sequence we report here will provide genetic information regarding adaptive mechanisms of strain Y2 to its harsh habitat, as well as facilitate exploration of metabolic strategies for producing diverse compatible solutes, e.g., ectoine. Copyright © 2018 Elsevier B.V. All rights reserved.

