July 7, 2019

An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics.

Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.
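As a rough illustration of what a six-frame ORF scan with alternative start codons involves, here is a minimal Python sketch. It is not the authors' software: the start-codon set (ATG, GTG, TTG), the choice of the most upstream start in each stop-to-stop window, and the `min_codons` threshold are all assumptions made for the example.

```python
# Hypothetical helper names; a toy scan, not the iPtgxDB implementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STARTS = {"ATG", "GTG", "TTG"}  # assumed set of canonical + alternative starts
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) for ORFs in the three forward frames of seq.

    For each stop codon, the most upstream start codon since the previous
    stop in the same frame is used; end is exclusive and includes the stop.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOPS:
                if start is not None and (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None
            elif start is None and codon in STARTS:
                start = i


def six_frame_orfs(seq, min_codons=2):
    """Scan both strands; minus-strand coordinates refer to the reverse complement."""
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, end in orfs_in_strand(s, min_codons):
            found.append((strand, start, end, s[start:end]))
    return found


for orf in six_frame_orfs("AAGTGAAACCCTAAGG"):
    print(orf)
```

In this toy input the only ORF begins at a GTG codon, so a scan restricted to ATG would miss it; that is the kind of case an alternative-start-aware translation is meant to catch.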


July 7, 2019

Genomic comparison between Staphylococcus aureus GN strains clinically isolated from a familial infection case: IS1272 transposition through a novel inverted repeat-replacing mechanism.

A bacterial insertion sequence (IS) is a mobile DNA sequence carrying only the transposase gene (tnp) that acts as a mutator to disrupt genes, alter gene expression, and cause genomic rearrangements. “Canonical” ISs have historically been characterized by their terminal inverted repeats (IRs), which may form a stem-loop structure, and duplications of a short (non-IR) target sequence at both ends, called target site duplications (TSDs). The IS distributions and virulence potentials of Staphylococcus aureus genomes in familial infection cases are unclear. Here, we determined the complete circular genome sequences of familial strains from a Panton-Valentine leukocidin (PVL)-positive ST50/agr4 S. aureus (GN) infection of a 4-year-old boy with skin abscesses. The genomes of the patient strain (GN1) and parent strain (GN3) were rich in “canonical” IS1272 with terminal IRs, both having 13 commonly-existing copies (ce-IS1272). Moreover, GN1 had a newly-inserted IS1272 (ni-IS1272) on the PVL-converting prophage, while GN3 had two copies of ni-IS1272 within the DNA helicase gene and near rot. The GN3 genome also had a small deletion. The targets of ni-IS1272 transposition were IR structures, in contrast with previous “canonical” ISs. There were no TSDs. Based on a database search, the targets for ce-IS1272 were IRs or “non-IRs”. IS1272 included a larger structure with tandem duplications of the left (IRL) side sequence; tnp included minor cases of a long fusion form and truncated form. One ce-IS1272 was associated with the segments responsible for immune evasion and drug resistance. Regarding virulence, GN1 expressed cytolytic peptides (phenol-soluble modulin α and δ-hemolysin) and PVL more strongly than some other familial strains.
These results suggest that IS1272 transposes through an IR-replacing mechanism, with an irreversible process unlike that of “canonical” transpositions, resulting in genomic variations, and that, among the familial strains, the patient strain has strong virulence potential based on community-associated virulence factors.


July 7, 2019

Complete genome sequencing of Arachidicoccus ginsenosidimutans sp. nov., and its application for production of minor ginsenosides by finding a novel ginsenoside-transforming beta-glucosidase

A novel bacterial strain (BS20T), which has ginsenoside-transforming ability, was whole-genome sequenced to identify a target gene. After complete genome sequencing, phylogenetic, phenotypic and chemotaxonomic analyses, the strain BS20T (Arachidicoccus ginsenosidimutans sp. nov.) was placed within the genus Arachidicoccus of family Chitinophagaceae. The complete genome of strain BS20T comprised a circular chromosome of 4,138,017 bp. To find the target functional gene, 17 sets of four different glycoside hydrolases were cloned in E. coli BL21 (DE3) using the pGEX4T-1 vector and were characterized. Among these 17 sets of clones, only one, BglAg-762, exhibited ginsenoside-conversion ability. BglAg-762 comprised 762 amino acid residues and belonged to glycoside hydrolase family 3. The recombinant enzyme (GST-BglAg-762) was able to convert major ginsenosides Rb1 to F2 via gypenoside-XVII (Gyp-XVII), Rb2 to C-O, and Rb3, Rc, Rd, and Gyp-XVII to C-Mx1, C-Mc1, and F2, respectively. Finally, ginsenoside F2 was transformed into compound K (C-K). In addition, these pilot data demonstrate the identification of 17 sets of target/functional genes of 4 different glycoside hydrolases from a novel bacterial species via whole genome sequencing. Our results show that the recombinant BglAg-762 very quickly converts the major ginsenosides into minor ginsenosides, which can be used for the enhanced production of target minor ginsenosides. Furthermore, the web service of NCBI is suitable for any targeted gene identification, but based on our experimental analysis we concluded that a hypothetical protein listed in NCBI should be considered a putative or uncharacterized protein.


July 7, 2019

Complete genome sequence of Spirosoma rigui KCTC 12531T, a bacterium isolated from fresh water from the Woopo wetland for taxonomic study

Spirosoma rigui KCTC 12531T was isolated from fresh water from the Woopo wetland, Korea. In this study, we report the complete genome sequence of S. rigui KCTC 12531T, obtained using the PacBio RS II platform. The genome comprises 5,828,404 bp with a G + C content of 54.4%; 4,774 genes were predicted, of which 4,647 are protein-coding.


July 7, 2019

Complete genome sequence of Acinetobacter baumannii A1296 (ST1469) with a small plasmid harbouring the tet(39) tetracycline resistance gene.

Acinetobacter baumannii is considered an important nosocomial pathogen worldwide owing to its increasing antibiotic resistance. This study aimed to determine the complete genome sequence of A. baumannii strain A1296 and to perform a comparative analysis among A. baumannii. The complete genome of A. baumannii A1296 was sequenced on two SMRT cells using P6C4 chemistry on a PacBio Single Molecule, Real-Time (SMRT) RS II instrument. The A1296 genome sequence was annotated using the Prokaryotic Genome Automatic Annotation Pipeline (PGAAP), and the sequence type and resistance genes of the strain were analysed. Here we present the complete genome sequence of A. baumannii strain A1296, belonging to a novel sequence type (ST1469) and isolated from a patient in China, that was sensitive to multiple antibiotics. The genome of A. baumannii A1296 was 3,810,701 bp in length, including one circular chromosome and two plasmids. The tet(39) resistance gene was located on the small plasmid in this A. baumannii strain. The genome sequence of A. baumannii strain A1296 can be used as a reference sequence for comparative analysis aimed at elucidating the acquisition, dissemination and mobilisation of resistance genes among A. baumannii. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.


July 7, 2019

Draft genomes of the fungal pathogen Phellinus noxius in Hong Kong

The fungal pathogen Phellinus noxius is the underlying cause of brown root rot, a disease causing tree mortality globally, with extensive damage in urban areas and to crop plants. This disease currently has no cure, and despite the global epidemic, little is known about the pathogenesis and virulence of this pathogen. Using Ion Torrent PGM, Illumina MiSeq and PacBio RSII sequencing platforms with various genome assembly methods, we produced the draft genome sequences of four P. noxius strains isolated from infected trees in Hong Kong to further understand the pathogen and identify the mechanisms behind the aggressive nature and virulence of this fungus. The resulting genomes ranged from 30.8 Mb to 31.8 Mb in size, and of the four sequences, the YTM97 strain was chosen to produce a high-quality Hong Kong strain genome sequence, resulting in a 31 Mb final assembly with 457 scaffolds, an N50 length of 275,889 bp and 96.2% genome completeness. RNA-seq of YTM97 using Illumina HiSeq400 was performed for improved gene prediction. AUGUSTUS and Genemark-ES prediction programs predicted 9,887 protein-coding genes which were annotated using GO and Pfam databases. The encoded carbohydrate active enzymes revealed large numbers of lignolytic enzymes present, comparable to those of other white-rot plant pathogens. In addition, P. noxius also possessed larger numbers of cellulose, xylan and hemicellulose degrading enzymes than other plant pathogens. Searches for virulence genes were also performed using PHI-Base and DFVF databases, revealing a host of virulence-related genes and effectors. The combination of non-specific host range, unique carbohydrate active enzyme profile and large number of putative virulence genes could explain the reasons behind the aggressive nature and increased virulence of this plant pathogen. The draft genome sequences presented here will provide references for strains found in Hong Kong.
Together with emerging research, this information could be used for genetic diversity and epidemiology research on a global scale as well as expediting our efforts towards discovering the mechanisms of pathogenicity of this devastating pathogen.
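For readers unfamiliar with the N50 statistic quoted in the abstract above, the following minimal Python sketch shows how it is conventionally computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths (made up for illustration), total = 300.
toy_scaffolds = [100, 80, 50, 30, 20, 20]
print(n50(toy_scaffolds))
```

With these toy values the two longest scaffolds already cover more than half of the 300 bp total, so the N50 is the length of the second one.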


July 7, 2019

Detection of complex structural variation from paired-end sequencing data

Detecting structural variants (SVs) from sequencing data is a key problem in genome analysis, but the full diversity of SVs is not captured by most methods. We introduce the Automated Reconstruction of Complex Structural Variants (ARC-SV) method, which detects a broad class of structural variants from paired-end whole genome sequencing (WGS) data. Analysis of samples from NA12878 and HuRef suggests that complex SVs are often misclassified by traditional methods. We validated our results both experimentally and by comparison to whole genome assembly and PacBio data; ARC-SV compares favorably to existing algorithms in general and gives state-of-the-art results on complex SV detection. By expanding the range of detectable SVs compared to commonly-used algorithms, ARC-SV allows additional information to be extracted from existing WGS data.


July 7, 2019

Dissemination and characteristics of a novel plasmid-encoded carbapenem-hydrolyzing class D beta-lactamase, OXA-436 from four patients involving six different hospitals in Denmark.

The diversity of OXA-48-like carbapenemases is continually expanding. In this study, we describe the dissemination and characteristics of a novel carbapenem-hydrolyzing class D carbapenemase (CHDL) named OXA-436. In total, six OXA-436-producing Enterobacteriaceae isolates including Enterobacter asburiae (n=3), Citrobacter freundii (n=2) and Klebsiella pneumoniae (n=1) were identified in four patients in the period between September 2013 and April 2015. All three species of OXA-436-producing Enterobacteriaceae were found in one patient. The amino acid sequence of OXA-436 showed 90.4-92.8% identity to other acquired OXA-48-like variants. Expression of OXA-436 in Escherichia coli and kinetic analysis of purified OXA-436 revealed an activity profile similar to OXA-48 and OXA-181 with activity against penicillins including temocillin, limited or no activity against extended-spectrum cephalosporins and activity against carbapenems. The blaOXA-436 gene was located on a conjugative ~314 kb IncHI2/IncHI2A plasmid belonging to pMLST ST1, in a region surrounded by chromosomal genes previously identified adjacent to blaOXA-genes in Shewanella spp. In conclusion, OXA-436 is a novel CHDL with functional properties similar to those of OXA-48-like CHDLs. The described geographical spread among different Enterobacteriaceae and the plasmid location of blaOXA-436 illustrate its potential for further dissemination. Copyright © 2017 American Society for Microbiology.


July 7, 2019

Copy number variation probes inform diverse applications

A major contributor to inter-individual genomic variability is copy number variation (CNV). CNVs change the diploid status of the DNA, involve one or multiple genes, and may disrupt coding regions, affect regulatory elements, or change gene dosage. While some of these changes may have no phenotypic consequences, others underlie disease, explain evolutionary processes, or impact the response to medication.


July 7, 2019

Highly accurate fluorogenic DNA sequencing with information theory-based error correction.

Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory-based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.
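The redundancy idea in the abstract above can be illustrated with a deliberately simplified Python sketch. It assumes the three degenerate sequences correspond to the three ways of splitting the four bases into two pairs (AC/GT, AG/CT, AT/CG) and treats each read as a per-base binary label; the actual method works on fluorogenic signals and uses its own decoding algorithm, which this toy does not reproduce. All names here are hypothetical.

```python
# For each assumed dual-base channel, the bases labeled 0; the other two are labeled 1.
PARTITIONS = {"MK": "AC", "RY": "AG", "WS": "AT"}

# Any two channels identify the base; here we decode from (MK, RY).
DECODE = {(0, 0): "A", (0, 1): "C", (1, 0): "G", (1, 1): "T"}


def encode(seq):
    """Map a DNA string to three binary (degenerate) label sequences."""
    return {name: [0 if base in zero else 1 for base in seq]
            for name, zero in PARTITIONS.items()}


def decode(channels):
    """Recover the DNA string from the MK and RY channels."""
    return "".join(DECODE[(m, r)]
                   for m, r in zip(channels["MK"], channels["RY"]))


def inconsistent_positions(channels):
    """Positions where the three labels disagree (their XOR is not 0).

    For error-free labels the XOR is always 0, so the third channel acts
    as a parity check on the other two.
    """
    triples = zip(channels["MK"], channels["RY"], channels["WS"])
    return [i for i, (m, r, w) in enumerate(triples) if m ^ r ^ w]


channels = encode("GATTACA")
print(decode(channels), inconsistent_positions(channels))

channels["WS"][3] ^= 1  # simulate a single error in one channel
print(inconsistent_positions(channels))
```

In this simplified form a single flipped label is detected but not localized to a channel; locating and correcting the error requires the additional information that the full method draws on.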


July 7, 2019

N6-adenine DNA methylation is associated with the linker DNA of H2A.Z-containing well-positioned nucleosomes in Pol II-transcribed genes in Tetrahymena.

DNA N6-methyladenine (6mA) is newly rediscovered as a potential epigenetic mark across a more diverse range of eukaryotes than previously realized. As a unicellular model organism, Tetrahymena thermophila is among the first eukaryotes reported to contain 6mA modification. However, lack of comprehensive information about 6mA distribution hinders further investigations into its function and regulatory mechanism. In this study, we provide the first genome-wide, base pair-resolution map of 6mA in Tetrahymena by applying single-molecule real-time (SMRT) sequencing. We provide evidence that 6mA occurs mostly in the AT motif of the linker DNA regions. More strikingly, these linker DNA regions with 6mA are usually flanked by well-positioned nucleosomes and/or H2A.Z-containing nucleosomes. We also find that 6mA is exclusively associated with RNA polymerase II (Pol II)-transcribed genes, but is not an unambiguous mark for active transcription. These results support that 6mA is an integral part of the chromatin landscape shaped by adenosine triphosphate (ATP)-dependent chromatin remodeling and transcription. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.


July 7, 2019

Comparative and population genomic landscape of Phellinus noxius: A hypervariable fungus causing root rot in trees.

The order Hymenochaetales of white rot fungi contains some of the most aggressive wood decayers causing tree deaths around the world. Despite their ecological importance and the impact of diseases they cause, little is known about the evolution and transmission patterns of these pathogens. Here, we sequenced and undertook comparative genomic analyses of Hymenochaetales genomes using brown root rot fungus Phellinus noxius, wood-decomposing fungus Phellinus lamaensis, laminated root rot fungus Phellinus sulphurascens and trunk pathogen Porodaedalea pini. Many gene families of lignin-degrading enzymes were identified from these fungi, reflecting their ability as white rot fungi. Comparing against distant fungi highlighted the expansion of 1,3-beta-glucan synthases in P. noxius, which may account for its fast-growing attribute. We identified 13 linkage groups conserved within Agaricomycetes, suggesting the evolution of stable karyotypes. We determined that P. noxius has a bipolar heterothallic mating system, with an unusual, highly expanded ~60 kb A locus as a result of accumulating gene transposition. We investigated the population genomics of 60 P. noxius isolates across multiple islands of the Asia Pacific region. Whole-genome sequencing showed this multinucleate species contains abundant poly-allelic single nucleotide polymorphisms with atypical allele frequencies. Different patterns of intra-isolate polymorphism reflect mono-/heterokaryotic states which are both prevalent in nature. We have shown two genetically separated lineages with one spanning across many islands despite the geographical barriers. Both populations possess extraordinary genetic diversity and show contrasting evolutionary scenarios. These results provide a framework to further investigate the genetic basis underlying the fitness and virulence of white rot fungi. © 2017 John Wiley & Sons Ltd.


July 7, 2019

Complete genome sequence of Acidihalobacter prosperus strain F5, an extremely acidophilic, iron- and sulfur-oxidizing halophile with potential industrial applicability in saline water bioleaching of chalcopyrite.

Successful process development for the bioleaching of mineral ores, particularly the refractory copper sulfide ore chalcopyrite, remains a challenge in regions where freshwater is scarce and source water contains high concentrations of chloride ion. In this study, a pure isolate of Acidihalobacter prosperus strain F5 was characterized for its ability to leach base metals from sulfide ores (pyrite, chalcopyrite and pentlandite) at increasing chloride ion concentrations. F5 successfully released base metals from ores including pyrite and pentlandite at up to 30 g/L chloride ion and chalcopyrite up to 18 g/L chloride ion. In order to understand the genetic mechanisms of tolerance to high acid, saline and heavy metal stress the genome of F5 was sequenced and analysed. As well as being the first strain of Ac. prosperus to be isolated from Australia it is also the first complete genome of the Ac. prosperus species to be sequenced. The F5 genome contains genes involved in the biosynthesis of compatible solutes and genes encoding monovalent cation/proton antiporters and heavy metal transporters which could explain its abilities to tolerate high salinity, acidity and heavy metal stress. Genome analysis also confirmed the presence of genes involved in copper tolerance. The study demonstrates the potential biotechnological applicability of Ac. prosperus strain F5 for saline water bioleaching of mineral ores. Copyright © 2017 Elsevier B.V. All rights reserved.


July 7, 2019

Complete genome sequence of multidrug-resistant Staphylococcus sciuri strain SNUDS-18 isolated from a farmed duck in South Korea.

This study aimed to determine the complete genome sequence of multidrug-resistant Staphylococcus sciuri strain SNUDS-18 isolated from a farmed duck in South Korea. Genomic DNA was sequenced using a PacBio RS II system. The obtained genome was annotated and antimicrobial resistance and virulence genes were identified. The sequenced genome possessed a mecA homologue (mecA1) that was almost identical to that of other oxacillin-susceptible S. sciuri strains, whereas the staphylococcal cassette chromosome mec (SCCmec) was not detected. Moreover, various antimicrobial resistance genes conferring resistance to β-lactams, aminoglycosides, phenicols, tetracycline and macrolide-lincosamide-streptogramin B (MLSB) antimicrobials were identified. The SNUDS-18 genome and its associated genomic data will provide important insights into the biodiversity of the S. sciuri group as well as valuable information for the control of this potential pathogen. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.


July 7, 2019

Filling reference gaps via assembling DNA barcodes using high-throughput sequencing: moving toward barcoding the world.

Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn’t show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. © The Authors 2017. Published by Oxford University Press.

