July 7, 2019

Use of single molecule sequencing for comparative genomics of an environmental and a clinical isolate of Clostridium difficile ribotype 078.

How the pathogen Clostridium difficile might survive, evolve and be transferred between reservoirs within the natural environment is poorly understood. Some ribotypes are found in both clinical and environmental settings. Whether these strains are distinct from each other and evolve in their specific environments has not been established. Possession of a highly mobile genome has contributed to the genetic diversity and ongoing evolution of C. difficile. Interpretations of genetic diversity have been limited by fragmented assemblies resulting from short-read sequencing approaches and by a limited understanding of the epigenetic regulation of diversity. To address this, single molecule real time (SMRT) sequencing was used in this study, as it produces high-quality genome sequences with resolution of repeat regions (including those found in mobile elements) and can generate data to determine methylation modifications across the sequence (the methylome). Chromosomal rearrangements and ribosomal operon duplications were observed in both genomes. The rearrangements occurred at insertion sites within two mobile genetic elements (MGEs), Tn6164 and Tn6293, present only in the genomes of the clinical isolate M120 and the environmental isolate CD105HS27, respectively. The gene content of these two transposons differs considerably, which could affect horizontal gene transfer; differences include CDSs encoding methylases and a conjugative prophage present only in Tn6164. To investigate mechanisms that could affect MGE transfer, the methylome, restriction modification (RM) and CRISPR/Cas systems were characterised for each strain. Notably, the environmental isolate, CD105HS27, does not share a consensus motif for (m4)C methylation, but has one additional CRISPR spacer when compared with the clinical isolate M120. These findings show key differences between the two strains in their genetic capacity for MGE transfer.
The carriage of horizontally transferred genes appears to have genome-wide effects, based on two different methylation patterns. The CRISPR/Cas system appears active, although perhaps slow to evolve. The data suggest that both mechanisms are functional and affect horizontal gene transfer and genome evolution within C. difficile.


July 7, 2019

Complete genome sequence of Rothia aeria type strain JCM 11412, isolated from air in the Russian space laboratory Mir.

Here, we present the complete genome sequence of Rothia aeria type strain JCM 11412, isolated from air in the Russian space laboratory Mir. Recently, there has been an increasing number of reports of infections caused by R. aeria. The genomic information will help researchers investigate the pathogenicity of this organism. Copyright © 2016 Nambu et al.


July 7, 2019

Complete genome sequence of Streptococcus sp. strain NPS 308.

Streptococcus sp. strain NPS 308, isolated from an 8-year-old girl diagnosed with infective endocarditis, likely represents a novel species of Streptococcus. Here, we present a complete genome sequence of this species, which will contribute to a better understanding of the pathogenesis of infective endocarditis. Copyright © 2016 Kondo et al.


July 7, 2019

Whole genome sequence and comparative genomics of the novel Lyme borreliosis causing pathogen, Borrelia mayonii.

Borrelia mayonii, a Borrelia burgdorferi sensu lato (Bbsl) genospecies, was recently identified as a cause of Lyme borreliosis (LB) among patients from the upper midwestern United States. By microscopy and PCR, spirochete/genome loads in infected patients were estimated at 10⁵ to 10⁶ per milliliter of blood. Here, we present the full chromosome and plasmid sequences of two B. mayonii isolates, MN14-1420 and MN14-1539, cultured from the blood of two of these patients. Whole genome sequencing and assembly were conducted using PacBio long-read sequencing (Pacific Biosciences RS II instrument) followed by the hierarchical genome-assembly process (HGAP). The B. mayonii genome is ~1.31 Mbp in size (26.9% average GC content) and comprises a linear chromosome, 8 linear plasmids and 7 circular plasmids. Consistent with its taxonomic designation as a new Bbsl genospecies, the B. mayonii linear chromosome shares only 93.83% average nucleotide identity with other genospecies. Both B. mayonii genomes contain plasmids similar to B. burgdorferi sensu stricto lp54, lp36, lp28-3, lp28-4, lp25, lp17, lp5, 5 cp32s, cp26, and cp9. The vls locus on lp28-10 of B. mayonii MN14-1420 is remarkably long, comprising 24 silent vls cassettes. Genetic differences between the two B. mayonii genomes are limited; they include 15 single nucleotide variations, as well as 7 fewer silent vls cassettes and the absence of the lp5 plasmid in MN14-1539. Notably, homologs of 68 proteins present in B. burgdorferi sensu stricto appear to be absent from the B. mayonii genomes. These include the complement inhibitor CspZ (BB_H06), the fibronectin-binding protein BB_K32, and multiple lipoproteins and proteins of unknown function. This study shows the utility of long-read sequencing for full genome assembly of Bbsl genomes, identifies putative genome regions of B. mayonii that may be linked to clinical manifestation or tissue tropism, and provides a valuable resource for pathogenicity, diagnostic and vaccine studies.


July 7, 2019

Genome sequence and comparative pathogenic determinants of multidrug resistant uropathogenic Escherichia coli O25b:H4, a clinical isolate from Saudi Arabia

Escherichia coli serotype O25b:H4 is involved in human urinary tract infections (UTIs). In this study, we sequenced and analyzed E. coli O25b:H4 isolated from a patient suffering from recurrent UTIs in an intensive care unit at Hera General Hospital in Makkah, Saudi Arabia. We aimed to determine the virulence genes for pathogenesis and drug resistance of this isolate compared with other E. coli strains. We sequenced and analyzed the E. coli O25b:H4 Saudi clinical isolate using next generation sequencing. Using the ERGO genome analysis platform, we performed annotation and identified the virulence and antibiotic resistance determinants of this clinical isolate. The E. coli O25b:H4 genome was assembled into four contigs representing a total chromosome size of 5.28 Mb, and three additional contigs were identified: a 130.9 kb virulence plasmid contig bearing the bla-CTX gene, and 32 kb and 29 kb contigs. In comparing this genome with other uropathogenic E. coli (UPEC) genomes, we identified unique drug resistance and pathogenicity factors. In this work, whole-genome sequencing and targeted comparative analysis of a clinical isolate of uropathogenic Escherichia coli O25b:H4 were performed. This strain encodes virulence genes linked with extraintestinal pathogenic E. coli (ExPEC) that are expressed constitutively in E. coli ST131. We identified the genes responsible for pathogenesis and drug resistance and compared the virulence and antibiotic resistance determinants with those of other UPEC isolates. This is the first report of genome sequencing and analysis of a UPEC strain from Saudi Arabia.


July 7, 2019

First complete genome sequence of a subdivision 6 Acidobacterium strain.

Although ubiquitous and abundant in soils, acidobacteria have mostly escaped isolation and remain poorly investigated. Only a few cultured representatives and just eight genomes of subdivisions 1, 3, and 4 are available to date. Here, we determined the complete genome sequence of strain HEG_-6_39, the first genome of Acidobacterium subdivision 6. Copyright © 2016 Huang et al.


July 7, 2019

Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (~15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at minimal cost by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. Focusing on the ~5% of k-mers that are error free dramatically increases read overlap sensitivity. Equally important, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is very likely to yield analogous improvements with alternative long-read technologies and assemblers, such as Oxford Nanopore and BLASR/DALIGNER/Falcon, respectively. © 2016 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press.
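
The k-mer validation idea can be illustrated with a small Python sketch. It is not the authors' implementation (which plugs into MHAP/Celera Assembler); it only shows the filtering logic under stated assumptions: the function names are hypothetical, "repetitive" is judged here by a hypothetical max_count threshold on short-read k-mer counts, and seeds are simply the positions of trusted k-mers in a long read.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_trusted_kmers(short_reads, k, max_count=None):
    """Collect k-mers seen in accurate short reads (hypothetical helper).

    If max_count is given, k-mers seen more than max_count times are
    treated as repetitive and dropped (an assumed, illustrative rule).
    """
    counts = Counter(km for read in short_reads for km in kmers(read, k))
    if max_count is None:
        return set(counts)
    return {km for km, c in counts.items() if c <= max_count}


def validated_seeds(long_read, trusted, k):
    """Keep only (position, k-mer) seeds from a noisy long read that are trusted.

    K-mers absent from the short-read set are assumed to contain errors.
    """
    return [(i, km) for i, km in enumerate(kmers(long_read, k)) if km in trusted]


# One substitution error (T -> A at position 4) destroys every 4-mer covering it.
trusted = build_trusted_kmers(["ACGTTGCA"], k=4)
print(validated_seeds("ACGTAGCA", trusted, k=4))  # [(0, 'ACGT')]
```

Only the first k-mer survives, because every other 4-mer overlaps the error; at real error rates this is why most distinct long-read k-mers are discarded while the surviving ones make reliable overlap seeds.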


July 7, 2019

Jabba: hybrid error correction for long sequencing reads.

Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, the underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads are an attractive way to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data and suffer from long runtimes. Recently, a new hybrid error correction method has been proposed in which the second generation data are first assembled into a de Bruijn graph, onto which the long reads are then aligned. In this context, we present Jabba, a hybrid method to correct long third generation reads by mapping them onto a corrected de Bruijn graph constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, we present theoretical results concerning the possibilities and limitations of using MEMs in the context of third generation reads. Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using very little CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long, highly erroneous sequences onto a de Bruijn graph.
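
To make the notion of a maximal exact match (MEM) seed concrete, the following is a minimal Python sketch that finds MEMs between a read and a plain reference string. It is a naive quadratic scan for illustration only; it is not Jabba's code, which matches reads against a de Bruijn graph using dedicated index structures. The function name find_mems and the min_len parameter are hypothetical.

```python
def find_mems(read: str, ref: str, min_len: int):
    """Return maximal exact matches (read_pos, ref_pos, length) with length >= min_len.

    Naive O(len(read) * len(ref) * match length) scan, for illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            # Left-maximal: skip if the match could be extended one base to the left.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            length = 0
            # Extend right until a mismatch or the end of either sequence,
            # which makes the match right-maximal.
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


ref = "ACGTACGTTTGCA"
read = "ACGTACCTTTGCA"  # one substitution (G -> C) at position 6
print(find_mems(read, ref, min_len=4))  # [(0, 0, 6), (0, 4, 4), (7, 7, 6)]
```

The single error splits the true alignment into two MEMs on the same diagonal, (0, 0, 6) and (7, 7, 6), while the repeated ACGT yields a spurious seed at (0, 4, 4). A seed-and-extend method must therefore chain consistent seeds and bridge the gaps between them.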


July 7, 2019

Identification of the fluvirucin B2 (Sch 38518) biosynthetic gene cluster from Actinomadura fulva subsp. indica ATCC 53714: substrate specificity of the β-amino acid selective adenylating enzyme FlvN.

Fluvirucins are 14-membered macrolactam polyketides with antifungal and antiviral activities. Fluvirucins carry a β-alanine starter unit in their polyketide skeletons. To understand how the β-alanine moiety is constructed during fluvirucin biosynthesis, we identified the biosynthetic gene cluster of fluvirucin B2 produced by Actinomadura fulva subsp. indica ATCC 53714. The identified gene cluster contains three polyketide synthases, four characteristic β-amino acid-carrying enzymes, one decarboxylase, and one amidohydrolase. We next investigated the activity of the adenylation enzyme FlvN, a key enzyme for the selective incorporation of a β-amino acid substrate. FlvN showed a strong preference for L-aspartate over other amino acids such as β-alanine. Based on these results, we propose a biosynthetic pathway for fluvirucin B2.


July 7, 2019

A simple thermoplastic substrate containing hierarchical silica lamellae for high-molecular-weight DNA extraction.

An inexpensive, magnetic thermoplastic nanomaterial is developed utilizing a hierarchical layering of micro- and nanoscale silica lamellae to create a high-surface-area and low-shear substrate capable of capturing vast amounts of ultrahigh-molecular-weight DNA. Extraction is performed via a simple 45 min process and achieves binding capacities up to 1 000 000 times greater than those of silica microparticles. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.


July 7, 2019

Improve homology search sensitivity of PacBio data by correcting frameshifts.

Single-molecule, real-time (SMRT) sequencing developed by Pacific Biosciences produces longer reads than second generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assemblies, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts, which may lead to only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments, and this ambiguity causes errors in structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rates and are not optimized for PacBio data. As an increasing number of groups use SMRT sequencing, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets with low sequencing coverage. In addition, we can correct more errors than a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
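
Why a single indel is so damaging for protein-level homology search can be shown with a short Python sketch. This is only an illustration of the frameshift effect, not part of Frame-Pro. It translates a made-up coding sequence with the standard genetic code (NCBI translation table 1, encoded as a 64-letter string in TCAG codon order) before and after deleting one base.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate DNA in frame 0, ignoring a trailing partial codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Index into the 64-letter table: first base * 16 + second * 4 + third.
        idx = (16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1])
               + BASES.index(codon[2]))
        protein.append(AMINO_ACIDS[idx])
    return "".join(protein)


true_gene = "ATGGCTGCAAAAGGT"             # hypothetical toy gene
read = true_gene[:4] + true_gene[5:]      # simulate one deletion error at position 4
print(translate(true_gene))  # MAAKG
print(translate(read))       # MVQK
```

Only the first residue survives the deletion; every downstream codon is read in the wrong frame. In a real gene this turns one long protein alignment into short, low-scoring fragments, which is the problem frameshift-aware search is meant to address.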


July 7, 2019

Information-optimal genome assembly via sparse read-overlap graphs.

In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and, under most formulations, the assembly problem becomes NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside and seeking an algorithm that targets information limits. In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Based on insights from this information feasibility question, we present an algorithm, the Not-So-Greedy algorithm, to construct a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever the information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, which can thus be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of the accuracy of the resulting read-overlap graph and contig N50. Available at github.com/samhykim/nsg. Contact: courtade@eecs.berkeley.edu or dntse@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
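
The final step, solving an Eulerian path problem in linear time, can be sketched with Hierholzer's algorithm, the standard linear-time method for directed graphs. This is a generic Python sketch, not the authors' implementation; it assumes the graph is already given as a list of directed edges, that an Eulerian path exists, and the vertex labels in the example are hypothetical.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return a vertex sequence that uses every directed edge exactly once.

    Hierholzer's algorithm; assumes an Eulerian path exists. Each edge is
    pushed and popped once, so the running time is linear in the edge count.
    """
    graph = defaultdict(list)
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for u, v in edges:
        graph[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1
    # Start at a vertex with one more outgoing than incoming edge, if any;
    # otherwise (Eulerian circuit) any vertex with an outgoing edge works.
    start = edges[0][0]
    for node in list(graph):
        if out_deg[node] - in_deg[node] == 1:
            start = node
            break
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            # Follow an unused edge and remove it from the graph.
            stack.append(graph[node].pop())
        else:
            # Dead end: this vertex is finished, emit it.
            path.append(stack.pop())
    return path[::-1]


print(eulerian_path([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]))
# ['A', 'B', 'C', 'A', 'D']
```

The contrast with the Hamiltonian formulation is the point: visiting every edge once is solvable by this simple traversal, whereas visiting every vertex once is NP-hard in general.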


July 7, 2019

TeloPCR-seq: a high-throughput sequencing approach for telomeres.

We have developed a high-throughput sequencing approach that enables us to determine terminal telomere sequences from tens of thousands of individual Schizosaccharomyces pombe telomeres. This method provides unprecedented coverage of telomeric sequence complexity in fission yeast. S. pombe telomeres are composed of modular degenerate repeats that can be explained by variation in usage of the TER1 RNA template during reverse transcription. Taking advantage of this deep sequencing approach, we find that ‘like’ repeat modules are highly correlated within individual telomeres. Moreover, repeat module preference varies with telomere length, suggesting that existing repeats promote the incorporation of like repeats and/or that specific conformations of the telomerase holoenzyme efficiently and/or processively add repeats of like nature. After the loss of telomerase activity, this sequencing and analysis pipeline defines a population of telomeres with altered sequence content. This approach will be adaptable to study telomeric repeats in other organisms and also to interrogate repetitive sequences throughout the genome that are inaccessible to other sequencing methods. © 2016 Federation of European Biochemical Societies.

