With PacBio Archives

July 7, 2019

BAC-pool sequencing and analysis confirms growth-associated QTLs in the Asian seabass genome.

The Asian seabass is an important marine food fish that has been cultured for several decades in Asia Pacific. However, the lack of a high-quality reference genome has hampered efforts to improve its selective breeding. A 3D BAC pool set generated in this study was screened using 22 SSR markers located on linkage group 2, which contains a growth-related QTL region. Seventy-two clones corresponding to 22 FPC contigs were sequenced by Illumina MiSeq technology. We co-assembled the MiSeq-derived scaffolds from each FPC contig with error-corrected PacBio reads, resulting in 187 sequences covering 9.7 Mb. Eleven genes annotated within this region were found to be potentially associated with growth and their tissue-specific expression was investigated. Correlation analysis demonstrated that SNPs in ctsb, skp1 and ppp2ca can be potentially used as markers for selecting fast-growing fingerlings. Conserved syntenies between seabass LG2 and five other teleosts were identified. This study i) provided a 10 Mb targeted genome assembly; ii) demonstrated NGS of BAC pools as a potential approach for mining candidates underlying QTLs of this species; iii) detected eleven genes potentially responsible for growth in the QTL region; and iv) identified useful SNP markers for selective breeding programs of Asian seabass.

July 7, 2019

LongISLND: in silico sequencing of lengthy and noisy datatypes.

LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd. Contact: hugo.lam@roche.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
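To make the idea of simulating noisy single-molecule reads concrete, here is a minimal toy sketch in Python. It is not LongISLND's model (the abstract does not describe its internals, and the tool itself is written in Java); the function name `simulate_noisy_read` and the default substitution, insertion, and deletion rates are illustrative assumptions only.

```python
import random

BASES = "ACGT"

def simulate_noisy_read(reference, start, length,
                        sub_rate=0.01, ins_rate=0.10, del_rate=0.04,
                        seed=None):
    """Copy reference[start:start+length] while injecting random errors.

    The rates are hypothetical per-base probabilities chosen for
    illustration, not measured values. ins_rate must be below 1 so the
    insertion loop terminates.
    """
    rng = random.Random(seed)
    read = []
    for base in reference[start:start + length]:
        r = rng.random()
        if r < del_rate:
            # Deletion: the reference base is dropped from the read.
            continue
        if r < del_rate + sub_rate:
            # Substitution: emit a different base than the reference.
            read.append(rng.choice([b for b in BASES if b != base]))
        else:
            read.append(base)
        # Insertion: emit zero or more extra random bases after this one.
        while rng.random() < ins_rate:
            read.append(rng.choice(BASES))
    return "".join(read)

if __name__ == "__main__":
    ref = "ACGT" * 25
    print(simulate_noisy_read(ref, start=10, length=60, seed=42))
```

Passing a fixed `seed` makes a run reproducible, which is convenient when simulated reads feed into consensus building or variant calling tests.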

July 7, 2019

Draft genome sequences of five novel polyketide synthetase-containing mouse Escherichia coli strains.

We report herein the draft genomes of five novel Escherichia coli strains isolated from surveillance and experimental mice housed at MIT and the Whitehead Institute and describe their genomic characteristics in context with the polyketide synthetase (PKS)-containing pathogenic E. coli strains NC101, IHE3034, and A192PP. Copyright © 2016 Mannion et al.

July 7, 2019

Comparative genomics and physiology of the butyrate-producing bacterium Intestinimonas butyriciproducens.

Intestinimonas is a newly described bacterial genus with representative strains present in the intestinal tract of humans and other animals. Despite unique metabolic features including the production of butyrate from both sugars and amino acids, there are to date no data on their diversity, ecology, and physiology. Using a comprehensive phylogenetic approach, Intestinimonas was found to include at least three species that colonize primarily the human and mouse intestine. We focused on the most common and cultivable species of the genus, Intestinimonas butyriciproducens, and performed detailed genomic and physiological comparison of strains SRB521(T) and AF211, isolated from the mouse and human gut respectively. The complete 3.3-Mb genomic sequences of both strains were highly similar with 98.8% average nucleotide identity, testifying to their assignment to one single species. However, thorough analysis revealed significant genomic rearrangements, variations in phage-derived sequences, and the presence of new CRISPR sequences in both strains. Moreover, strain AF211 appeared to be more efficient than strain SRB521(T) in the conversion of the sugars arabinose and galactose. In conclusion, this study provides genomic and physiological insight into Intestinimonas butyriciproducens, a prevalent butyrate-producing species, differentiating strains that originate from the mouse and human gut. © 2016 The Authors. Environmental Microbiology Reports published by Society for Applied Microbiology and John Wiley & Sons Ltd.

July 7, 2019

Genomic studies of nitrogen-fixing rhizobial strains from Phaseolus vulgaris seeds and nodules.

Rhizobia are soil bacteria that establish symbiotic relationships with legumes and fix nitrogen in root nodules. We recently reported that several nitrogen-fixing rhizobial strains, belonging to Rhizobium phaseoli, R. trifolii, R. grahamii and Sinorhizobium americanum, were able to colonize Phaseolus vulgaris (common bean) seeds. To gain further insight into the traits that support this ability, we analyzed the genomic sequences and proteomes of R. phaseoli (CCGM1) and S. americanum (CCGM7) strains from seeds and compared them with those of the closely related strains CIAT652 and CFNEI73, respectively, isolated only from nodules. In a fine structural study of the S. americanum genomes, the chromosomes, megaplasmids and symbiotic plasmids were highly conserved and syntenic, with the exception of the smaller plasmid, which appeared unrelated. The symbiotic tract of CCGM7 appeared more dispersed, possibly due to the action of transposases. The chromosomes of seed strains had fewer transposases and strain-specific genes. The seed strains CCGM1 and CCGM7 shared about half of their genomes with their closest strains (3353 and 3472 orthologs respectively), but a large fraction of the rest also had homology with other rhizobia. They contained 315 and 204 strain-specific genes, respectively, particularly abundant in the functions of transcription, motility, energy generation and cofactor biosynthesis. The proteomes of seed and nodule strains were obtained and showed a particular profile for each of the strains. About 82% of the proteins in the comparisons appeared similar. Forty of the most abundant proteins in each strain were identified; these proteins in seed strains were involved in stress responses and coenzyme and cofactor biosynthesis and in the nodule strains mainly in central processes. Only 3% of the abundant proteins had hypothetical functions. Functions that were enriched in the genomes and proteomes of seed strains possibly participate in the successful occupancy of the new niche. The genomes of the strains had features possibly related to their presence in the seeds. This study helps to understand traits of rhizobia involved in seed adaptation.

July 7, 2019

Conservation genetics of an endangered grassland butterfly (Oarisma poweshiek) reveals historically high gene flow despite recent and rapid range loss

1. In poorly dispersing species gene flow can be facilitated when suitable habitat is widespread, allowing for increased dispersal between neighbouring locations. The Poweshiek skipperling [Oarisma poweshiek (Parker)], a federally endangered butterfly, has undergone a rapid, recent demographic decline following the loss of tallgrass prairie and fen habitats range wide. The loss of habitat, now restricted geographic range, and poor dispersal ability have left O. poweshiek at increased risk of extinction. 2. We studied the population genetics of six remaining populations of O. poweshiek in order to test the hypothesis that gene flow was historically high despite limited long-distance dispersal capability. Utilising nine microsatellite loci developed by PacBio sequencing, we tested for patterns of isolation by distance, low population genetic structure and alternative gene flow models. 3. Populations from southern Manitoba, Canada to the Lower Peninsula of Michigan, USA are only weakly genetically differentiated despite having low diversity. We found no support for isolation by distance, and Bayesian estimates of historical gene flow support our hypothesis that high levels of gene flow previously connected populations from Michigan to Wisconsin. 4. Prairie grasslands have been reduced tremendously over the past century, but the low mobility of O. poweshiek suggests that rapid loss of populations over the past decade cannot be simply explained by fragmentation of habitat. 5. As a species at high risk of extinction, understanding historical processes of gene flow will allow for informed management decisions with respect to head-starting individuals for population reintroductions and for conserving networks of habitat that will allow for high levels of gene flow.

July 7, 2019

Complete genome of Vibrio parahaemolyticus FORC014 isolated from the toothfish.

Foodborne illness can occur due to various pathogenic bacteria such as Staphylococcus aureus, Escherichia coli and Vibrio parahaemolyticus, and can cause severe gastroenteritis symptoms. In this study, we completed the genome sequence of a foodborne pathogen V. parahaemolyticus FORC_014, which was isolated from suspected contaminated toothfish from South Korea. Additionally, we extended our knowledge of genomic characteristics of the FORC_014 strain through comparative analysis using the complete sequences of other V. parahaemolyticus strains whose complete genomes have previously been reported. The complete genome sequence of V. parahaemolyticus FORC_014 was generated using the PacBio RS platform with single molecule, real-time (SMRT) sequencing. The FORC_014 strain consists of two circular chromosomes (3,241,330 bp for chromosome 1 and 1,997,247 bp for chromosome 2), one plasmid (51,383 bp), and one putative phage sequence (96,896 bp). The genome contains a total of 4274 putative protein coding sequences, 126 tRNA genes and 34 rRNA genes. Furthermore, we found 33 type III secretion system 1 (T3SS1) related proteins and 15 type III secretion system 2 (T3SS2) related proteins on chromosome 1. This is the first reported result of Type III secretion system 2 located on chromosome 1 of V. parahaemolyticus without thermostable direct hemolysin (tdh) and thermostable direct hemolysin-related hemolysin (trh). Through investigation of the complete genome sequence of V. parahaemolyticus FORC_014, which differs from previously reported strains, we revealed two type III secretion systems (T3SS1, T3SS2) located on chromosome 1 which do not include tdh and trh genes. We also identified several virulence factors carried by our strain, including iron uptake system, hemolysin and secretion system. This result suggests that the FORC_014 strain may be a pathogen responsible for foodborne illness outbreaks. Our results provide significant genomic clues which will assist in future understanding of virulence at the genomic level and help distinguish between clinical and non-clinical isolates.

July 7, 2019

WhatsHap: fast and accurate read-based phasing

Read-based phasing reconstructs the haplotype structure of a sample purely from sequencing reads. While phasing is a required step for answering questions about population genetics and compound heterozygosity, and for aiding clinical decision making, accurate, usable and standards-based software has been lacking. WhatsHap is a production-ready tool for highly accurate read-based phasing. It was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore ideal for phasing. WhatsHap also works well with second-generation data, is easy to use and will phase not only SNVs, but also indels and other variants. It is unique in its ability to combine read-based with genetic phasing, which further improves accuracy when multiple related samples are provided.
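The core idea of read-based phasing can be sketched with a toy example: each read reports alleles at the heterozygous sites it spans, and we look for the pair of complementary haplotypes that disagrees with the reads as little as possible. The sketch below brute-forces this over all haplotypes, which is only feasible for a handful of sites. It is not WhatsHap's algorithm or interface; the function name `phase_by_brute_force` and the read encoding are hypothetical choices made for illustration.

```python
from itertools import product

def phase_by_brute_force(reads, n_sites):
    """Find the haplotype pair that minimises read/haplotype mismatches.

    reads   -- list of dicts mapping a heterozygous site index to the
               observed allele (0 = reference, 1 = alternative)
    n_sites -- number of heterozygous sites being phased

    Returns (haplotype, complementary haplotype, total mismatches).
    Runtime is exponential in n_sites, so this is a teaching toy only.
    """
    best_cost, best_hap = None, None
    for hap in product((0, 1), repeat=n_sites):
        if hap[0] == 1:
            # A haplotype and its complement describe the same phasing,
            # so only consider those starting with allele 0.
            continue
        comp = tuple(1 - a for a in hap)
        cost = 0
        for read in reads:
            mism_hap = sum(1 for site, allele in read.items() if hap[site] != allele)
            mism_comp = sum(1 for site, allele in read.items() if comp[site] != allele)
            # Assign the read to whichever haplotype it fits better.
            cost += min(mism_hap, mism_comp)
        if best_cost is None or cost < best_cost:
            best_cost, best_hap = cost, hap
    return best_hap, tuple(1 - a for a in best_hap), best_cost

if __name__ == "__main__":
    example_reads = [
        {0: 0, 1: 0, 2: 1},
        {1: 0, 2: 1, 3: 1},
        {0: 1, 1: 1},
        {2: 0, 3: 0},
        {0: 0, 1: 0, 2: 0},  # conflicts with the first read at site 2
    ]
    print(phase_by_brute_force(example_reads, n_sites=4))
```

The example also shows why long reads help: a read that spans many sites links them directly, whereas short reads only connect neighbouring sites through chains of overlaps.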

July 7, 2019

Complete genome sequences of six Legionella pneumophila isolates from two collocated outbreaks of Legionnaires’ disease in 2005 and 2008 in Sarpsborg/Fredrikstad, Norway.

Here, we report the complete genome sequences of Legionella pneumophila isolates from two collocated outbreaks of Legionnaires’ disease in 2005 and 2008 in Sarpsborg/Fredrikstad, Norway. One clinical and two environmental isolates were sequenced from each outbreak. The genome of all six isolates consisted of a 3.36 Mb-chromosome, while the 2005 genomes featured an additional 68 kb-episome sharing high sequence similarity with the L. pneumophila Lens plasmid. All six genomes contained multiple mobile genetic elements including novel combinations of type-IVA secretion systems. A comparative genomics study will be launched to resolve the genetic relationship between the L. pneumophila isolates. Copyright © 2016 Dybwad et al.

July 7, 2019

Complete genome sequence of a copper-resistant bacterium from the citrus phyllosphere, Stenotrophomonas sp. strain LM091, obtained using long-read technology.

The Stenotrophomonas genus shows great adaptive potential including resistance to multiple antimicrobials, opportunistic pathogenicity, and production of numerous secondary metabolites. Using long-read technology, we report the sequence of a plant-associated Stenotrophomonas strain originating from the citrus phyllosphere that displays a copper resistance phenotype. Copyright © 2016 Richard et al.

July 7, 2019

Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads

With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools.

July 7, 2019

Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (~15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at a minimal cost, by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ~5% of k-mers that are error free, read overlap sensitivity is dramatically increased. Of equal importance, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is very likely to yield analogous improvements with alternative long-read technologies and assemblers, such as Oxford Nanopore and BLASR/DALIGNER/Falcon, respectively. © 2016 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press.
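The validation step described above can be sketched in a few lines of Python: collect the k-mers seen in the short reads, optionally drop those that occur too often (a stand-in for the repeat filter), and keep only long-read seed positions whose k-mer is in that trusted set. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names and the `max_count` threshold are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def validated_kmers(short_reads, k, max_count=None):
    """Return k-mers seen in the short reads.

    If max_count is given, k-mers seen more often than that are treated
    as repetitive and excluded (an assumed, simplistic repeat filter).
    """
    counts = count_kmers(short_reads, k)
    return {kmer for kmer, n in counts.items()
            if max_count is None or n <= max_count}

def seed_positions(long_read, trusted, k):
    """Positions in the long read whose k-mer is validated, i.e. usable as seeds."""
    return [i for i in range(len(long_read) - k + 1)
            if long_read[i:i + k] in trusted]

if __name__ == "__main__":
    short_reads = ["ACGTAC", "GTACGG"]
    noisy_long_read = "ACGTACTGG"
    trusted = validated_kmers(short_reads, k=4)
    print(seed_positions(noisy_long_read, trusted, k=4))
```

Using a set for the trusted k-mers keeps each lookup constant-time on average, so filtering seeds adds little cost compared with the overlap step itself.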

July 7, 2019

Genomic sequencing-based mutational enrichment analysis identifies motility genes in a genetically intractable gut microbe.

A major roadblock to understanding how microbes in the gastrointestinal tract colonize and influence the physiology of their hosts is our inability to genetically manipulate new bacterial species and experimentally assess the function of their genes. We describe the application of population-based genomic sequencing after chemical mutagenesis to map bacterial genes responsible for motility in Exiguobacterium acetylicum, a representative intestinal Firmicutes bacterium that is intractable to molecular genetic manipulation. We derived strong associations between mutations in 57 E. acetylicum genes and impaired motility. Surprisingly, less than half of these genes were annotated as motility-related based on sequence homologies. We confirmed the genetic link between individual mutations and loss of motility for several of these genes by performing a large-scale analysis of spontaneous suppressor mutations. In the process, we reannotated genes belonging to a broad family of diguanylate cyclases and phosphodiesterases to highlight their specific role in motility and assigned functions to uncharacterized genes. Furthermore, we generated isogenic strains that allowed us to establish that Exiguobacterium motility is important for the colonization of its vertebrate host. These results indicate that genetic dissection of a complex trait, functional annotation of new genes, and the generation of mutant strains to define the role of genes in complex environments can be accomplished in bacteria without the development of species-specific molecular genetic tools.

July 7, 2019

DNA extraction protocols for whole-genome sequencing in marine organisms.

The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the Earth's different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.

July 7, 2019

Systems biology-guided biodesign of consolidated lignin conversion

Lignin is the second most abundant biopolymer on Earth, yet its utilization for fungible products is complicated by its recalcitrant nature and remains a major challenge for sustainable lignocellulosic biorefineries. In this study, we used a systems biology approach to reveal the carbon utilization pattern and lignin degradation mechanisms in a unique lignin-utilizing Pseudomonas putida strain (A514). The mechanistic study further guided the design of three functional modules to enable a consolidated lignin bioconversion route. First, P. putida A514 mobilized a dye peroxidase-based enzymatic system for lignin depolymerization. This system could be enhanced by overexpressing a secreted multifunctional dye peroxidase to promote a two-fold enhancement of cell growth on insoluble kraft lignin. Second, A514 employed a variety of peripheral and central catabolism pathways to metabolize aromatic compounds, which can be optimized by overexpressing key enzymes. Third, the β-oxidation of fatty acid was up-regulated, whereas fatty acid synthesis was down-regulated when A514 was grown on lignin and vanillic acid. Therefore, the functional module for polyhydroxyalkanoate (PHA) production was designed to rechannel β-oxidation products. As a result, PHA content reached 73% per cell dry weight (CDW). Further integrating the three functional modules enhanced the production of PHA from kraft lignin and biorefinery waste. Thus, this study elucidated lignin conversion mechanisms in bacteria with potential industrial implications and laid out the concept for engineering a consolidated lignin conversion route.
