Yale Center for Genome Analysis Archives

July 7, 2019

Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their positions identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and its DNA context are not well defined. Many reads start inside the genomic insert, so adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required to detect barcoded reads while avoiding large numbers of false positives or false negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for detecting barcoded reads, and we suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarity with the Mus musculus reference DNA. This problem became more acute as the barcode length decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof-of-concept experiment, we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated into the analysis. The analysis was further improved using a paired-end strategy.
After analyzing data on ultraviolet-induced sequence variants in the Atp1a1 gene of C57BL/6 murine melanocytes that confer resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples. Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow a proper trade-off between sensitivity and precision to be chosen.
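The abstract does not spell out how the tail-area FDR is estimated. As a rough illustration only, the Python sketch below swaps in a common target-decoy estimate. For each read, it computes the best semi-global edit distance to the real barcodes within a window at the read start, and compares that with the best distance to shuffled "decoy" barcodes. It then returns the largest distance cutoff whose estimated FDR, (decoy hits at or below the cutoff) / (target hits at or below the cutoff), stays at or below `alpha`. The function names, the shuffled-decoy null model, and the default window of twice the barcode length are all assumptions, not the authors' method.

```python
import random


def min_edit_distance_in_window(barcode: str, read: str, window: int) -> int:
    """Smallest edit distance between `barcode` and any substring of read[:window].

    Semi-global alignment: the barcode must align end to end, but it may
    start and stop anywhere inside the window without penalty.
    """
    text = read[:window]
    prev = [0] * (len(text) + 1)  # empty barcode prefix matches anywhere at cost 0
    for i in range(1, len(barcode) + 1):
        cur = [i] + [0] * len(text)  # column 0: skip the first i barcode bases
        for j in range(1, len(text) + 1):
            mismatch = prev[j - 1] + (barcode[i - 1] != text[j - 1])
            cur[j] = min(mismatch, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)  # free end: best cost at any text position


def best_distance(read: str, barcodes: list[str], window: int) -> int:
    return min(min_edit_distance_in_window(b, read, window) for b in barcodes)


def shuffled_decoys(barcodes: list[str], seed: int = 0) -> list[str]:
    """Assumed null model: base-composition-preserving shuffles of each barcode."""
    rng = random.Random(seed)
    decoys = []
    for b in barcodes:
        bases = list(b)
        rng.shuffle(bases)
        decoys.append("".join(bases))
    return decoys


def max_distance_at_fdr(reads, barcodes, alpha=0.05, window=None, seed=0):
    """Largest distance cutoff d with estimated tail-area FDR <= alpha, or None.

    FDR(d) is estimated as (#reads whose best decoy distance <= d) /
    (#reads whose best barcode distance <= d), a target-decoy approximation.
    """
    longest = max(len(b) for b in barcodes)
    window = window or 2 * longest  # hypothetical default search window
    decoys = shuffled_decoys(barcodes, seed)
    target = [best_distance(r, barcodes, window) for r in reads]
    decoy = [best_distance(r, decoys, window) for r in reads]
    cutoff = None
    for d in range(longest + 1):
        n_target = sum(t <= d for t in target)
        n_decoy = sum(x <= d for x in decoy)
        if n_target and n_decoy / n_target <= alpha:
            cutoff = d
    return cutoff
```

Reads whose best barcode distance is at or below the returned cutoff would be called barcoded. A fuller pipeline would also have to resolve reads that match several barcodes equally well, and, as the abstract reports, requiring the adjacent primer sequence to match as well would sharpen the call.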

July 7, 2019

Complete sequence of a conjugative IncN plasmid harboring blaKPC-2, blaSHV-12, and qnrS1 from an Escherichia coli sequence type 648 strain

We sequenced a novel conjugative blaKPC-2-harboring IncN plasmid, pYD626E, from an Escherichia coli sequence type 648 strain previously identified in Pittsburgh, Pennsylvania. pYD626E was 72,800 bp long and carried four β-lactamase genes, blaKPC-2, blaSHV-12, blaLAP-1, and blaTEM-1. In addition, it harbored qnrS1 (fluoroquinolone resistance) and dfrA14 (trimethoprim resistance). The plasmid profile and clinical history supported the in vivo transfer of this plasmid between Klebsiella pneumoniae and Escherichia coli. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

July 7, 2019

Ferrets exclusively synthesize Neu5Ac and express naturally humanized influenza A virus receptors.

Mammals express the sialic acids N-acetylneuraminic acid (Neu5Ac) and N-glycolylneuraminic acid (Neu5Gc) on cell surfaces, where they act as receptors for pathogens, including influenza A virus (IAV). Neu5Gc is synthesized from Neu5Ac by the enzyme cytidine monophosphate-N-acetylneuraminic acid hydroxylase (CMAH). In humans, this enzyme is inactive and only Neu5Ac is produced. Ferrets are susceptible to human-adapted IAV strains and have been the dominant animal model for IAV studies. Here we show that ferrets, like humans, do not synthesize Neu5Gc. Genomic analysis reveals an ancient, nine-exon deletion in the ferret CMAH gene that is shared by the Pinnipedia and Musteloidia members of the Carnivora. Interactions of two human IAV strains with the sialyllactose receptor (sialic acid-α2,6Gal) confirm that the type of terminal sialic acid contributes significantly to IAV receptor specificity. Our results indicate that exclusive expression of Neu5Ac contributes to the susceptibility of ferrets to human-adapted IAV strains.

July 7, 2019

Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus.

The marine cyanobacterium Prochlorococcus is the numerically dominant photosynthetic organism in the oligotrophic oceans, and a model system in marine microbial ecology. Here we report 27 new whole genome sequences (2 complete and closed; 25 of draft quality) of cultured isolates, representing five major phylogenetic clades of Prochlorococcus. The sequenced strains were isolated from diverse regions of the oceans, facilitating studies of the drivers of microbial diversity, both in the lab and in the field. To improve the utility of these genomes for comparative genomics, we also define pre-computed clusters of orthologous groups of proteins (COGs), indicating how genes are distributed among these and other publicly available Prochlorococcus genomes. These data represent a significant expansion of Prochlorococcus reference genomes that are useful for numerous applications in microbial ecology, evolution and oceanography.

July 7, 2019

Proteomic analysis of Pemphigus autoantibodies indicates a larger, more diverse, and more dynamic repertoire than determined by B cell genetics.

In autoantibody-mediated diseases such as pemphigus, serum antibodies lead to disease. Genetic analysis of B cells has allowed characterization of antibody repertoires in such diseases but would be complemented by proteomic analysis of serum autoantibodies. Here, we show using proteomic analysis that the serum autoantibody repertoire in pemphigus is much more polyclonal than that found by genetic studies of B cells. In addition, many B cells encode pemphigus autoantibodies that are not secreted into the serum. Heavy chain variable gene usage of serum autoantibodies is not shared among patients, implying targeting of the coded proteins will not be a useful therapeutic strategy. Analysis of autoantibodies in individual patients over several years indicates that many antibody clones persist but the proportion of each changes. These studies indicate a dynamic and diverse autoantibody response not revealed by genetic studies and explain why similar overall autoantibody titers may give variable disease activity. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

July 7, 2019

Four complete Paenibacillus larvae genome sequences.

Four complete genome sequences of genetically distinct Paenibacillus larvae strains have been determined. Pacific Biosciences single-molecule real-time (SMRT) sequencing technology was used as the sole method of sequence determination and assembly. The chromosomes exhibited a G+C content of 44.1 to 44.2% and a molecular size range of 4.29 to 4.67 Mbp. Copyright © 2017 Dingman.

July 7, 2019

Draft genome sequence of Streptomyces scabrisporus NF3, an endophyte isolated from Amphipterygium adstringens.

We report the draft genome sequence of Streptomyces scabrisporus NF3, an endophyte isolated from Amphipterygium adstringens in Chiapas, Mexico. This strain produces a new modified linaridin peptide. The genome harbors at least 50 gene clusters for polyketide and nonribosomal peptide synthases, suggesting the potential to produce various secondary metabolites. Copyright © 2017 Vazquez-Hernandez et al.

July 7, 2019

Whole-genome restriction mapping by “subhaploid”-based RAD sequencing: An efficient and flexible approach for physical mapping and genome scaffolding.

Assembly of complex genomes using short reads remains a major challenge, which usually yields highly fragmented assemblies. Generation of ultradense linkage maps is promising for anchoring such assemblies, but traditional linkage mapping methods are hindered by the infrequency and unevenness of meiotic recombination that limit attainable map resolution. Here we develop a sequencing-based “in vitro” linkage mapping approach (called RadMap), where chromosome breakage and segregation are realized by generating hundreds of “subhaploid” fosmid/bacterial-artificial-chromosome clone pools, and by restriction site-associated DNA sequencing of these clone pools to produce an ultradense whole-genome restriction map to facilitate genome scaffolding. A bootstrap-based minimum spanning tree algorithm is developed for grouping and ordering of genome-wide markers and is implemented in a user-friendly, integrated software package (AMMO). We perform extensive analyses to validate the power and accuracy of our approach in the model plant Arabidopsis thaliana and human. We also demonstrate the utility of RadMap for enhancing the contiguity of a variety of whole-genome shotgun assemblies generated using either short Illumina reads (300 bp) or long PacBio reads (6-14 kb), with up to 15-fold improvement of N50 (~816 kb-3.7 Mb) and high scaffolding accuracy (98.1-98.5%). RadMap outperforms BioNano and Hi-C when input assembly is highly fragmented (contig N50 = 54 kb). RadMap can capture wide-range contiguity information and provide an efficient and flexible tool for high-resolution physical mapping and scaffolding of highly fragmented assemblies. Copyright © 2017 Dou et al.
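The abstract names a bootstrap-based minimum spanning tree (MST) algorithm for grouping and ordering markers but gives no details of it. The Python sketch below shows only the plain MST idea under stated assumptions. Each marker is treated as a hypothetical 0/1 presence vector across clone pools, and the distance between two markers is taken as the fraction of pools in which they disagree. Prim's algorithm builds the MST, cutting edges longer than a hypothetical `max_dist` splits it into groups, and each group is ordered along its longest (diameter) path. The bootstrap step and AMMO's actual scoring are not reproduced, and markers lying off a group's diameter path are simply left out here.

```python
from collections import defaultdict


def pool_distance(a, b):
    """Hypothetical marker distance: fraction of clone pools where presence differs."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def minimum_spanning_tree(markers):
    """Prim's algorithm on the complete marker graph; returns (i, j, weight) edges."""
    n = len(markers)
    # best[k] = (cheapest known edge weight into the tree, tree endpoint)
    best = {k: (pool_distance(markers[0], markers[k]), 0) for k in range(1, n)}
    edges = []
    while best:
        j, (w, i) = min(best.items(), key=lambda item: item[1][0])
        del best[j]
        edges.append((i, j, w))
        for k in best:  # relax remaining vertices through the newly added vertex j
            d = pool_distance(markers[j], markers[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def group_and_order(markers, max_dist):
    """Cut MST edges longer than max_dist, then order each group along its diameter path."""
    adj = defaultdict(list)
    for i, j, w in minimum_spanning_tree(markers):
        if w <= max_dist:
            adj[i].append((j, w))
            adj[j].append((i, w))

    def farthest(src):
        # Tree search from src: returns the farthest node and a parent map of the group.
        dist, parent, stack = {src: 0.0}, {src: None}, [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    parent[v] = u
                    stack.append(v)
        return max(dist, key=dist.get), parent

    seen, groups = set(), []
    for start in range(len(markers)):
        if start in seen:
            continue
        end_a, _ = farthest(start)
        end_b, parent = farthest(end_a)  # end_a..end_b is the tree diameter
        seen.update(parent)
        path, node = [], end_b
        while node is not None:
            path.append(node)
            node = parent[node]
        groups.append(path)
    return groups
```

Using a tree rather than pairwise clustering keeps only the strongest link between markers, so a single long edge separates groups while the remaining short edges give a natural linear order.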

July 7, 2019

Draft genome sequence of the fish pathogen Flavobacterium columnare strain CSF-298-10.

We announce here the draft genome assembly of Flavobacterium columnare CSF-298-10, a strain isolated from an outbreak of columnaris disease at a commercial trout farm in Hagerman Valley, Idaho, USA. The complete genome consists of 13 contigs totaling 3,284,579 bp, with an average G+C content of 31.5% and 2,933 predicted coding genes. Copyright © 2017 Evenhuis et al.

July 7, 2019

Evolutionary dynamics of pathoadaptation revealed by three independent acquisitions of the VirB/D4 type IV secretion system in Bartonella.

The α-proteobacterial genus Bartonella comprises a group of ubiquitous mammalian pathogens that are studied as a model for the evolution of bacterial pathogenesis. The vast abundance of two particular phylogenetic lineages of Bartonella has been linked to enhanced host adaptability enabled by lineage-specific acquisition of a VirB/D4 type IV secretion system (T4SS) and parallel evolution of complex effector repertoires. However, the limited availability of genome sequences from one of those lineages, as well as from other, remote branches of Bartonella, has so far hampered a comprehensive understanding of how the VirB/D4 T4SS and its effectors, called Beps, have shaped Bartonella evolution. Here, we report the discovery of a third repertoire of Beps associated with the VirB/D4 T4SS of B. ancashensis, a novel human pathogen that lacks any signs of host adaptability and is only distantly related to the two species-rich lineages encoding a VirB/D4 T4SS. Furthermore, sequencing of ten new Bartonella isolates from under-sampled lineages enabled combined in silico analyses and wet lab experiments that suggest several parallel layers of functional diversification during evolution of the three Bep repertoires from a single ancestral effector. Our analyses show that the Beps of B. ancashensis share many features with the two other repertoires, but may represent a more ancestral state that has not yet unleashed the adaptive potential of such an effector set. We anticipate that the effectors of B. ancashensis will enable future studies to dissect the evolutionary history of Bartonella effectors and help unravel the evolutionary forces underlying bacterial host adaptation. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

July 7, 2019

Recombination of virulence genes in divergent Acidovorax avenae strains that infect a common host.

Bacterial etiolation and decline (BED), caused by Acidovorax avenae, is an emerging disease of creeping bentgrass on golf courses in the United States. We performed the first comprehensive analysis of A. avenae on a nationwide collection of turfgrass- and maize-pathogenic A. avenae. Surprisingly, our results reveal that the turfgrass-pathogenic A. avenae in North America are not only highly divergent but also belong to two distinct phylogroups. Both phylogroups specifically infect turfgrass but are more closely related to maize pathogens than to each other. This suggests that, although the disease was only recently reported, it has likely been infecting turfgrass for a long time. To identify a genetic basis for the host specificity, we searched for genes closely related among turfgrass strains but distantly related to their homologs from maize strains. We found a cluster of 11 such genes generated by three ancient recombination events within the type III secretion system (T3SS) pathogenicity island. Ever since the recombination, the cluster has been conserved by strong purifying selection, hinting at its selective importance. Together our analyses suggest that BED is an ancient disease that may owe its host specificity to a highly conserved cluster of 11 T3SS genes.

July 7, 2019

Complete genome sequence of a blaOXA-58-producing Acinetobacter baumannii strain isolated from a Mexican hospital.

In this study, we present the complete genome sequence of a blaOXA-58-producing Acinetobacter baumannii strain, sampled from a Mexican hospital and not related to the international clones. Copyright © 2017 Pérez-Oseguera et al.

July 7, 2019

Whole-genome sequence of the 1,4-dioxane-degrading bacterium Mycobacterium dioxanotrophicus PH-06.

We report here the complete genome sequence of Mycobacterium dioxanotrophicus PH-06, which is capable of using 1,4-dioxane as a sole source of carbon and energy. The reported sequence will enable the elucidation of this novel metabolic pathway and the development of molecular biomarkers to assess bioremediation potential at contaminated sites. Copyright © 2017 He et al.

July 7, 2019

Comparative genomics of maize ear rot pathogens reveals expansion of carbohydrate-active enzymes and secondary metabolism backbone genes in Stenocarpella maydis.

Stenocarpella maydis is a plant pathogenic fungus that causes Diplodia ear rot, one of the most destructive diseases of maize. To date, little information is available regarding the molecular basis of pathogenesis in this organism, in part due to limited genomic resources. In this study, a 54.8 Mb draft genome assembly of S. maydis was obtained with Illumina and PacBio sequencing technologies, and analyzed. Comparative genomic analyses with the predominant maize ear rot pathogens Aspergillus flavus, Fusarium verticillioides, and Fusarium graminearum revealed an expanded set of carbohydrate-active enzymes for cellulose and hemicellulose degradation in S. maydis. Analyses of predicted genes involved in starch degradation revealed six putative α-amylases, four extracellular and two intracellular, and two putative β-amylases, one of which appears to have been acquired from bacteria via horizontal transfer. Additionally, 87 backbone genes involved in secondary metabolism were identified, which represents one of the largest known assemblages among Pezizomycotina species. Numerous secondary metabolite gene clusters were identified, including two clusters likely involved in the biosynthesis of diplodiatoxin and chaetoglobosins. The draft genome of S. maydis presented here will serve as a useful resource for molecular genetics, functional genomics, and analyses of population diversity in this organism. Copyright © 2017 British Mycological Society. Published by Elsevier Ltd. All rights reserved.

July 7, 2019

The genome sequence of Bipolaris cookei reveals mechanisms of pathogenesis underlying target leaf spot of sorghum.

Bipolaris cookei (=Bipolaris sorghicola) causes target leaf spot, one of the most prevalent foliar diseases of sorghum. Little is known about the molecular basis of pathogenesis in B. cookei, in large part due to a paucity of resources for molecular genetics, such as a reference genome. Here, a draft genome sequence of B. cookei was obtained and analyzed. A hybrid assembly strategy utilizing Illumina and Pacific Biosciences sequencing technologies produced a draft nuclear genome of 36.1 Mb, organized into 321 scaffolds with L50 of 31 and N50 of 378 kb, from which 11,189 genes were predicted. Additionally, a finished mitochondrial genome sequence of 135,790 bp was obtained, which contained 75 predicted genes. Comparative genomics revealed that B. cookei possessed substantially fewer carbohydrate-active enzymes and secreted proteins than closely related Bipolaris species. Novel genes involved in secondary metabolism, including genes implicated in ophiobolin biosynthesis, were identified. Among 37 B. cookei genes induced during sorghum infection, one encodes a putative effector with a limited taxonomic distribution among plant pathogenic fungi. The draft genome sequence of B. cookei provided novel insights into target leaf spot of sorghum and is an important resource for future investigation.
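The N50 and L50 values quoted above follow the usual definitions. Sort contigs or scaffolds by length, longest first, and add up their lengths until at least half of the total assembly length is covered. The length of the sequence at which this happens is the N50, and the number of sequences needed to get there is the L50. A minimal Python sketch of that calculation (the function name is illustrative):

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:  # first point where half the assembly is covered
            return length, count
    raise ValueError("empty assembly")
```

For example, scaffolds of 100, 80, 60, 40 and 20 kb total 300 kb; the two longest already cover 180 kb, so N50 is 80 kb and L50 is 2.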
