Broad Institute Archives - Page 5 of 11

September 22, 2019

Genome analysis of the yeast M14, an industrial brewing yeast strain widely used in China

The lager brewing yeast M14 is the most widely used yeast strain in the high gravity brewing process in China. To investigate the characteristics of this strain, the genome of the yeast M14 was sequenced and the genome annotation information is presented in this study. The current assembly contained 133 scaffolds and its total size was around 23 Mb with a GC content of 38.98%. The brewing yeast M14 is a hybrid Saccharomyces cerevisiae × Saccharomyces uvarum at the genomic level and its genome includes one circular mitochondrial genome originating from S. uvarum. Furthermore, the 9,796 protein coding genes were annotated and their functions were analyzed using the Swiss-Prot database. Among them, the key genes responsible for typical lager brewing yeast characteristics, such as maltotriose uptake and sulfite production, were annotated and analyzed. Interestingly, nine specific genes present in the brewing yeast M14 were not found in the genome of either S. uvarum CBS 7001 or S. cerevisiae S288C, which are phylogenetically very close to strain M14. The proteins encoded by these nine genes were melibiase, DNA replication protein, fructose symporter, hypothetical protein, hypothetical protein M773_09155, LIF1, minor spike protein H, ribosomal protein S27, and mitochondrial chaperones, respectively. The genome sequence of the yeast strain M14 provides a new tool to better understand brewing yeast behavior in industrial beer production.

September 22, 2019

A complete Cannabis chromosome assembly and adaptive admixture for elevated cannabidiol (CBD) content

Cannabis has been cultivated for millennia with distinct cultivars providing either fiber and grain or tetrahydrocannabinol. Recent demand for cannabidiol rather than tetrahydrocannabinol has favored the breeding of admixed cultivars with extremely high cannabidiol content. Despite several draft Cannabis genomes, the genomic structure of cannabinoid synthase loci has remained elusive. A genetic map derived from a tetrahydrocannabinol/cannabidiol segregating population and a complete chromosome assembly from a high-cannabidiol cultivar together resolve the linkage of cannabidiolic and tetrahydrocannabinolic acid synthase gene clusters which are associated with transposable elements. High-cannabidiol cultivars appear to have been generated by integrating hemp-type cannabidiolic acid synthase gene clusters into a background of marijuana-type cannabis. Quantitative trait locus mapping suggests that overall drug potency, however, is associated with other genomic regions needing additional study.

September 22, 2019

SKA: Split Kmer Analysis Toolkit for Bacterial Genomic Epidemiology

Genome sequencing is revolutionising infectious disease epidemiology, providing a huge step forward in sensitivity and specificity over more traditional molecular typing techniques. However, the complexity of genome data often means that its analysis and interpretation requires high-performance compute infrastructure and dedicated bioinformatics support. Furthermore, current methods have limitations that can differ between analyses and are often opaque to the user, and their reliance on multiple external dependencies makes reproducibility difficult. Here I introduce SKA, a toolkit for analysis of genome sequence data from closely-related, small, haploid genomes. SKA uses split kmers to rapidly identify variation between genome sequences, making it possible to analyse hundreds of genomes on a standard home computer. Tests on publicly available simulated and real-life data show that SKA is both faster and more efficient than the gold standard methods used today while retaining similar levels of accuracy for epidemiological purposes. SKA can take raw read data or genome assemblies as input and calculate pairwise distances, create single linkage clusters and align genomes to a reference genome or using a reference-free approach. SKA requires few decisions to be made by the user, which, along with its computational efficiency, allows genome analysis to become accessible to those with only basic bioinformatics training. The limitations of SKA are also far more transparent than for current approaches, and future improvements to mitigate these limitations are possible. Overall, SKA is a powerful addition to the armoury of the genomic epidemiologist. SKA source code is available from Github (https://github.com/simonrharris/SKA).
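The abstract does not define split kmers, so the following is only a minimal sketch of the general idea, assuming the common description of a split kmer as two flanking kmers separated by a single middle base. The function names, the flank length `k=3`, and the toy sequences are hypothetical and are not taken from SKA itself; the sketch also ignores reverse complements and read-level filtering.

```python
def split_kmers(seq: str, k: int = 3) -> dict[tuple[str, str], str]:
    """Map each (left flank, right flank) pair to the base between them.

    Assumption: a split kmer is k bases, one middle base, then k more bases.
    Flank pairs seen with more than one middle base are dropped as ambiguous.
    """
    table: dict[tuple[str, str], str] = {}
    ambiguous: set[tuple[str, str]] = set()
    # A full window spans 2k + 1 bases, so there are len(seq) - 2k windows.
    for i in range(len(seq) - 2 * k):
        left = seq[i:i + k]
        middle = seq[i + k]
        right = seq[i + k + 1:i + 2 * k + 1]
        key = (left, right)
        if key in table and table[key] != middle:
            ambiguous.add(key)
        table[key] = middle
    for key in ambiguous:
        del table[key]
    return table


def snp_distance(a: str, b: str, k: int = 3) -> tuple[int, int]:
    """Return (differing middle bases, shared flank pairs) for two sequences."""
    table_a, table_b = split_kmers(a, k), split_kmers(b, k)
    shared = table_a.keys() & table_b.keys()
    snps = sum(1 for key in shared if table_a[key] != table_b[key])
    return snps, len(shared)


if __name__ == "__main__":
    # Toy sequences differing at a single position (made up for illustration).
    print(snp_distance("ACGTACGGTCA", "ACGTATGGTCA"))
```

Because only flank pairs shared by both samples are compared, a variant is counted only where both flanks match exactly, which is one reason an approach like this suits closely related genomes.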

September 22, 2019

A complete Leishmania donovani reference genome identifies novel genetic variations associated with virulence.

Leishmania donovani is responsible for visceral leishmaniasis, a neglected and lethal parasitic disease with limited treatment options and no vaccine. The study of L. donovani has been hindered by the lack of a high-quality reference genome and this can impact experimental outcomes including the identification of virulence genes, drug targets and vaccine development. We therefore generated a complete genome assembly by deep sequencing using a combination of second generation (Illumina) and third generation (PacBio) sequencing technologies. Compared to the current L. donovani assembly, the genome assembly reported here resulted in the closure of over 2,000 gaps, the extension of several chromosomes up to telomeric repeats, the re-annotation of close to 15% of protein coding genes and the annotation of hundreds of non-coding RNA genes. It was possible to correctly assemble the highly repetitive A2 and Amastin virulence gene clusters. A comparative sequence analysis using the improved reference genome confirmed 70 published and identified 15 novel genomic differences between closely related visceral and atypical cutaneous disease-causing L. donovani strains, providing a more complete map of genes associated with virulence and visceral organ tropism. Bioinformatic tools including protein variation effect analyzer and basic local alignment search tool were used to prioritize a list of potential virulence genes based on mutation severity, gene conservation and function. This complete genome assembly and novel information on virulence factors will support the identification of new drug targets and the development of a vaccine for L. donovani.

September 22, 2019

Whole-genome sequencing of Chinese yellow catfish provides a valuable genetic resource for high-throughput identification of toxin genes.

Naturally derived toxins from animals are good raw materials for drug development. As a representative venomous teleost, Chinese yellow catfish (Pelteobagrus fulvidraco) can provide valuable resources for studies on toxin genes. Its venom glands are located in the pectoral and dorsal fins. Despite these interesting biological traits and its economic value, Chinese yellow catfish still lacks a sequenced genome. Here, we report a high-quality genome assembly of Chinese yellow catfish using a combination of next-generation Illumina and third-generation PacBio sequencing platforms. The final assembly reached 714 Mb, with a contig N50 of 970 kb and a scaffold N50 of 3.65 Mb. We also annotated 21,562 protein-coding genes, of which 97.59% were assigned at least one functional annotation. Based on the genome sequence, we analyzed toxin genes in Chinese yellow catfish. Finally, we identified 207 toxin genes and classified them into three major groups. Interestingly, we also expanded a previously reported sex-related region (to ~6 Mb) in the achieved genome assembly, and localized two important toxin genes within this region. In summary, we assembled a high-quality genome of Chinese yellow catfish and performed high-throughput identification of toxin genes from a genomic view. Therefore, the limited number of toxin sequences in public databases will be remarkably improved once we integrate multi-omics data from more and more sequenced species.
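For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is conventionally computed: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are hypothetical; they are not data from the catfish assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    # Toy contig lengths, made up for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

The same function applies to contigs and scaffolds alike; a scaffold N50 is usually larger because scaffolds join contigs across gaps.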

September 22, 2019

Genomic insights into virulence mechanisms of Leishmania donovani: evidence from an atypical strain.

Leishmaniasis is a neglected tropical disease with diverse clinical phenotypes, determined by parasite, host and vector interactions. Despite the advances in molecular biology and the availability of more Leishmania genome references in recent years, the association between parasite species and distinct clinical phenotypes remains poorly understood. We present a genomic comparison of an atypical variant of Leishmania donovani from a South Asian focus, where it mostly causes the cutaneous form of leishmaniasis. Clinical isolates from six cutaneous leishmaniasis patients (CL-SL), two of whom were poor responders to antimony (CL-PR), and two visceral leishmaniasis patients (VL-SL) were sequenced on an Illumina MiSeq platform. Chromosome aneuploidy was observed in both groups but was more frequent in CL-SL. 248 genes differed by 2-fold or more in copy number between the two groups. Genes involved in amino acid use (LdBPK_271940) and energy metabolism (LdBPK_271950) predominated in the VL-SL group, with the same distribution pattern reflected in gene tandem arrays. Genes encoding amastins were present in higher copy numbers in VL-SL and CL-PR as well as being among predicted pseudogenes in CL-SL. Both chromosome and SNP profiles showed CL-SL and VL-SL to form two distinct groups. While expected heterozygosity was much higher in VL-SL, SNP allele frequency patterns did not suggest potential recent recombination breakpoints. The SNP/indel profile obtained using the more recently generated PacBio sequence did not vary markedly from that based on the standard LdBPK282A1 reference. Several genes previously associated with resistance to antimonials were observed in higher copy numbers in the analysis of CL-PR. H-locus amplification was seen in one cutaneous isolate which, however, did not belong to the CL-PR group. The data presented suggest that intra-species variations at chromosome and gene level are more likely to influence differences in tropism as well as response to treatment, and contribute to a greater understanding of the parasite molecular mechanisms underpinning these differences. These findings should be substantiated with a larger sample number and expression/functional studies.

September 22, 2019

Discovery of the actinoplanic acid pathway in Streptomyces rapamycinicus reveals a genetically conserved synergism with rapamycin.

Actinobacteria possess a great wealth of pathways for production of bioactive compounds. Following advances in genome mining, dozens of natural product (NP) gene clusters are routinely found in each actinobacterial genome; however, the modus operandi of this large arsenal is poorly understood. During investigations of the secondary metabolome of Streptomyces rapamycinicus, the producer of rapamycin, we observed accumulation of two compounds never before reported from this organism. Structural elucidation revealed actinoplanic acid A and its demethyl analogue. Actinoplanic acids (APLs) are potent inhibitors of Ras farnesyltransferase and therefore represent bioactive compounds of medicinal interest. Supported with the unique structure of these polyketides and using genome mining, we identified a gene cluster responsible for their biosynthesis in S. rapamycinicus. Based on experimental evidence and genetic organization of the cluster, we propose a stepwise biosynthesis of APL, the first bacterial example of a pathway incorporating the rare tricarballylic moiety into an NP. Although phylogenetically distant, the pathway shares some of the biosynthetic principles with the mycotoxins fumonisins. Namely, the core polyketide is acylated with the tricarballylate by an atypical nonribosomal peptide synthetase-catalyzed ester formation. Finally, motivated by the conserved colocalization of the rapamycin and APL pathway clusters in S. rapamycinicus and all other rapamycin-producing actinobacteria, we confirmed a strong synergism of these compounds in antifungal assays. Mining for such evolutionarily conserved coharboring of pathways would likely reveal further examples of NP sets, attacking multiple targets on the same foe. These could then serve as a guide for development of new combination therapies. © 2018 Mrak et al.

September 22, 2019

Hypervirulent group A Streptococcus emergence in an acapsular background is associated with marked remodeling of the bacterial cell surface

Inactivating mutations in the control of virulence two-component regulatory system (covRS) often account for the hypervirulent phenotype in severe, invasive group A streptococcal (GAS) infections. As CovR represses production of the anti-phagocytic hyaluronic acid capsule, high level capsule production is generally considered critical to the hypervirulent phenotype induced by CovRS inactivation. There have recently been large outbreaks of GAS strains lacking capsule, but there are currently no data on the virulence of covRS-mutated, acapsular strains in vivo. We investigated the impact of CovRS inactivation in acapsular serotype M4 strains using a wild-type (M4-SC-1) and a naturally-occurring CovS-inactivated strain (M4-LC-1) that contains an 11 bp covS insertion. M4-LC-1 was significantly more virulent in a mouse bacteremia model but caused smaller lesions in a subcutaneous mouse model. Over 10% of the genome showed significantly different transcript levels in the M4-LC-1 vs. the M4-SC-1 strain. Notably, the Mga regulon and multiple cell surface protein-encoding genes were strongly upregulated, a finding not observed for CovS-inactivated, encapsulated M1 or M3 GAS strains. Consistent with the transcriptomic data, transmission electron microscopy revealed markedly altered cell surface morphology of M4-LC-1 compared to M4-SC-1. Insertional inactivation of covS in M4-SC-1 recapitulated the transcriptome and cell surface morphology. Analysis of the cell surface following CovS-inactivation revealed that the upregulated proteins were part of the Mga regulon. Inactivation of mga in M4-LC-1 reduced transcript levels of multiple cell surface proteins and reversed the cell surface alterations, consistent with the effect of CovS inactivation on cell surface composition being mediated by Mga. CovRS-inactivating mutations were detected in 20% of current invasive serotype M4 strains in the United States. Thus, we discovered that hypervirulent M4 GAS strains with covRS mutations can arise in an acapsular background and that such hypervirulence is associated with profound alteration of the cell surface.

September 21, 2019

Characterization of multi-drug resistant Enterococcus faecalis isolated from cephalic recording chambers in research macaques (Macaca spp.).

Nonhuman primates are commonly used for cognitive neuroscience research and often surgically implanted with cephalic recording chambers for electrophysiological recording. Aerobic bacterial cultures from 25 macaques identified 72 bacterial isolates, including 15 Enterococcus faecalis isolates. The E. faecalis isolates displayed multi-drug resistant phenotypes, with resistance to ciprofloxacin, enrofloxacin, trimethoprim-sulfamethoxazole, tetracycline, chloramphenicol, bacitracin, and erythromycin, as well as high-level aminoglycoside resistance. Multi-locus sequence typing showed that most belonged to two E. faecalis sequence types (ST): ST 4 and ST 55. The genomes of three representative isolates were sequenced to identify genes encoding antimicrobial resistances and other traits. Antimicrobial resistance genes identified included aac(6′)-aph(2″), aph(3′)-III, str, ant(6)-Ia, tetM, tetS, tetL, ermB, bcrABR, cat, and dfrG, and polymorphisms in parC (S80I) and gyrA (S83I) were observed. These isolates also harbored virulence factors including the cytolysin toxin genes in ST 4 isolates, as well as multiple biofilm-associated genes (esp, agg, ace, SrtA, gelE, ebpABC), hyaluronidases (hylA, hylB), and other survival genes (ElrA, tpx). Crystal violet biofilm assays confirmed that ST 4 isolates produced more biofilm than ST 55 isolates. The abundance of antimicrobial resistance and virulence factor genes in the ST 4 isolates likely relates to the loss of CRISPR-cas. This macaque colony represents a unique model for studying E. faecalis infection associated with indwelling devices, and provides an opportunity to understand the basis of persistence of this pathogen in a healthcare setting.

September 21, 2019

Divergent selection causes whole genome differentiation without physical linkage among the targets in Spodoptera frugiperda (Noctuidae)

The process of speciation involves whole genome differentiation by overcoming gene flow between diverging populations. We have ample knowledge of which evolutionary forces may cause genomic differentiation, and several speciation models have been proposed to explain the transition from genetic to genomic differentiation. However, it is still unclear which conditions are critical for enabling genomic differentiation in nature. The fall armyworm, Spodoptera frugiperda, is observed as two sympatric strains that have different host-plant ranges, suggesting the possibility of ecological divergent selection. In our previous study, we observed that these two strains show genetic differentiation across the whole genome with an unprecedentedly low extent, suggesting the possibility that whole genome sequences have started to differentiate between the strains. In this study, we analyzed whole genome sequences from these two strains from Mississippi to identify critical evolutionary factors for genomic differentiation. The genomic Fst is low (0.017) while 91.3% of 10 kb windows have Fst greater than 0, suggesting genome-wide differentiation with a low extent. We identified nearly 400 outliers of genetic differentiation between strains, and found that physical linkage among these outliers is not a primary cause of genomic differentiation. Fst is not significantly correlated with gene density, a proxy for the strength of selection, suggesting that a genomic reduction in migration rate dominates the extent of local genetic differentiation. Our analyses reveal that divergent selection alone is sufficient to generate genomic differentiation, and any following diversifying factors may increase the level of genetic differentiation between diverging strains in the process of speciation.
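To make the windowed Fst summary above more concrete, here is a rough sketch. The abstract does not state which Fst estimator the authors used, so the textbook heterozygosity-based form Fst = (Ht - Hs) / Ht for one biallelic site and two equally weighted populations is assumed here purely for illustration. Function names and all numbers are hypothetical. Note that this simple form is never negative, whereas sample-size-corrected estimators can be, which is what makes a "fraction of windows above 0" summary informative.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Heterozygosity-based Fst for one biallelic site in two populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Assumes equal population weights and no sample-size correction.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                      # pooled expectation
    if ht == 0:
        return 0.0  # site is fixed for the same allele in both populations
    return (ht - hs) / ht


def fraction_positive(window_fst: list[float]) -> float:
    """Fraction of genomic windows whose Fst estimate is strictly above 0."""
    if not window_fst:
        raise ValueError("need at least one window")
    return sum(1 for value in window_fst if value > 0) / len(window_fst)


if __name__ == "__main__":
    # Made-up allele frequencies and per-window values, for illustration only.
    print(round(fst_biallelic(0.4, 0.6), 4))
    print(fraction_positive([0.02, 0.0, -0.01, 0.05]))
```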

September 21, 2019

Discovery and genotyping of structural variation from long-read haploid genome sequence data.

In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ~16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ~59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection. © 2017 Huddleston et al.; Published by Cold Spring Harbor Laboratory Press.

July 19, 2019

Characterizing and measuring bias in sequence data.

DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias. We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using data from human and from a set of microbes with diverse base compositions. As in previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we catalog 1,000 that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or likely due to sample-reference variations left only 0.045% of the autosomal genome with unexplained poor coverage. The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.
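The low-coverage figure quoted above (the share of the genome covered at less than 10% of the genome-wide average) can be illustrated with a small sketch. This is not the authors' assay; the function name, the per-position depth list, and the default threshold parameter are assumptions made for illustration, and the depths are invented.

```python
def low_coverage_fraction(depths: list[float], threshold: float = 0.10) -> float:
    """Fraction of positions with depth below `threshold` times the mean depth."""
    if not depths:
        raise ValueError("need at least one position")
    mean_depth = sum(depths) / len(depths)
    cutoff = threshold * mean_depth
    low = sum(1 for depth in depths if depth < cutoff)
    return low / len(depths)


if __name__ == "__main__":
    # Toy per-position depths: nine well-covered positions and one dropout.
    print(low_coverage_fraction([100] * 9 + [0]))
```

In practice the input would be per-base or per-window depths from an aligner or depth tool rather than a Python list, and positions explained by known bias motifs would be filtered out before reporting the unexplained remainder.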

July 19, 2019

Pacific Biosciences sequencing technology for genotyping and variation discovery in human data.

Pacific Biosciences technology provides a fundamentally new data type that provides the potential to overcome some limitations of current next generation sequencing platforms by providing significantly longer reads, single molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects. We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis. Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, but can be extended to other organisms with a reference genome.

July 19, 2019

The somatic genomic landscape of chromophobe renal cell carcinoma.

We describe the landscape of somatic genomic alterations of 66 chromophobe renal cell carcinomas (ChRCCs) on the basis of multidimensional and comprehensive characterization, including mtDNA and whole-genome sequencing. The results are consistent with ChRCC originating from the distal nephron, in contrast to other kidney cancers with more proximal origins. Combined mtDNA and gene expression analysis implicates changes in mitochondrial function as a component of the disease biology, while suggesting alternative roles for mtDNA mutations in cancers relying on oxidative phosphorylation. Genomic rearrangements lead to recurrent structural breakpoints within the TERT promoter region, which correlates with highly elevated TERT expression and manifestation of kataegis, representing a mechanism of TERT upregulation in cancer distinct from previously observed amplifications and point mutations. Copyright © 2014 Elsevier Inc. All rights reserved.

July 19, 2019

Long-read, whole-genome shotgun sequence data for five model organisms.

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.
