De novo assembly of the Streptomyces sp. strain Mg1 genome using PacBio single-molecule sequencing.
We report a draft genome assembly of Streptomyces sp. strain Mg1, a competitive soil isolate with multiple secondary metabolite gene clusters.
The Burkholderia cepacia complex (BCC) is a group of closely related bacteria that are responsible for respiratory infections in immunocompromised humans, most notably those with cystic fibrosis (CF). We report the genome sequences for Burkholderia cenocepacia ET12 lineage CF isolates K56-2 and BC7.
The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high-throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated: over 98% of all probes designed from P. taeda that were efficient in sequence capture were also suitable for analysis of the related species Pinus elliottii Engelm. © 2013 The Authors The Plant Journal © 2013 John Wiley & Sons Ltd.
Salmonella bongori is a close relative of the highly virulent members of S. enterica subspecies enterica, encompassing more than 2,500 serovars, most of which cause human salmonellosis, one of the leading food-borne illnesses. S. bongori is only very rarely implicated in infections. We here present the sequence of a clinical isolate from Switzerland, S. bongori strain N268-08.
The HeLa cell line was established in 1951 from cervical cancer cells taken from a patient, Henrietta Lacks. This was the first successful attempt to immortalize human-derived cells in vitro. The robust growth and unrestricted distribution of HeLa cells resulted in its broad adoption, both intentionally and through widespread cross-contamination, and for the past 60 years it has served a role analogous to that of a model organism. The cumulative impact of the HeLa cell line on research is demonstrated by its occurrence in more than 74,000 PubMed abstracts (approximately 0.3%). The genomic architecture of HeLa remains largely unexplored beyond its karyotype, partly because like many cancers, its extensive aneuploidy renders such analyses challenging. We carried out haplotype-resolved whole-genome sequencing of the HeLa CCL-2 strain, examined point- and indel-mutation variations, mapped copy-number variations and loss of heterozygosity regions, and phased variants across full chromosome arms. We also investigated variation and copy-number profiles for HeLa S3 and eight additional strains. We find that HeLa is relatively stable in terms of point variation, with few new mutations accumulating after early passaging. Haplotype resolution facilitated reconstruction of an amplified, highly rearranged region of chromosome 8q24.21 at which integration of the human papilloma virus type 18 (HPV-18) genome occurred and that is likely to be the event that initiated tumorigenesis. We combined these maps with RNA-seq and ENCODE Project data sets to phase the HeLa epigenome. This revealed strong, haplotype-specific activation of the proto-oncogene MYC by the integrated HPV-18 genome approximately 500 kilobases upstream, and enabled global analyses of the relationship between gene dosage and expression.
These data provide an extensively phased, high-quality reference genome for past and future experiments relying on HeLa, and demonstrate the value of haplotype resolution for characterizing cancer genomes and epigenomes.
Ensifer (syn. Sinorhizobium) meliloti is an important symbiotic bacterial species that fixes nitrogen. Strains BO21CC and AK58 were previously investigated for their substrate utilization and their plant growth-promoting abilities, and showed interesting features. Here, we describe the complete genome sequence and annotation of these strains. The BO21CC and AK58 genomes are 6,985,065 and 6,974,333 bp long, with 6,746 and 6,992 genes predicted, respectively.
Salinicoccus carnicancri Jung et al. 2010 belongs to the genus Salinicoccus in the family Staphylococcaceae. Members of the Salinicoccus are moderately halophilic and originate from various salty environments. The halophilic features of the Salinicoccus suggest their possible uses in biotechnological applications, such as biodegradation and fermented food production. However, the genus Salinicoccus is poorly characterized at the genome level, despite its potential importance. This study presents the draft genome sequence of S. carnicancri strain Crm(T) and its annotation. The 2,673,309 base pair genome contained 2,700 protein-coding genes and 78 RNA genes with an average G+C content of 47.93 mol%. It was notable that the strain carried 72 predicted genes associated with osmoregulation, which suggests the presence of beneficial functions that facilitate growth in high-salt environments.
Rhizobium leguminosarum bv. trifolii SRDI565 (syn. N8-J) is an aerobic, motile, Gram-negative, non-spore-forming rod. SRDI565 was isolated from a nodule recovered from the roots of the annual clover Trifolium subterraneum subsp. subterraneum grown in the greenhouse and inoculated with soil collected from New South Wales, Australia. SRDI565 has a broad host range for nodulation within the clover genus; however, N2-fixation is sub-optimal with some Trifolium species and ineffective with others. Here we describe the features of R. leguminosarum bv. trifolii strain SRDI565, together with genome sequence information and annotation. The 6,905,599 bp high-quality-draft genome is arranged into 7 scaffolds of 7 contigs, contains 6,750 protein-coding genes and 86 RNA-only encoding genes, and is one of 100 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.
Copy number variation (CNV) contributes to disease and has restructured the genomes of great apes. The diversity and rate of this process, however, have not been extensively explored among great ape lineages. We analyzed 97 deeply sequenced great ape and human genomes and estimate 16% (469 Mb) of the hominid genome has been affected by recent CNV. We identify a comprehensive set of fixed gene deletions (n = 340) and duplications (n = 405) as well as >13.5 Mb of sequence that has been specifically lost on the human lineage. We compared the diversity and rates of copy number and single nucleotide variation across the hominid phylogeny. We find that CNV diversity partially correlates with single nucleotide diversity (r^2 = 0.5) and recapitulates the phylogeny of apes with few exceptions. Duplications significantly outpace deletions (2.8-fold). The load of segregating duplications remains significantly higher in bonobos, Western chimpanzees, and Sumatran orangutans, populations that have experienced recent genetic bottlenecks (P = 0.0014, 0.02, and 0.0088, respectively). The rate of fixed deletion has been more clocklike with the exception of the chimpanzee lineage, where we observe a twofold increase in the chimpanzee-bonobo ancestor (P = 4.79 × 10^-9) and increased deletion load among Western chimpanzees (P = 0.002). The latter includes the first genomic disorder in a chimpanzee with features resembling Smith-Magenis syndrome mediated by a chimpanzee-specific increase in segmental duplication complexity. We hypothesize that demographic effects, such as bottlenecks, have contributed to larger and more gene-rich segments being deleted in the chimpanzee lineage and that this effect, more generally, may account for episodic bursts in CNV during hominid evolution.
The complete genome sequence of the original isolate of the model actinomycete Streptomyces lividans 66, also referred to as 1326, was deciphered after a combination of next-generation sequencing platforms and a hybrid assembly pipeline. Comparative analysis of the genomes of S. lividans 66 and closely related strains, including S. coelicolor M145 and S. lividans TK24, was used to identify strain-specific genes. The genetic diversity identified included a large genomic island with a mosaic structure, present in S. lividans 66 but not in the strain TK24. Sequence analyses showed that this genomic island has an anomalous (G + C) content, suggesting recent acquisition and that it is rich in metal-related genes. Sequences previously linked to a mobile conjugative element, termed plasmid SLP3 and defined here as a 94 kb region, could also be identified within this locus. Transcriptional analysis of the response of S. lividans 66 to copper was used to corroborate a role of this large genomic island, including two SLP3-borne “cryptic” peptide biosynthetic gene clusters, in metal homeostasis. Notably, one of these predicted biosynthetic systems includes an unprecedented nonribosomal peptide synthetase–tRNA-dependent transferase biosynthetic hybrid organization. This observation implies the recruitment of members of the leucyl/phenylalanyl-tRNA-protein transferase family to catalyze peptide bond formation within the biosynthesis of natural products. Thus, the genome sequence of S. lividans 66 not only explains long-standing genetic and phenotypic differences but also opens the door for further in-depth comparative genomic analyses of model Streptomyces strains, as well as for the discovery of novel natural products following genome-mining approaches.
The field of nonhuman primate genomics is undergoing rapid change and making impressive progress. Exploiting new technologies for DNA sequencing, researchers have generated new whole-genome sequence assemblies for multiple primate species over the past 6 years. In addition, investigations of within-species genetic variation, gene expression and RNA sequences, conservation of non-protein-coding regions of the genome, and other aspects of comparative genomics are moving at an accelerating speed. This progress is opening a wide array of new research opportunities in the analysis of comparative primate genome content and evolution. It also creates new possibilities for the use of nonhuman primates as model organisms in biomedical research. This transition, based on both new technology and the new information being generated in regard to human genetics, provides an important justification for reevaluating the research goals, strategies, and study designs used in primate genetics and genomics.
Leisingera aquimarina Vandecandelaere et al. 2008 is a member of the genomically well characterized Roseobacter clade within the family Rhodobacteraceae. Representatives of the marine Roseobacter clade are metabolically versatile and involved in carbon fixation and biogeochemical processes. They form a physiologically heterogeneous group, found predominantly in coastal or polar waters, especially in symbiosis with algae, in microbial mats, in sediments or associated with invertebrates. Here we describe the features of L. aquimarina DSM 24565(T) together with the permanent-draft genome sequence and annotation. The 5,344,253 bp long genome consists of one chromosome and an unusually high number of seven extrachromosomal elements and contains 5,129 protein-coding and 89 RNA genes. It was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program 2010 and of the activities of the Transregional Collaborative Research Centre 51 funded by the German Research Foundation (DFG).
Multidrug-resistant New Delhi metallo-β-lactamase 1 (NDM-1)-producing bacteria have spread globally and become a major clinical and public health threat. We report here the draft genome sequence of the Klebsiella pneumoniae clinical isolate 303K, harboring an NDM-1 coding sequence.
The successes of targeted drugs with companion predictive biomarkers and the technological advances in gene sequencing have generated enthusiasm for evaluating personalized cancer medicine strategies using genomic profiling. We assessed the feasibility of incorporating real-time analysis of somatic mutations within exons of 19 genes into patient management. Blood, tumor biopsy and archived tumor samples were collected from 50 patients recruited from four cancer centers. Samples were analyzed using three technologies: targeted exon sequencing using Pacific Biosciences PacBio RS, multiplex somatic mutation genotyping using Sequenom MassARRAY and Sanger sequencing. An expert panel reviewed results prior to reporting to clinicians. A clinical laboratory verified actionable mutations. Fifty patients were recruited. Nineteen actionable mutations were identified in 16 (32%) patients. Across technologies, results were in agreement in 100% of biopsy specimens and 95% of archival specimens. Profiling results from paired archival/biopsy specimens were concordant in 30/34 (88%) patients. We demonstrated that the use of next generation sequencing for real-time genomic profiling in advanced cancer patients is feasible. Additionally, actionable mutations identified in this study were relatively stable between archival and biopsy samples, implying that cancer mutations that are good predictors of drug response may remain constant across clinical stages. Copyright © 2012 UICC.
Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups interested in the genomics of disease, phylogenetics, botany and evolution, but there is an absence of theory and methods for polyploid haplotype reconstruction. In this work, we present a number of results, extensions and generalizations of compass graphs and our HapCompass framework. We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. Furthermore, we present graph theory-based algorithms for the problem of haplotype assembly using our previously developed HapCompass framework for (i) novel implementations of haplotype assembly optimizations (minimum error correction), (ii) assembly of a pair of individuals sharing a haplotype tract identical by descent and (iii) assembly of polyploid genomes. We evaluate our methods on 1000 Genomes Project, Pacific Biosciences and simulated sequence data. HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/. Supplementary data are available at Bioinformatics online.
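The minimum error correction (MEC) objective named in the abstract above can be illustrated with a small sketch. This is not the HapCompass graph algorithm. It is a minimal diploid example, assuming reads are given as dictionaries mapping SNP index to an observed allele (0 or 1). The names `mec_score` and `brute_force_mec` are hypothetical, and the exhaustive search is only workable for a handful of SNPs, which is consistent with the abstract's motivation for heuristics.

```python
from itertools import product
from typing import Dict, List, Tuple

# A read fragment: SNP index -> observed allele (0 or 1).
# Representation chosen for illustration only.
Fragment = Dict[int, int]


def mismatches(fragment: Fragment, haplotype: List[int]) -> int:
    """Count covered SNP positions where the fragment disagrees with the haplotype."""
    return sum(1 for pos, allele in fragment.items() if haplotype[pos] != allele)


def mec_score(fragments: List[Fragment], haplotype: List[int]) -> int:
    """Diploid MEC score: each fragment is assigned to the closer of the
    haplotype and its complement, and the remaining mismatches are summed."""
    complement = [1 - a for a in haplotype]
    return sum(
        min(mismatches(f, haplotype), mismatches(f, complement))
        for f in fragments
    )


def brute_force_mec(fragments: List[Fragment], n_snps: int) -> Tuple[int, List[int]]:
    """Exhaustively search haplotypes (first allele fixed to 0 to remove the
    haplotype/complement symmetry) and return the first one with minimum score."""
    best_score = None
    best_hap: List[int] = []
    for bits in product((0, 1), repeat=n_snps - 1):
        hap = [0] + list(bits)
        score = mec_score(fragments, hap)
        if best_score is None or score < best_score:
            best_score, best_hap = score, hap
    return best_score, best_hap


# Toy data: four fragments over four heterozygous SNPs; the last fragment
# conflicts with the phasing implied by the others.
fragments = [
    {0: 0, 1: 0, 2: 1},
    {1: 1, 2: 0, 3: 1},
    {0: 0, 1: 0, 3: 0},
    {2: 1, 3: 1},
]
score, hap = brute_force_mec(fragments, n_snps=4)
print(score, hap)
```

The search space doubles with every additional SNP, so a sketch like this serves only to make the objective concrete on toy inputs.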