July 7, 2019

The genome sequence of Streptomyces lividans 66 reveals a novel tRNA-dependent peptide biosynthetic system within a metal-related genomic island.

The complete genome sequence of the original isolate of the model actinomycete Streptomyces lividans 66, also referred to as 1326, was deciphered after a combination of next-generation sequencing platforms and a hybrid assembly pipeline. Comparative analysis of the genomes of S. lividans 66 and closely related strains, including S. coelicolor M145 and S. lividans TK24, was used to identify strain-specific genes. The genetic diversity identified included a large genomic island with a mosaic structure, present in S. lividans 66 but not in the strain TK24. Sequence analyses showed that this genomic island has an anomalous (G + C) content, suggesting recent acquisition and that it is rich in metal-related genes. Sequences previously linked to a mobile conjugative element, termed plasmid SLP3 and defined here as a 94 kb region, could also be identified within this locus. Transcriptional analysis of the response of S. lividans 66 to copper was used to corroborate a role of this large genomic island, including two SLP3-borne “cryptic” peptide biosynthetic gene clusters, in metal homeostasis. Notably, one of these predicted biosynthetic systems includes an unprecedented nonribosomal peptide synthetase–tRNA-dependent transferase biosynthetic hybrid organization. This observation implies the recruitment of members of the leucyl/phenylalanyl-tRNA-protein transferase family to catalyze peptide bond formation within the biosynthesis of natural products. Thus, the genome sequence of S. lividans 66 not only explains long-standing genetic and phenotypic differences but also opens the door for further in-depth comparative genomic analyses of model Streptomyces strains, as well as for the discovery of novel natural products following genome-mining approaches.


July 7, 2019

PBSIM: PacBio reads simulator–toward accurate genome assembly.

PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate; and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results. PBSIM is freely available from the web under the GNU GPL v2 license (http://code.google.com/p/pbsim/).
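The log-normal read-length model mentioned above can be illustrated with a minimal sketch. This is not PBSIM's implementation: the function names, the mean and standard deviation values, and the stopping rule (draw reads until a target coverage depth is reached) are all assumptions chosen for illustration.

```python
import math
import random

def lognormal_params(mean_len, sd_len):
    """Convert a desired mean/SD of read length into the (mu, sigma)
    parameters of the underlying normal distribution."""
    sigma2 = math.log(1.0 + (sd_len / mean_len) ** 2)
    mu = math.log(mean_len) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def simulate_read_lengths(genome_size, depth, mean_len, sd_len, seed=0):
    """Draw log-normally distributed read lengths until the summed length
    reaches genome_size * depth (hypothetical stopping rule)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean_len, sd_len)
    target = genome_size * depth
    lengths = []
    total = 0
    while total < target:
        # Round to whole bases and never emit a zero-length read.
        length = max(1, int(round(rng.lognormvariate(mu, sigma))))
        lengths.append(length)
        total += length
    return lengths

# Illustrative values only: a 100 kb genome at depth 15.
lengths = simulate_read_lengths(100_000, 15, mean_len=3000, sd_len=2300)
print(len(lengths), sum(lengths) / len(lengths))
```

A sampling-based variant would instead draw lengths from an observed set of reads rather than from a fitted distribution.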


July 7, 2019

The genome of the anaerobic fungus Orpinomyces sp. strain C1A reveals the unique evolutionary history of a remarkable plant biomass degrader.

Anaerobic gut fungi represent a distinct early-branching fungal phylum (Neocallimastigomycota) and reside in the rumen, hindgut, and feces of ruminant and nonruminant herbivores. The genome of an anaerobic fungal isolate, Orpinomyces sp. strain C1A, was sequenced using a combination of Illumina and PacBio single-molecule real-time (SMRT) technologies. The large genome (100.95 Mb, 16,347 genes) displayed extremely low G+C content (17.0%), large noncoding intergenic regions (73.1%), proliferation of microsatellite repeats (4.9%), and multiple gene duplications. Comparative genomic analysis identified multiple genes and pathways that are absent in Dikarya genomes but present in early-branching fungal lineages and/or nonfungal Opisthokonta. These included genes for posttranslational fucosylation, the production of specific intramembrane proteases and extracellular protease inhibitors, the formation of a complete axoneme and intraflagellar trafficking machinery, and a near-complete focal adhesion machinery. Analysis of the lignocellulolytic machinery in the C1A genome revealed an extremely rich repertoire, with evidence of horizontal gene acquisition from multiple bacterial lineages. Experimental analysis indicated that strain C1A is a remarkable biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple untreated grasses and crop residues examined, with the process significantly enhanced by mild pretreatments. This capability, acquired during its separate evolutionary trajectory in the rumen, along with its resilience and invasiveness compared to prokaryotic anaerobes, renders anaerobic fungi promising agents for consolidated bioprocessing schemes in biofuels production.


July 7, 2019

Cerulean: A hybrid assembly using high throughput short and long reads

Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads makes it possible to resolve complicated repeats that could not be resolved by the short-read data. However, these long reads have a high error rate and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. 2012 proposed an approach to use high quality short-read data to correct these long reads and, thus, make the assembly from long reads possible. However, due to the large size of both datasets (short and long reads), error correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error correction of long reads, we first assemble the short reads and later map these long reads on the assembly graph to resolve repeats.
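The idea of mapping long reads onto a short-read assembly graph to resolve a repeat can be shown with a toy sketch. This is not Cerulean's algorithm: it assumes each long read has already been reduced to an ordered list of contig names (a hypothetical input format), and it simply counts which entry and exit contigs are linked through a repeat contig.

```python
from collections import Counter

def resolve_repeat(read_paths, repeat):
    """Count (contig_before, contig_after) pairs observed around `repeat`
    in long reads, each given as an ordered list of contig names."""
    support = Counter()
    for path in read_paths:
        # Only interior positions have both a left and a right flank.
        for i in range(1, len(path) - 1):
            if path[i] == repeat:
                support[(path[i - 1], path[i + 1])] += 1
    return support

# Toy graph: contigs A and C both enter repeat R; R exits to B and D.
# Short reads alone cannot tell A-R-B/C-R-D apart from A-R-D/C-R-B.
long_reads = [
    ["A", "R", "B"],
    ["A", "R", "B"],
    ["C", "R", "D"],
]
print(resolve_repeat(long_reads, "R"))
```

In this toy input the long reads support the pairings A-R-B and C-R-D, which is the information needed to connect the flanking regions unambiguously.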


July 7, 2019

A hybrid approach for the automated finishing of bacterial genomes.

Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.


July 7, 2019

Next generation sequencing technologies and the changing landscape of phage genomics.

The dawn of next generation sequencing technologies has opened up exciting possibilities for whole genome sequencing of a plethora of organisms. The 2nd and 3rd generation sequencing technologies, based on cloning-free, massively parallel sequencing, have enabled the generation of a deluge of genomic sequences of both prokaryotic and eukaryotic origin in the last seven years. However, whole genome sequencing of bacterial viruses has not kept pace with this revolution, despite the fact that their genomes are orders of magnitude smaller in size compared with bacteria and other organisms. Sequencing phage genomes poses several challenges: (1) obtaining pure phage genomic material, (2) PCR amplification biases and (3) the complex nature of their genetic material due to features such as methylated bases and repeats that are inherently difficult to sequence and assemble. Here we describe conclusions drawn from our efforts in sequencing hundreds of bacteriophage genomes from a variety of Gram-positive and Gram-negative bacteria using Sanger, 454, Illumina and PacBio technologies. Based on our experience we propose several general considerations regarding sample quality, the choice of technology and a “blended approach” for generating reliable whole genome sequences of phages.


July 7, 2019

Next-generation sequencing and large genome assemblies.

The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.


July 7, 2019

Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations.

Medulloblastomas are the most common malignant brain tumours in children. Identifying and understanding the genetic events that drive these tumours is critical for the development of more effective diagnostic, prognostic and therapeutic strategies. Recently, our group and others described distinct molecular subtypes of medulloblastoma on the basis of transcriptional and copy number profiles. Here we use whole-exome hybrid capture and deep sequencing to identify somatic mutations across the coding regions of 92 primary medulloblastoma/normal pairs. Overall, medulloblastomas have low mutation rates consistent with other paediatric tumours, with a median of 0.35 non-silent mutations per megabase. We identified twelve genes mutated at statistically significant frequencies, including previously known mutated genes in medulloblastoma such as CTNNB1, PTCH1, MLL2, SMARCA4 and TP53. Recurrent somatic mutations were newly identified in an RNA helicase gene, DDX3X, often concurrent with CTNNB1 mutations, and in the nuclear co-repressor (N-CoR) complex genes GPS2, BCOR and LDB1. We show that mutant DDX3X potentiates transactivation of a TCF promoter and enhances cell viability in combination with mutant, but not wild-type, β-catenin. Together, our study reveals the alteration of WNT, hedgehog, histone methyltransferase and now N-CoR pathways across medulloblastomas and within specific subtypes of this disease, and nominates the RNA helicase DDX3X as a component of pathogenic β-catenin signalling in medulloblastoma.


July 7, 2019

An Inv(16)(p13.3q24.3)-encoded CBFA2T3-GLIS2 fusion protein defines an aggressive subtype of pediatric acute megakaryoblastic leukemia.

To define the mutation spectrum in non-Down syndrome acute megakaryoblastic leukemia (non-DS-AMKL), we performed transcriptome sequencing on diagnostic blasts from 14 pediatric patients and validated our findings in a recurrency/validation cohort consisting of 34 pediatric and 28 adult AMKL samples. Our analysis identified a cryptic chromosome 16 inversion (inv(16)(p13.3q24.3)) in 27% of pediatric cases, which encodes a CBFA2T3-GLIS2 fusion protein. Expression of CBFA2T3-GLIS2 in Drosophila and murine hematopoietic cells induced bone morphogenic protein (BMP) signaling and resulted in a marked increase in the self-renewal capacity of hematopoietic progenitors. These data suggest that expression of CBFA2T3-GLIS2 directly contributes to leukemogenesis. Copyright © 2012 Elsevier Inc. All rights reserved.


July 7, 2019

Draft genome assembly and annotation of Glycyrrhiza uralensis, a medicinal legume.

Chinese liquorice/licorice (Glycyrrhiza uralensis) is a leguminous plant species whose roots and rhizomes have been widely used as a herbal medicine and natural sweetener. Whole-genome sequencing is essential for gene discovery studies and molecular breeding in liquorice. Here, we report a draft assembly of the approximately 379-Mb whole-genome sequence of strain 308-19 of G. uralensis; this assembly contains 34 445 predicted protein-coding genes. Comparative analyses suggested well-conserved genomic components and collinearity of gene loci (synteny) between the genome of liquorice and those of other legumes such as Medicago and chickpea. We observed that three genes involved in isoflavonoid biosynthesis, namely, 2-hydroxyisoflavanone synthase (CYP93C), 2,7,4′-trihydroxyisoflavanone 4′-O-methyltransferase/isoflavone 4′-O-methyltransferase (HI4OMT) and isoflavone-7-O-methyltransferase (7-IOMT) formed a cluster on the scaffold of the liquorice genome and showed conserved microsynteny with Medicago and chickpea. Based on the liquorice genome annotation, we predicted genes in the P450 and UDP-dependent glycosyltransferase (UGT) superfamilies, some of which are involved in triterpenoid saponin biosynthesis, and characterised their gene expression with the reference genome sequence. The genome sequencing and its annotations provide an essential resource for liquorice improvement through molecular breeding and the discovery of useful genes for engineering bioactive components through synthetic biology approaches. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.


July 7, 2019

LoRTE: Detecting transposon-induced genomic variants using low coverage PacBio long read sequences.

Population genomic analysis of transposable elements has greatly benefited from recent advances in sequencing technologies. However, the short size of the reads and the propensity of transposable elements to nest in highly repeated regions of genomes limit the efficiency of bioinformatic tools when Illumina or 454 technologies are used. Fortunately, long read sequencing technologies generating read lengths that may span the entire length of full transposons are now available. However, existing TE population genomic software was not designed to handle long reads and the development of new dedicated tools is needed. LoRTE is the first tool able to use PacBio long read sequences to identify transposon deletions and insertions between a reference genome and genomes of different strains or populations. Tested against simulated and genuine Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool to study the dynamic and evolutionary impact of transposable elements using low coverage, long read sequences. LoRTE is an efficient and accurate tool to identify structural genomic variants caused by TE insertion or deletion. LoRTE is available for download at http://www.egce.cnrs-gif.fr/?p=6422.


July 7, 2019

The comparative landscape of duplications in Heliconius melpomene and Heliconius cydno.

Gene duplications can facilitate adaptation and may lead to interpopulation divergence, causing reproductive isolation. We used whole-genome resequencing data from 34 butterflies to detect duplications in two Heliconius species, Heliconius cydno and Heliconius melpomene. Taking advantage of three distinctive signals of duplication in short-read sequencing data, we identified 744 duplicated loci in H. cydno and H. melpomene and evaluated the accuracy of our approach using single-molecule sequencing. We have found that duplications overlap genes significantly less than expected at random in H. melpomene, consistent with the action of background selection against duplicates in functional regions of the genome. Duplicate loci that are highly differentiated between H. melpomene and H. cydno map to four different chromosomes. Four duplications were identified with a strong signal of divergent selection, including an odorant binding protein and another in close proximity with a known wing colour pattern locus that differs between the two species. Heredity advance online publication, 7 December 2016; doi:10.1038/hdy.2016.107.
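One common way to ask whether duplications overlap genes less often than expected at random is a permutation test. The abstract does not state which test the authors used, so the sketch below is only an assumed illustration: it treats the genome as a single linear sequence, uses half-open intervals, and all coordinates are made up.

```python
import random

def overlaps_any(start, end, genes):
    """True if half-open interval [start, end) overlaps any gene interval."""
    return any(start < g_end and g_start < end for g_start, g_end in genes)

def count_overlapping(dups, genes):
    """Number of duplications that overlap at least one gene."""
    return sum(overlaps_any(s, e, genes) for s, e in dups)

def depletion_p_value(dups, genes, genome_len, n_perm=1000, seed=0):
    """One-sided empirical p-value that duplications overlap genes
    as rarely as observed, or more rarely, under random placement."""
    rng = random.Random(seed)
    observed = count_overlapping(dups, genes)
    as_low = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in dups:
            length = e - s
            # Re-place each duplication uniformly, keeping its length.
            new_start = rng.randint(0, genome_len - length)
            shuffled.append((new_start, new_start + length))
        if count_overlapping(shuffled, genes) <= observed:
            as_low += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_low + 1) / (n_perm + 1)

# Made-up coordinates: genes cover most of a 1 kb toy genome,
# and all three duplications sit in the intergenic gap.
genes = [(0, 400), (600, 1000)]
dups = [(450, 460), (500, 510), (540, 550)]
print(depletion_p_value(dups, genes, genome_len=1000))
```

A real analysis would also need to handle multiple chromosomes, assembly gaps, and any mappability constraints on where duplications can be detected.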


July 7, 2019

Draft genome sequence of Mentha longifolia (L.) and development of resources for mint cultivar improvement.

The genus Mentha encompasses mint species cultivated for their essential oils, which are formulated into a vast array of consumer products. Desirable oil characteristics and resistance to the fungal disease Verticillium wilt are top priorities for the mint industry. However, cultivated mints have complex polyploid genomes and are sterile. Breeding efforts, therefore, require the development of genomic resources for fertile mint species. Here, we present draft de novo genome and plastome assemblies for a wilt-resistant South African accession of Mentha longifolia (L.) Huds., a diploid species ancestral to cultivated peppermint and spearmint. The 353 Mb genome contains 35 597 predicted protein-coding genes, including 292 disease resistance gene homologs, and nine genes determining essential oil characteristics. A genetic linkage map ordered 1397 genome scaffolds on 12 pseudochromosomes. More than two million simple sequence repeats were identified, which will facilitate molecular marker development. The M. longifolia genome is a valuable resource for both metabolic engineering and molecular breeding. This is exemplified by employing the genome sequence to clone and functionally characterize the promoters in a peppermint cultivar, and demonstrating the utility of a glandular trichome-specific promoter to increase expression of a biosynthetic gene, thereby modulating essential oil composition. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.

