Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.
Rigorous study of mitochondrial functions and cell biology in the budding yeast Saccharomyces cerevisiae has advanced our understanding of mitochondrial genetics. This yeast is now a powerful model for population genetics, owing to large genetic diversity and highly structured populations among wild isolates. Comparative mitochondrial genomic analyses between yeast species have revealed broad evolutionary changes in genome organization and architecture. A fine-scale view of recent evolutionary changes within S. cerevisiae has not been possible due to low numbers of complete mitochondrial sequences. To address challenges of sequencing AT-rich and repetitive mitochondrial DNAs (mtDNAs), we sequenced two divergent S. cerevisiae mtDNAs using a single-molecule sequencing platform (PacBio RS). Using de novo assemblies, we generated highly accurate complete mtDNA sequences. These mtDNA sequences were compared with 98 additional mtDNA sequences gathered from various published collections. Phylogenies based on mitochondrial coding sequences and intron profiles revealed that intraspecific diversity in mitochondrial genomes generally recapitulated the population structure of nuclear genomes. Analysis of intergenic sequence indicated a recent expansion of mobile elements in certain populations. Additionally, our analyses revealed that certain populations lacked introns previously believed conserved throughout the species, as well as the presence of introns never before reported in S. cerevisiae. Our results revealed that the extensive variation in S. cerevisiae mtDNAs is often population specific, thus offering a window into the recent evolutionary processes shaping these genomes. In addition, we offer an effective strategy for sequencing these challenging AT-rich mitochondrial genomes for small scale projects.
Single-molecule real-time (SMRT) sequencing generates much longer reads than other widely used next-generation (next-gen) sequencing methods, but its application to whole genome/exome analysis has been limited. Here, we describe the use of SMRT sequencing coupled with barcoding to simultaneously analyze one or a small number of genomic targets derived from multiple sources. In the budding yeast system, SMRT sequencing was used to analyze strand-exchange intermediates generated during mitotic recombination and to analyze genetic changes in a forward mutation assay. The general barcoding-SMRT approach was then extended to diffuse large B-cell lymphoma primary tumors and cell lines, where detected changes agreed with prior Illumina exome sequencing. A distinct advantage afforded by SMRT sequencing over other next-gen methods is that it immediately provides the linkage relationships between SNPs in the target segment sequenced. The strength of our approach for mutation/recombination studies (as well as linkage identification) derives from its inherent computational simplicity coupled with a lack of reliance on sophisticated statistical analyses. Copyright © 2015 Guo et al.
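The barcoding step described above can be illustrated with a minimal Python sketch that assigns reads to their source sample by an exact match on a leading barcode. The barcode sequences, sample names, and the function `demultiplex` are hypothetical; the abstract does not specify the barcode design or the software used, and real SMRT data would likely need error-tolerant matching.

```python
from collections import defaultdict

# Hypothetical barcodes and sample names; the real set is not given in the abstract.
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by sample using an exact match on the leading barcode."""
    length = len(next(iter(barcodes)))  # assumes all barcodes share one length
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:length], "unassigned")
        # Trim the barcode from assigned reads; keep unassigned reads intact.
        bins[sample].append(read[length:] if sample != "unassigned" else read)
    return dict(bins)


if __name__ == "__main__":
    example_reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "NNNNNNAAAA"]
    for sample, seqs in demultiplex(example_reads).items():
        print(sample, seqs)
```

Because each long read spans the whole target segment, the variants within one demultiplexed read are already in phase, which is the linkage advantage the abstract points to.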
Multiple origins of the pathogenic yeast Candida orthopsilosis by separate hybridizations between two parental species.
Mating between different species produces hybrids that are usually asexual and stuck as diploids, but can also lead to the formation of new species. Here, we report the genome sequences of 27 isolates of the pathogenic yeast Candida orthopsilosis. We find that most isolates are diploid hybrids, products of mating between two unknown parental species (A and B) that are 5% divergent in sequence. Isolates vary greatly in the extent of homogenization between A and B, making their genomes a mosaic of highly heterozygous regions interspersed with homozygous regions. Separate phylogenetic analyses of SNPs in the A- and B-derived portions of the genome produce almost identical trees of the isolates with four major clades. However, the presence of two mutually exclusive genotype combinations at the mating type locus, and recombinant mitochondrial genomes diagnostic of inter-clade mating, show that the species C. orthopsilosis does not have a single evolutionary origin but was created at least four times by separate interspecies hybridizations between parents A and B. Older hybrids have lost more heterozygosity. We also identify two isolates with homozygous genomes derived exclusively from parent A, which are pure non-hybrid strains. The parallel emergence of the same hybrid species from multiple independent hybridization events is common in plant evolution, but is much less documented in pathogenic fungi.
Exploiting members of the BAHD acyltransferase family to synthesize multiple hydroxycinnamate and benzoate conjugates in yeast.
BAHD acyltransferases, named after the first four biochemically characterized enzymes of the group, are plant-specific enzymes that catalyze the transfer of coenzyme A-activated donors onto various acceptor molecules. They are responsible for the synthesis in plants of a myriad of secondary metabolites, some of which are beneficial for humans either as therapeutics or as specialty chemicals such as flavors and fragrances. The production of pharmaceutical, nutraceutical and commodity chemicals using engineered microbes is an alternative, green route to energy-intensive chemical syntheses that consume petroleum-based precursors. However, identification of appropriate enzymes and validation of their functional expression in heterologous hosts is a prerequisite for the design and implementation of metabolic pathways in microbes for the synthesis of such target chemicals. For the synthesis of valuable metabolites in the yeast Saccharomyces cerevisiae, we selected BAHD acyltransferases based on their preferred donor and acceptor substrates. In particular, BAHDs that use hydroxycinnamoyl-CoAs and/or benzoyl-CoA as donors were targeted because a large number of molecules beneficial to humans belong to this family of hydroxycinnamate and benzoate conjugates. The selected BAHD coding sequences were synthesized and cloned individually on a vector containing the Arabidopsis gene At4CL5, which encodes a promiscuous 4-coumarate:CoA ligase active on hydroxycinnamates and benzoates. The various S. cerevisiae strains obtained for co-expression of At4CL5 with the different BAHDs effectively produced a wide array of valuable hydroxycinnamate and benzoate conjugates upon addition of adequate combinations of donors and acceptor molecules. In particular, we report here for the first time the production in yeast of rosmarinic acid and its derivatives, quinate hydroxycinnamate esters such as chlorogenic acid, and glycerol hydroxycinnamate esters.
Similarly, we achieved for the first time the microbial production of polyamine hydroxycinnamate amides; monolignol, malate and fatty alcohol hydroxycinnamate esters; tropane alkaloids; and benzoate/caffeate alcohol esters. In some instances, the additional expression of Flavobacterium johnsoniae tyrosine ammonia-lyase (FjTAL) allowed the synthesis of p-coumarate conjugates and eliminated the need to supplement the culture media with 4-hydroxycinnamate. We demonstrate in this study the effectiveness of expressing members of the plant BAHD acyltransferase family in yeast for the synthesis of numerous valuable hydroxycinnamate and benzoate conjugates.
Structural rearrangements have long been recognized as an important source of genetic variation, with implications in phenotypic diversity and disease, yet their detailed evolutionary dynamics remain elusive. Here we use long-read sequencing to generate end-to-end genome assemblies for 12 strains representing major subpopulations of the partially domesticated yeast Saccharomyces cerevisiae and its wild relative Saccharomyces paradoxus. These population-level high-quality genomes with comprehensive annotation enable precise definition of chromosomal boundaries between cores and subtelomeres and a high-resolution view of evolutionary genome dynamics. In chromosomal cores, S. paradoxus shows faster accumulation of balanced rearrangements (inversions, reciprocal translocations and transpositions), whereas S. cerevisiae accumulates unbalanced rearrangements (novel insertions, deletions and duplications) more rapidly. In subtelomeres, both species show extensive interchromosomal reshuffling, with a higher tempo in S. cerevisiae. Such striking contrasts between wild and domesticated yeasts are likely to reflect the influence of human activities on structural genome evolution.
Iterative optimization of xylose catabolism in Saccharomyces cerevisiae using combinatorial expression tuning.
A common challenge in metabolic engineering is rapidly identifying rate-controlling enzymes in heterologous pathways for subsequent production improvement. We demonstrate a workflow to address this challenge and apply it to improving xylose utilization in Saccharomyces cerevisiae. For eight reactions required for conversion of xylose to ethanol, we screened enzymes for functional expression in S. cerevisiae, followed by a combinatorial expression analysis to achieve pathway flux balancing and identification of limiting enzymatic activities. In the next round of strain engineering, we increased the copy number of these limiting enzymes and again tested the eight-enzyme combinatorial expression library in this new background. This workflow yielded a strain that has a ~70% increase in biomass yield and ~240% increase in xylose utilization. Finally, we chromosomally integrated the expression library. This library enriched for strains with multiple integrations of the pathway, which likely were the result of tandem integrations mediated by promoter homology. Biotechnol. Bioeng. 2017;114: 1301-1309. © 2017 Wiley Periodicals, Inc.
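To give a sense of scale for an eight-enzyme combinatorial expression library, the following Python sketch enumerates every assignment of one promoter per gene. The three promoter strengths and the gene names are hypothetical placeholders; the abstract states only that eight reactions were covered, not how many expression levels were tested per enzyme.

```python
from itertools import product

# Eight reactions are stated in the abstract; these gene names are placeholders.
GENES = [f"enzyme_{i}" for i in range(1, 9)]
# Hypothetical expression levels; the actual promoter set is not given.
PROMOTERS = ["weak", "medium", "strong"]


def expression_library(genes, promoters):
    """Yield every design that assigns exactly one promoter to each gene."""
    for combo in product(promoters, repeat=len(genes)):
        yield dict(zip(genes, combo))


def library_size(n_genes, n_promoters):
    """Number of distinct designs: one independent choice per gene."""
    return n_promoters ** n_genes


if __name__ == "__main__":
    print("designs:", library_size(len(GENES), len(PROMOTERS)))
    print("first design:", next(expression_library(GENES, PROMOTERS)))
```

Under these assumed three levels the design space already holds several thousand variants, which suggests why a library screen is preferred over testing designs one at a time.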
Evolutionary restoration of fertility in an interspecies hybrid yeast, by whole-genome duplication after a failed mating-type switch.
Many interspecies hybrids have been discovered in yeasts, but most of these hybrids are asexual and can replicate only mitotically. Whole-genome duplication has been proposed as a mechanism by which interspecies hybrids can regain fertility, restoring their ability to perform meiosis and sporulate. Here, we show that this process occurred naturally during the evolution of Zygosaccharomyces parabailii, an interspecies hybrid that was formed by mating between 2 parents that differed by 7% in genome sequence and by many interchromosomal rearrangements. Surprisingly, Z. parabailii has a full sexual cycle and is genetically haploid. It goes through mating-type switching and autodiploidization, followed by immediate sporulation. We identified the key evolutionary event that enabled Z. parabailii to regain fertility, which was breakage of 1 of the 2 homeologous copies of the mating-type (MAT) locus in the hybrid, resulting in a chromosomal rearrangement and irreparable damage to 1 MAT locus. This rearrangement was caused by HO endonuclease, which normally functions in mating-type switching. With 1 copy of MAT inactivated, the interspecies hybrid now behaves as a haploid. Our results provide the first demonstration that MAT locus damage is a naturally occurring evolutionary mechanism for whole-genome duplication and restoration of fertility to interspecies hybrids. The events that occurred in Z. parabailii strongly resemble those postulated to have caused ancient whole-genome duplication in an ancestor of Saccharomyces cerevisiae.
Insight into the recent genome duplication of the halophilic yeast Hortaea werneckii: combining an improved genome with gene expression and chromatin structure.
Extremophilic organisms demonstrate the flexibility and adaptability of basic biological processes by highlighting how cell physiology adapts to environmental extremes. Few eukaryotic extremophiles have been well studied and only a small number are amenable to laboratory cultivation and manipulation. A detailed characterization of the genome architecture of such organisms is important to illuminate how they adapt to environmental stresses. One excellent example of a fungal extremophile is the halophile Hortaea werneckii (Pezizomycotina, Dothideomycetes, Capnodiales), a yeast-like fungus able to thrive at near-saturating concentrations of sodium chloride and which is also tolerant to both UV irradiation and desiccation. Given its unique lifestyle and its remarkably recent whole genome duplication, H. werneckii provides opportunities for testing the role of genome duplications and adaptability to extreme environments. We previously assembled the genome of H. werneckii using short-read sequencing technology and found a remarkable degree of gene duplication. Technology limitations, however, precluded high-confidence annotation of the entire genome. We therefore revisited the H. werneckii genome using long-read, single-molecule sequencing and provide an improved genome assembly which, combined with transcriptome and nucleosome analysis, provides a useful resource for fungal halophile genomics. Remarkably, the ~50 Mb H. werneckii genome contains 15,974 genes of which 95% (7608) are duplicates formed by a recent whole genome duplication (WGD), with an average of 5% protein sequence divergence between them. We found that the WGD is extraordinarily recent, and compared to Saccharomyces cerevisiae, the majority of the genome’s ohnologs have not diverged at the level of gene expression or chromatin structure. Copyright © 2017 Sinha et al.
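The duplicate fraction quoted above can be checked with a short calculation. This sketch assumes that 7608 counts duplicate pairs rather than individual genes; that reading is our interpretation and is not stated explicitly in the abstract.

```python
TOTAL_GENES = 15_974     # from the abstract
DUPLICATE_PAIRS = 7_608  # from the abstract; treated here as pairs (assumption)


def duplicate_fraction(total_genes, pairs):
    """Fraction of genes that belong to a duplicate pair (two genes per pair)."""
    return 2 * pairs / total_genes


if __name__ == "__main__":
    fraction = duplicate_fraction(TOTAL_GENES, DUPLICATE_PAIRS)
    print(f"genes in pairs: {2 * DUPLICATE_PAIRS}")
    print(f"fraction of all genes: {fraction:.1%}")
```

Under that reading, 15,216 of the 15,974 genes sit in pairs, which is consistent with the stated figure of about 95%.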
Population genomics shows no distinction between pathogenic Candida krusei and environmental Pichia kudriavzevii: One species, four names.
We investigated genomic diversity of a yeast species that is both an opportunistic pathogen and an important industrial yeast. Under the name Candida krusei, it is responsible for about 2% of yeast infections caused by Candida species in humans. Bloodstream infections with C. krusei are problematic because most isolates are fluconazole-resistant. Under the names Pichia kudriavzevii, Issatchenkia orientalis and Candida glycerinogenes, the same yeast, including genetically modified strains, is used for industrial-scale production of glycerol and succinate. It is also used to make some fermented foods. Here, we sequenced the type strains of C. krusei (CBS573T) and P. kudriavzevii (CBS5147T), as well as 30 other clinical and environmental isolates. Our results show conclusively that they are the same species, with collinear genomes 99.6% identical in DNA sequence. Phylogenetic analysis of SNPs does not segregate clinical and environmental isolates into separate clades, suggesting that C. krusei infections are frequently acquired from the environment. Reduced resistance of strains to fluconazole correlates with the presence of one gene instead of two at the ABC11-ABC1 tandem locus. Most isolates are diploid, but one-quarter are triploid. Loss of heterozygosity is common, including at the mating-type locus. Our PacBio/Illumina assembly of the 10.8 Mb CBS573T genome is resolved into 5 complete chromosomes, and was annotated using RNAseq support. Each of the 5 centromeres is a 35 kb gene desert containing a large inverted repeat. This species is a member of the genus Pichia and family Pichiaceae (the methylotrophic yeasts clade), and so is only distantly related to other pathogenic Candida species.
Draft genome sequence of cryophilic basidiomycetous yeast Mrakia blollopis SK-4, isolated from an algal mat of Naga-ike Lake in the Skarvsnes ice-free area, East Antarctica.
Mrakia blollopis strain SK-4 was isolated from an algal mat of Naga-ike, a lake in Skarvsnes, East Antarctica. Here, we report the draft genome sequence of M. blollopis SK-4. This is the first report on the genome sequence of any cold-adapted fungal species. Copyright © 2015 Tsuji et al.
Complete genome sequence of Kluyveromyces marxianus NBRC1777, a nonconventional thermotolerant yeast.
We determined the genome sequence of the thermotolerant yeast Kluyveromyces marxianus strain NBRC1777. The genome of strain NBRC1777 is composed of 4,912 open reading frames (ORFs) on 8 chromosomes, with a total size of 10,895,581 bp, including mitochondrial DNA. Copyright © 2015 Inokuma et al.
Draft genome sequence of Sporidiobolus salmonicolor CBS 6832, a red-pigmented basidiomycetous yeast.
We report the genome sequencing and annotation of the basidiomycetous red-pigmented yeast Sporidiobolus salmonicolor strain CBS 6832. The current assembly contains 395 scaffolds, for a total size of about 20.5 Mb and a G+C content of ~61.3%. The genome annotation predicts 5,147 putative protein-coding genes. Copyright © 2015 Coelho et al.
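Summary figures such as scaffold count, total size, and G+C content can be computed directly from an assembly in FASTA format. The Python sketch below is a minimal illustration with a hypothetical function name (`assembly_stats`) and a toy input; it is not the authors' pipeline. It counts ambiguous bases such as N toward the total length, which is a simplifying assumption.

```python
def assembly_stats(fasta_text):
    """Return (sequence count, total length, G+C fraction) for FASTA text."""
    n_seqs = 0
    total = 0
    gc = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_seqs += 1  # each header line starts a new scaffold
            continue
        seq = line.upper()
        total += len(seq)  # ambiguous bases (e.g. N) count toward length
        gc += seq.count("G") + seq.count("C")
    return n_seqs, total, (gc / total if total else 0.0)


if __name__ == "__main__":
    toy = ">scaffold_1\nGGCC\nAT\n>scaffold_2\nGCGC\n"
    count, size, gc_fraction = assembly_stats(toy)
    print(count, size, f"{gc_fraction:.1%}")
```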
GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments.
Genome assemblies generated with next-generation sequencing (NGS) reads usually contain a number of gaps. Several tools have recently been developed to close the gaps in these assemblies with NGS reads. Although these gap-closing tools efficiently close the gaps, they entail a high rate of misassembly at gap-closing sites. We have found that the assembly error rates caused by these tools are 20-500-fold higher than the rate of errors introduced into contigs by de novo assemblers. We here describe GMcloser, a tool that accurately closes these gaps with a preassembled contig set or a long read set (i.e. error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3-100-fold higher than those of other available tools, with similar efficiency. GMcloser and an accompanying tool (GMvalue) for evaluating the assembly and correcting misassemblies except SNPs and short indels in the assembly are available at https://firstname.lastname@example.org. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping.
It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers) it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness, and on the other hand we present a de novo assembly pipeline we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, here we present the results obtained using the Dekkera bruxellensis genome. In this work we used short-read Illumina data and long-read PacBio data combined with the extreme long-range information from OpGen optical maps in the task of de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read-preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work. We obtained a high quality draft assembly of a yeast genome, resolved on a chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors as demonstrated by resolving a large collapsed repeat and by receiving higher scores by assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5% of the optical mapped genome not covered by the Illumina data.
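Assembly evaluation of the kind mentioned above often includes a contiguity measure such as N50. The abstract does not say which metrics NouGAT or the evaluation tools report, so the following Python sketch is only an illustration of one common metric, with a hypothetical function name and toy contig lengths.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


if __name__ == "__main__":
    toy_contigs = [10, 8, 5, 3, 2]  # toy values, not data from the paper
    print("N50:", n50(toy_contigs))
```

A chromosome-level assembly would be expected to show an N50 close to the size of a typical chromosome, whereas a fragmented short-read assembly would show a much smaller value.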