Pichia pastoris has emerged as an important alternative host for producing recombinant biopharmaceuticals, owing to its high cultivation density, low host cell protein burden, and the development of strains with humanized glycosylation. Despite its demonstrated utility, relatively little strain engineering has been performed to improve Pichia, due in part to the limited number and inconsistent frameworks of reported genomes and transcriptomes. Furthermore, the co-mingling of genomic, transcriptomic and fermentation data collected about Komagataella pastoris and Komagataella phaffii, the two strains co-branded as Pichia, has generated confusion about host performance for these genetically distinct species. Generation of comparative high-quality genomes and transcriptomes will enable meaningful comparisons between the organisms, and potentially inform distinct biotechnological utilies for each species.Here, we present a comprehensive and standardized comparative analysis of the genomic features of the three most commonly used strains comprising the tradename Pichia: K. pastoris wild-type, K. phaffii wild-type, and K. phaffii GS115. We used a combination of long-read (PacBio) and short-read (Illumina) sequencing technologies to achieve over 1000X coverage of each genome. Construction of individual genomes was then performed using as few as seven individual contigs to create gap-free assemblies. We found substantial syntenic rearrangements between the species and characterized a linear plasmid present in K. phaffii. Comparative analyses between K. phaffii genomes enabled the characterization of the mutational landscape of the GS115 strain. We identified and examined 35 non-synonomous coding mutations present in GS115, many of which are likely to impact strain performance. Additionally, we investigated transcriptomic profiles of gene expression for both species during cultivation on various carbon sources. We observed that the most highly transcribed genes in both organisms were consistently highly expressed in all three carbon sources examined. We also observed selective expression of certain genes in each carbon source, including many sequences not previously reported as promoters for expression of heterologous proteins in yeasts.Our studies establish a foundation for understanding critical relationships between genome structure, cultivation conditions and gene expression. The resources we report here will inform and facilitate rational, organism-wide strain engineering for improved utility as a host for protein production.
Journal: BMC genomics