Oklahoma Scientists Use SMRT Sequencing to Rescue Fungal Genome Assembly
Tuesday, August 27, 2013
Image caption: Orpinomyces is found in cattle rumen.
Scientists from Oklahoma State University and the University of Oklahoma teamed up with a sequencing service provider to study the genome of an anaerobic fungus found in the rumen of cows, an organism that may have implications for effective plant biomass degradation. What made this species so tricky to sequence were its extremely low GC content, just 17 percent, and its unusually high number of repeats.
The study was reported in “The Genome of the Anaerobic Fungus Orpinomyces sp. Strain C1A Reveals the Unique Evolutionary History of a Remarkable Plant Biomass Degrader,” a paper published in this month’s edition of the ASM journal Applied and Environmental Microbiology. The C1A isolate had a large genome: just over 100 Mb, with more than 16,000 genes.
Senior author Mostafa Elshahed and his team sequenced the fungus, Orpinomyces sp. strain C1A, using both Illumina® and PacBio® technologies. They report that the organism’s GC content of just 17 percent is the lowest seen in any free-living microbe sequenced to date. Other unusual traits of the genome were its “relatively large proportion of noncoding intergenic regions,” comprising some 73 percent of the sequence, and its high number of simple sequence repeats, which saw “massive proliferation” in the noncoding regions. These repeats, mostly homopolymer As or Ts, made up nearly 5 percent of the genome; the authors point out that this is at least an order of magnitude higher than repeat numbers reported in other fungal genomes.
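For readers unfamiliar with these two statistics, the following minimal Python sketch shows how GC content and the share of a sequence occupied by homopolymer A or T runs could be computed. It is an illustration only, not the method used in the paper: the function names, the toy sequence, and the minimum run length of 6 bases are all assumptions made for this example.

```python
import re


def gc_fraction(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def homopolymer_at_fraction(seq: str, min_run: int = 6) -> float:
    """Return the fraction of the sequence covered by runs of A or T.

    Only runs of at least `min_run` identical bases are counted.
    The threshold is an assumption for illustration, not the paper's definition.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = sum(
        len(match.group(0))
        for match in re.finditer(r"A+|T+", seq)
        if len(match.group(0)) >= min_run
    )
    return covered / len(seq)


# Toy sequence (made up for this example): 18 bases, 2 of them G or C.
toy = "AAAAAAAAGCTTTTTTAT"
print(f"GC content: {gc_fraction(toy):.1%}")
print(f"Homopolymer A/T share: {homopolymer_at_fraction(toy):.1%}")
```

A genome dominated by long A or T runs scores low on the first number and high on the second, which is the combination described for C1A.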
These remarkable insights were attained by a two-part attempt to sequence the organism’s genome. As described in the paper, the team initially used paired-end sequencing on the Illumina platform to generate an assembly with 290-fold coverage that was “highly fragmented … with an extremely large number of contigs in the final assembly (82,325 contigs), a large proportion of the final assembly (32.4%) harbored in extremely short contigs (300 to 900 bp), and a low N50 (1,666 bp).”
So Elshahed et al. turned to SMRT® Sequencing, generating about 10-fold coverage of the C1A isolate. They used PacBioToCA to produce a hybrid assembly of the fungal genome with an average quality score of 59.7. “The final assembly was a marked improvement compared to the Illumina-only assembly, as evident from the improved N50/N90 values” and other characteristics, the authors write. They note that the long PacBio reads “allowed for the identification of a large number of introns previously undetected in the Illumina assembly.”
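N50 and N90 summarize how contiguous an assembly is: N50 is the contig length at which contigs of that length or longer contain at least half of all assembled bases, and N90 is the same measure at 90 percent. The sketch below computes both using this standard definition. The function name `nxx` and the contig lengths are hypothetical, chosen only to illustrate the calculation; they are not data from the study.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx statistic for a list of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches `fraction` of all assembled bases; the length of
    the contig that crosses that threshold is returned.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return min(lengths)  # Only reached if fraction is greater than 1.


# Hypothetical contig lengths in bp (300 bp assembled in total).
contigs = [100, 80, 50, 30, 20, 10, 10]
print("N50:", nxx(contigs, 0.5))
print("N90:", nxx(contigs, 0.9))
```

A fragmented assembly made up of many short contigs crosses the halfway point only after many small pieces, which yields a low N50 like the 1,666 bp reported for the Illumina-only assembly.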
Armed with this sequence data, Elshahed and his team performed a number of follow-up and functional studies on C1A. They found the organism to be a “remarkable biomass degrader.” In tests of several different plant materials, including switchgrass, corn stover, and alfalfa, C1A proved quite versatile, “able to metabolize all different types of examined plant biomass.” This trait makes the organism a particularly promising candidate for use in the plant bioprocessing required to produce many biofuels, they add.