In a new paper published in the journal Gene, scientists from Rutgers University and King’s College London report the use of a single SMRT® Cell to sequence and assemble more than 400 wheat storage protein transcripts from 10 strains of the crop.
In “PacBio sequencing of gene families — A case study with wheat gluten genes,” authors Wei Zhang, Paul Ciclitira, and Joachim Messing note that traditional studies of these cDNA sequences are so costly and labor-intensive that they have not allowed for detailed study of “the variation of each orthologous gene copy among cultivars.”
That kind of study for complex traits “usually requires positional information from sequencing entire genomes,” a task that would be prohibitive for this type of cross-strain interrogation. “Comparative transcriptome analysis of gene families,” the scientists note, offers an alternative way to study multigenic traits “without a need to re-sequence the related genomes in their entirety.”
For transcriptome sequencing, short-read technologies eliminate the cost problem, the authors add, but the short sequences “are a critical barrier to assemble repetitive genes, which may result in inadvertently joining of different gene copies into chimeric molecules.”
PacBio® sequencing, on the other hand, offers not only the needed throughput but also read lengths capable of resolving long, complex genetic regions, Zhang et al. write. The paper reports a proof-of-principle study designed to determine whether SMRT Sequencing is a viable and scalable option for investigations of variation across several different crop strains.
The authors chose 10 wheat cultivars from around the world, used barcoded PCR primers for each, and pooled the samples to run on a single SMRT Cell. Sequence data had an average read length of 3,050 bp and included nearly 33,000 circular-consensus sequencing reads in the final analysis.
The scientists then compared results for one of their cultivars, a common wheat cultivar known as Chinese Spring, with information on the same cultivar from the NCBI protein database, finding high rates of concordance. “The accuracy of the assembly in Chinese Spring was validated with 99% identity from cDNAs obtained by conventional sequencing methods,” they report. They also succeeded in sidestepping the chimera problem of short-read sequencers: “With the redundancy in sequencing coverage and the length of the sequences, our assemblies avoid chimeric joining of different gene copies.”
Zhang et al. note that their method should be useful for other phylogenetic studies as well. “We suggest our method as an efficient, low-cost method for profiling gene expression of gene families from cultivars, which genome has not been [sequenced] or is only available as a draft sequence,” they write.