Many investigators rely on targeted sequencing approaches for deep dives into genomic regions of interest. By designing specific probes — often using short-read sequences directed towards the exome and supported by existing reference genomes or transcriptome assemblies — scientists can home in on exactly the area they want to explore.
But what about sequences in intergenic regions not covered by short reads, which could contain crucial regulatory elements varying between populations that might be of functional and evolutionary importance? Or, what about species lacking high-quality reference genomes to guide probe design?
A team of Norwegian researchers is tackling these challenges using PacBio long-read sequencing technology for their target capture experiments. In a pre-print posted on bioRxiv, corresponding author Sissel Jentoft, first author Siv Nam Khang Hoff, and colleagues at the University of Oslo, Roche NimbleGen, and Roche Diagnostics describe how they used the technique to elucidate the evolution of the hemoglobin gene clusters in codfishes.
Hemoglobins (Hbs), key respiratory proteins in most vertebrates, are of great importance for ecological adaptation in fishes, as environmental factors such as temperature directly influence the solubility of O2 in surrounding waters and the ability of Hb to bind O2 at respiratory surfaces.
Previous studies have suggested remarkably high Hb gene copy number variation between codfish species. One study, for example, reported a negative correlation between the number of Hb genes and the depth at which the species occur, suggesting that the more variable environment in sunlit waters has facilitated a larger and more diverse Hb gene repertoire.
Interested in resolving the organization of Hb genes and their flanking genes in a selection of codfishes inhabiting different environmental conditions, the Oslo team turned to SMRT Sequencing to generate long, highly accurate, and continuous assemblies of these specific genomic regions of interest.
“Comparative genetic studies of gene organization or synteny requires longer, more continuous stretches of DNA containing more than one gene,” the authors explain.
Eight codfish species were selected on the basis of phylogenetic and habitat divergence. A highly continuous genome assembly of Atlantic cod (previously created using PacBio sequencing), as well as low-coverage draft genome assemblies of all eight species, were used to design probes spanning both exons and introns of the genomic regions of interest. To enable targeted sequence capture for PacBio sequencing, the team used a modified version of Roche NimbleGen's sequence capture protocol (the SeqCap EZ protocol) and generated custom barcodes.
“The generation of highly continuous assemblies enabled reconstruction of micro-synteny revealing lineage-specific gene duplications and identification of a relatively large and inter-species variable indel located in the promoter region between the Hbb1 and Hba1 genes,” the authors write.
The results shed light on the evolutionary history of Hb genes across species separated by up to 70 million years of evolution, and reveal genetic variations possibly linked to thermal adaptation, they conclude.
“Our study demonstrates that this approach… is a highly efficient and versatile method to investigate specific genomic regions of interest across distantly related species where genome sequences are lacking,” they add.
For pointers on how you can use SeqCap EZ for target sequence capture on PacBio Systems, check out this protocol.
May 14, 2018 | Plant + animal biology