The ability to study the speciation of an animal in real time is a dream come true for evolutionary and developmental biologists. A group of Japanese researchers has gotten that opportunity, thanks in part to SMRT Sequencing.
In 2007, scientists at the University of Tokyo were the first to create a reference genome for an inbred strain of the medaka fish (Oryzias latipes), which has a genome of about 800 Mb. That assembly was built with Sanger sequencing, but it contained low-quality regions and 97,933 sequence gaps. So the team started from scratch with long-read sequencing to generate genome assemblies with far less missing sequence.
In a paper published in Nature Communications, senior authors Hiroyuki Takeda and Shinichi Morishita report new assemblies generated via PacBio long-read sequencing from three geographically isolated medaka strains. These high-quality assemblies allowed them to dive deeper than ever before into the genetics of the fish, and to gain new insights into how previously difficult-to-detect centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation.
“Highly accurate long contigs have been useful in enumeration of structural variants (SVs), filling gaps such as centromeres, extending contigs to telomeres, and phasing haplotypes,” the authors write.
The team focused its attention on centromeres, which are difficult to sequence and assemble with short-read and even Sanger platforms. “Once speciation is completed, representative centromeric monomers are highly diversified among 282 species; however, centromere evolution during speciation and its relevance with speciation are unknown,” the authors note.
With this in mind, the team sequenced the genomes of three medaka inbred strains derived from different local subpopulations: HNI from northern Japan, Hd-rR from southern Japan (the strain sequenced for the original reference genome), and HSOK from east Korea.
The strains were originally considered a single species because they can mate and produce healthy offspring under laboratory conditions, but they have accumulated genetic mutations and phenotypic diversity over a long period of geographical separation. They are now thought to be in the middle of speciation, making them the perfect platform for analyzing this type of evolution, the authors report.
Combining PacBio data with centromere-specific DNA probes and fluorescence in situ hybridization experiments, the team reports obtaining “an unprecedented resource of centromeric repeats of length 20–345 kbp in vertebrates.”
They found that the position of centromeres tended to be preserved unless chromosomal rearrangement took place on a large scale. This was the case in the medaka, whose centromere positions stayed the same for millions of years until fissions, fusions, and translocations reshaped its genome.
The scientists further discovered that centromere evolution proceeded at different paces across the three strains, depending on the shape and sequence of the centromeres. Centromeric monomers in acrocentric chromosomes evolved more slowly than those in non-acrocentric chromosomes, the team reports. Using AgIn software, the authors estimated the methylation states of CpG sites from SMRT Sequencing kinetic information and found divergent methylation patterns, suggesting that centromeres accumulate epigenetic diversity as well as sequence diversity during speciation.
They observed that each local strain has independently experienced thousands of mid-sized (1–50 kbp) insertion events. These are not enough to cause reproductive isolation, but are possibly enough to participate in the regulation of genes and contribute to phenotypic variation.
“These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation,” conclude the authors. “Further analysis of the mid-sized insertions associated with novel transcripts and increased transcription will provide important clues to the genomic basis for vertebrate speciation.”
December 14, 2017 | General