Crucial assembly sites and mitosis mediators, centromeres are central to every cell, but missing from even the most complete genome assemblies.
Until now.
In a PLOS Biology paper, Amanda Larracuente and colleagues at the University of Rochester and Barbara G. Mellone of the University of Connecticut described how they sequenced the repetitive regions of the fruit fly genome, including its centromeres, using SMRT Sequencing.
Embedded in blocks of highly repetitive satellite DNA, centromeres have eluded efforts at assembly.
Only recently have long-read, single-molecule sequencing technologies made it possible to assemble highly repetitive parts of multicellular genomes, such as the human Y chromosome centromere and maize centromere 10. The new study marks the first time researchers have sequenced all the centromeres in any multicellular organism.
“Our study shows that combining long-read sequencing with ChIP-seq and chromatin fiber FISH is a powerful approach to discover centromeric DNA sequences and their organization,” the authors wrote. “Our overall strategy therefore provides a blueprint for determining the composition and organization of centromeric DNA in other species.”
Drosophila melanogaster proved the ideal model for investigating centromere genomic organization, as it has a relatively small genome (roughly 180 Mb) organized into just three autosomes (chromosomes 2, 3, and 4) and two sex chromosomes (X and Y). The estimated centromere sizes in Drosophila cultured cells range between 200 and 500 kb and map to regions within large blocks of tandem repeats.
Satellites were previously believed to be the likely major structural elements of Drosophila, human and mouse centromeres. By tracking the histone H3 variant centromere protein A (CENP-A), the team identified the fruit fly centromeres and found that they primarily occupy islands of complex DNA, enriched in retroelements and flanked by large blocks of simple satellites. The authors estimate that approximately 70% of the functional centromeric DNA of D. melanogaster is composed of these complex DNA islands, which are rich in non-LTR retroelements and buried within large blocks of tandem repeats.
“They likely went undetected in previous studies of centromere organization because three of the five islands are either missing or incomplete in the published reference D. melanogaster genome … having an improved reference genome assembly is crucial for identifying centromeric DNA sequences,” the authors state.
The retroelements they found were not merely present near centromeres, but were components of the active centromere cores.
“Why retroelements are such ubiquitous components of centromeres and whether they play an active role in centromere function remain open questions,” the authors wrote.
Additional avenues worth exploring include identifying associated tandem repeats, as well as mapping the span of the CENP-A domain and its binding sites.
“Knowing the identity of D. melanogaster centromeric DNA will enable the functional interrogation of these elements in this powerhouse model organism,” the authors wrote.
August 7, 2019 | Plant + animal biology