The majority of facioscapulohumeral muscular dystrophy (FSHD) cases are caused by contraction of macrosatellite repeats called D4Z4, which are located in the subtelomeric region of human chromosome 4q35. Sequencing the FSHD locus has been technically challenging because of its length and the nearly identical nature of its repeat elements. Here we report the sequencing and partial assembly of a BAC clone carrying an entire FSHD locus using single-molecule real-time (SMRT) sequencing technology, which produced long reads of up to about 18 kb containing D4Z4 repeats. De novo assembly with Hierarchical Genome Assembly Process 1 (HGAP.1) yielded a 41 kb contig containing all but part of the most distal D4Z4 element. The validity of the sequence model was confirmed by an independent approach: anchored multiple sequence alignment with Kalign, using reads that contain unique flanking sequences. Our data provide a basis for further optimization of sequencing and assembly conditions for D4Z4.
Journal: PLoS ONE