The DNA sequence of the two human mucin genes MUC2 and MUC6 have not been completely resolved due to the repetitive nature of their central exon coding for Proline, Threonine and Serine rich sequences. The exact nucleotide sequence of these exons has remained unknown for a long time due to limitations in traditional sequencing techniques. These are still very poorly covered in new whole genome sequencing projects with the corresponding protein sequences partly missing. We used a BAC clone containing both these genes and third generation sequencing technology, SMRT sequencing, to obtain the full-length contiguous MUC2 and MUC6 tandem repeat sequences. The new sequences span the entire repeat regions with good coverage revealing their length, variation in repeat sequences and their internal organization. The sequences obtained were used to compare with available sequences from whole genome sequencing projects indicating variation in number of repeats and their internal organization between individuals. The lack of these sequences has limited the association of genetic alterations with disease. The full sequences of these mucins will now allow such studies, which could be of importance for inflammatory bowel diseases for MUC2 and gastric ulcer diseases for MUC6 where deficient mucus protection is assumed to play an important role.
Journal: Scientific reports