A multiple DNA inversion system, the shufflon, exists in incompatibility (Inc) I1 and I2 plasmids. The shufflon generates variants of the PilV protein, a minor component of the thin pilus. The shufflon is one of the most difficult regions for de novo genome assembly because of its structural diversity even in an isolated bacterial clone. We determined complete genome sequences, including those of IncI2 plasmids carrying mcr-1, of three Escherichia coli strains using single-molecule, real-time (SMRT) sequencing and Illumina sequencing. The sequences assembled using only SMRT sequencing contained misassembled regions in the shufflon. A hybrid analysis using SMRT and Illumina sequencing resolved the misassembled region and revealed that the three IncI2 plasmids, excluding the shufflon region, were highly conserved. Moreover, the abundance ratio of whole-shufflon structures could be determined by quantitative structural variation analysis of the SMRT data, suggesting that a remarkable heterogeneity of whole-shufflon structural variations exists in IncI2 plasmids. These findings indicate that remarkable rearrangement regions should be validated using both long-read and short-read sequencing data and that the structural variation of PilV in the shufflon might be closely related to phenotypic heterogeneity of plasmid-mediated transconjugation involved in horizontal gene transfer even in bacterial clonal populations.
Journal: Scientific reports