Current methods for genome-wide analysis of gene expression require fragmentation of original transcripts into small fragments for short-read sequencing. In bacteria, the resulting fragmented information hides operon complexity. Additionally, in vivo processing of transcripts confounds the accurate identification of the 5' and 3' ends of operons. Here we develop a methodology called SMRT-Cappable-seq that combines the isolation of un-fragmented primary transcripts with single-molecule long read sequencing. Applied to E. coli, this technology results in an accurate definition of the transcriptome with 34% of known operons from RegulonDB being extended by at least one gene. Furthermore, 40% of transcription termination sites have read-through that alters the gene content of the operons. As a result, most of the bacterial genes are present in multiple operon variants reminiscent of eukaryotic splicing. By providing such granularity in the operon structure, this study represents an important resource for the study of prokaryotic gene network and regulation.
Journal: Nature communications