Alif Wahid

Primitive thoughts on tracking changes in a document

The change tracking feature of MS Word is a real mixed bag for me. I mostly understand (and thoroughly appreciate) the algorithmic trickery involved in implementing such a feature, so my thoughtless opinion of it is positively biased. But whenever I actually put it through its paces in a thoughtful work-flow that requires merging two or more versions of a document, I realise just how clumsy and needlessly counter-intuitive it is. Therein lies a dilemma that I want to resolve.

The process of merging two documents in order to derive a third one is based on the formal concept of edit distance. Imagine two primitive operations that can be performed on a document: insert and delete. You can insert any number of characters into a document and/or delete any number of characters from it, in any interleaved sequence. If you take two adjacent versions of a document, say X and Y, then there exists a shortest sequence of insertions and deletions that transforms X into Y, and vice versa. The proof is not hard, but writing mathematics on Tumblr is.
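To make the definition concrete, here is a minimal Python sketch of this insert/delete-only edit distance. It is my own illustration, not how Word works internally, and the name `indel_distance` is hypothetical. It uses the standard dynamic programme over prefixes of the two versions, which also shows why the shortest sequence can be found in polynomial time.

```python
def indel_distance(x: str, y: str) -> int:
    """Length of the shortest insert/delete sequence turning x into y.

    dist[i][j] holds the distance between the prefixes x[:i] and y[:j].
    Runs in O(len(x) * len(y)) time and space.
    """
    m, n = len(x), len(y)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters of x[:i]
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # A shared character can simply be kept at no cost.
                dist[i][j] = dist[i - 1][j - 1]
            else:
                # Otherwise either delete x[i-1] or insert y[j-1].
                dist[i][j] = 1 + min(dist[i - 1][j], dist[i][j - 1])
    return dist[m][n]
```

For example, `indel_distance("kitten", "sitting")` is 5: the two words share the subsequence "ittn" (4 characters), so 6 + 7 - 2 × 4 = 5 single-character operations are needed.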

The length of this sequence of operations is a quantitative measure of the difference between the two versions. It happens to be powerful enough to be a proper metric: it is symmetric, it is zero only for identical versions, and it satisfies the triangle inequality in the presence of a third version. This means that it can be used to impose a geometry (a metric space) on a document's history, with a meaningful distance between any two versions. In practical terms, merging two documents is reduced to primitive operations like inserting and deleting characters in an interleaved sequence, and it is guaranteed that one version can always be transformed into the other in a finite number of steps (at most the combined length of both versions), with the shortest such sequence computable efficiently (i.e., in polynomial time). The pertinent question, then, is who should perform all of these steps all of the time: computers or users?
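The point that a computer can carry out every one of those primitive steps can be sketched as well. The hypothetical `edit_script` below recovers one shortest interleaved sequence of single-character inserts and deletes from a longest-common-subsequence table in O(|X|·|Y|) time, and `apply_script` replays it. This is an illustrative sketch under those assumptions, not Word's merge algorithm. Because the number of operations equals the edit distance, it also lets us spot-check the triangle inequality on three versions.

```python
from typing import List, Tuple

# (operation, index in the text being edited, character)
Op = Tuple[str, int, str]

def edit_script(x: str, y: str) -> List[Op]:
    """One shortest sequence of single-character inserts/deletes turning x into y."""
    m, n = len(x), len(y)
    # lcs[i][j] = length of the longest common subsequence of x[i:] and y[j:]
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if x[i] == y[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    ops: List[Op] = []
    i = j = 0
    # Invariant: the partially edited text is y[:j] + x[i:],
    # so the next edit always happens at index j.
    while i < m or j < n:
        if i < m and j < n and x[i] == y[j]:
            i, j = i + 1, j + 1  # keep a shared character
        elif j < n and (i == m or lcs[i][j + 1] >= lcs[i + 1][j]):
            ops.append(("insert", j, y[j]))
            j += 1
        else:
            ops.append(("delete", j, x[i]))
            i += 1
    return ops

def apply_script(x: str, ops: List[Op]) -> str:
    """Replay the operations in order on a copy of x."""
    text = list(x)
    for kind, pos, ch in ops:
        if kind == "insert":
            text.insert(pos, ch)
        else:
            del text[pos]
    return "".join(text)
```

For example, `edit_script("kitten", "sitting")` returns five operations, and replaying them with `apply_script` on `"kitten"` gives back `"sitting"`. Going through the intermediate version `"sitten"` costs 2 + 3 = 5 operations, so the triangle inequality holds here (with equality).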

I’m inclined to think that it’s the computer’s job to do all those tedious comma insertions, typo deletions and other trivial editing steps. To be fair, Word can already do this if you tell it to accept and apply all of the changes between two versions of a document. But more often than not, Word gets it utterly wrong and mangles the document to the extent that I have to revise it from start to finish out of sheer distrust. Hence it is counter-productive, and counter-intuitive, to think of this as a positively useful feature. It is not! Rather, it is time-consuming and annoying, to say the least.

The operations that I would prefer to perform myself are much less primitive, since the computer is much better at doing primitive stuff. For instance, I want to view the semantic difference between two documents in terms of chapters, sections, paragraphs, figures, tables etc., not commas, spaces, carriage returns, typos and the rest. Therefore, I want a formal conception of semantic distance instead of edit distance, so that I can operate at a higher level of abstraction, which is less tedious and more productive. I wonder whether such a formalism already exists. Any pointers?
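As a first, very crude step in that direction, the same kind of comparison can be run over coarser units. The sketch below (hypothetical names `paragraphs` and `paragraph_diff`, assuming paragraphs are separated by blank lines) applies Python's standard `difflib.SequenceMatcher` to lists of whole paragraphs, so a change shows up as a replaced paragraph rather than a run of character edits. Note that `SequenceMatcher` uses a longest-matching-block heuristic rather than computing a minimal edit distance, and the comparison is still syntactic, not semantic.

```python
import difflib
from typing import List, Tuple

def paragraphs(text: str) -> List[str]:
    """Split on blank lines: a deliberately crude, assumed notion of 'paragraph'."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def paragraph_diff(old: str, new: str) -> List[Tuple[str, List[str], List[str]]]:
    """List (change kind, old paragraphs, new paragraphs) for every non-equal run."""
    a, b = paragraphs(old), paragraphs(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete" or "insert"
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes
```

For instance, comparing `"Intro.\n\nMethods, v1.\n\nResults."` with `"Intro.\n\nMethods, v2.\n\nResults.\n\nConclusion."` reports one replaced paragraph (`"Methods, v1."` by `"Methods, v2."`) and one inserted paragraph (`"Conclusion."`), which could then be shown side by side.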

I guess it necessarily requires setting out the structure of the document in some standard form, since the semantic meta-data cannot be legible to a computer otherwise. A book is a good standard structure. It’s usually organised into chapters with short headings, sections within chapters, sub-sections within sections, and so on. Thus the difference between two adjacent versions/editions of a book ought to be expressible in terms of this hypothetical semantic distance that I’m postulating.

I think it must be intuitively far less tedious to view (and merge) two paragraphs displayed side by side than to view the differing characters within them, which generate the edit distance. By extension of this structural analogy, it must also be intuitively far less tedious to view the difference between two tables of contents, side by side, in order to quickly sense the overall semantic distance between them. So I suspect that this conception will most likely be a hierarchical one, operating at different levels of semantic abstraction, whereas the formalism for edit distance is necessarily flat at the syntactic level of individual characters. I have to ponder some more before the dilemma goes away, although the general idea seems sound thus far.
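The hierarchical idea can also be sketched, again only as a rough illustration under a strong assumption: that a heading's text identifies its section across versions, so a renamed heading shows up as one removal plus one addition. The hypothetical `Section` tree stands in for a table of contents. The hypothetical `outline_diff` compares the headings at one level with `difflib.SequenceMatcher` and only descends into sections present in both versions, so added or removed sections are reported whole rather than character by character.

```python
import difflib
from dataclasses import dataclass, field
from typing import List

@dataclass
class Section:
    title: str
    children: List["Section"] = field(default_factory=list)

def outline_diff(old: Section, new: Section, path: str = "") -> List[str]:
    """Report removed (-) and added (+) headings level by level.

    Assumes headings are stable identifiers; recursion only happens
    into sections whose heading appears unchanged in both versions.
    """
    here = f"{path}/{old.title}" if path else old.title
    a = [s.title for s in old.children]
    b = [s.title for s in new.children]
    report: List[str] = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same headings on both sides: compare their contents one level down.
            for k in range(i2 - i1):
                report += outline_diff(old.children[i1 + k], new.children[j1 + k], here)
        else:
            report += [f"{here}: - {t}" for t in a[i1:i2]]
            report += [f"{here}: + {t}" for t in b[j1:j2]]
    return report
```

For example, if a new edition of a book renames section "2.2 Model" to "2.2 Model v2", adds "2.3 Tuning" under "2 Methods", and appends a chapter "3 Results", the report consists of four lines: the removal and two additions under `Book/2 Methods`, followed by the addition of `3 Results` under `Book`.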