Document Versioning

One of the most useful things I’ve learned since starting a Computer Science degree is how to use versioning software. I’ve used a number of different flavours of it, and I am still amazed that, given its usefulness, it is not part of the general consciousness.

One of the first versioning software packages that I came across was MediaWiki, the software developed for and used by Wikipedia. While not trivial to set up, and not particularly full-featured (my biggest gripes are no WYSIWYG editing and no native approved-version tagging), it certainly does allow for very powerful historical tracking and, with the help of some additional plugins, can be made to fulfil most requirements. One remaining problem is that there are no currently supported WYSIWYG editors for the most recent release of the software.

I have used MediaWiki successfully for managing policy documentation within my work environment, and have also used it to make those policies available to the general public. Editing can go on without worrying too much about the front-facing result, since visitors are shown the last approved version and not the current draft. Less successfully, I have attempted to use it as a network documentation system, but there are a few too many security issues for my liking.

To solve the security issue, I’ve moved a lot of the technical documentation regarding our school network setup to Google Drive in our Google Apps for Education system, where Google Docs allows for revision tracking. There is not, as far as I can tell, a way to tag the approved version of a document while the most current one undergoes revision. Thus, while this provides excellent editing capabilities for something like the school policy documents, it is not a great mechanism for publishing them, and it introduces an entire extra step to publish approved versions of a document. Reverting changes to the draft can also be difficult, since there is no easy way to find the last “official” revision of a frequently changing document.
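The approved-versus-draft idea discussed above can be made concrete with a small sketch. This is a hypothetical model only, not how MediaWiki or Google Docs work internally: the `Document` class and its method names are invented for illustration. It keeps every revision, remembers which one is tagged as approved, and can restore the draft to that approved text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """Hypothetical model: a revision history plus an 'approved' pointer."""
    revisions: List[str] = field(default_factory=list)
    approved_index: Optional[int] = None  # None until something is approved

    def save(self, text: str) -> int:
        """Store a new draft revision and return its index."""
        self.revisions.append(text)
        return len(self.revisions) - 1

    def approve(self, index: int) -> None:
        """Tag an existing revision as the official one."""
        if not 0 <= index < len(self.revisions):
            raise IndexError("no such revision")
        self.approved_index = index

    def published(self) -> Optional[str]:
        """What a visitor sees: the last approved revision, never the draft."""
        if self.approved_index is None:
            return None
        return self.revisions[self.approved_index]

    def draft(self) -> Optional[str]:
        """What an editor sees: the most recent revision."""
        return self.revisions[-1] if self.revisions else None

    def revert_draft(self) -> int:
        """Undo draft edits by re-saving the approved text as a new revision."""
        text = self.published()
        if text is None:
            raise ValueError("nothing has been approved yet")
        return self.save(text)


# Example: one approved revision, then further unreviewed editing.
policy = Document()
first = policy.save("Policy v1")
policy.approve(first)
policy.save("Policy v1 plus unreviewed edits")
```

Because reverting is stored as a new revision and does not delete history, the abandoned draft remains available if it is needed later.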

The sharing and security aspects of Google Drive allow for significantly tighter control of who can see what.

There are, of course, document management systems whose job it is to manage documents and their versions. In many of these, documents are checked in and out (which reduces collaborative editing), while others that do facilitate collaboration, such as Microsoft’s SharePoint, require a significant outlay of expertise and licences to set up and get working. What these systems do allow, however, is far more fine-grained and standards-compliant control over documents and their revision process. In my experience, though, the process becomes too complicated to manage and requires an in-house document management department. Given that we can neither afford nor need a department to manage documents, this becomes a significant overhead in terms of cost and effort.

Finally, versioning software reaches an entirely new level in the form of code versioning software, such as Mercurial, Git or Subversion (the latter falling out of favour given its limitations in team deployments). Certainly, these packages can be used for versioning any type of document, but the amount of expertise needed to understand branches and tags, and to navigate the interface software, puts this out of reach for a casual computer user. (I prefer TortoiseHg, although I do use some automated batch commands for certain procedures, like getting a list of all the files that have changed since the last “release” version.) This is despite it being potentially the answer to a whole range of versioning requirements, including web publishing of official policy documents: that only requires an “update to” the latest official release, and all documents that have been changed will reflect the updated version.
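As an illustration of the kind of batch procedure mentioned above, here is a minimal Python sketch that lists the files changed since a release tag. It is not the author's actual script. The function names and the example tag `release-1.0` are assumptions, and it relies on the standard `hg status --rev` and `git diff --name-only` commands being available on the path.

```python
import subprocess
from typing import List


def changed_files_command(tool: str, release_tag: str) -> List[str]:
    """Build the command that compares a tagged release with the working copy."""
    if tool == "hg":
        # Mercurial: each output line is a status letter followed by a path.
        return ["hg", "status", "--rev", release_tag]
    if tool == "git":
        # Git: each output line is a path (untracked files are not listed).
        return ["git", "diff", "--name-only", release_tag]
    raise ValueError(f"unsupported tool: {tool}")


def list_changed_files(tool: str, release_tag: str, repo_dir: str = ".") -> List[str]:
    """Run the command inside repo_dir and return its output lines."""
    result = subprocess.run(
        changed_files_command(tool, release_tag),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if the tool reports an error, e.g. an unknown tag
    )
    return result.stdout.splitlines()


# Hypothetical usage, assuming a repository with a tag named "release-1.0":
#     for line in list_changed_files("hg", "release-1.0"):
#         print(line)
```

Keeping the command construction separate from the call that runs it means the same helper can be reused in a scheduled job or checked without a repository at hand.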

Some have suggested using the likes of Mercurial or Git as a more general versioning strategy, running automated “commits” (version saving) in the background. These can also be “pushed” to online repositories and downloaded to other PCs if necessary. Each commit is typically very small, given that it only saves the differences, so downloading a day’s or even a week’s worth of file changes and versions does not take particularly long. Using this kind of setup, all the associated computers have a complete “time machine” of all the file changes that have been made. This obviously comes at a cost of disk space, but probably less than you think.
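To give a rough feel for why saving only the differences keeps each commit small, the sketch below compares the size of a whole document with the size of a diff after a one-line edit. It uses Python's standard `difflib` as a stand-in. Mercurial and Git use their own, more compact storage formats, so the numbers are illustrative only, and `delta_size` and the sample text are invented for this example.

```python
import difflib


def delta_size(old: str, new: str) -> int:
    """Return the number of characters in a unified diff from old to new."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="before",
        tofile="after",
        n=0,  # no context lines: keep only what actually changed
    )
    return sum(len(line) for line in diff)


# A 1000-line document in which exactly one line is edited.
original = "".join(f"Line {i} of the policy document.\n" for i in range(1000))
edited = original.replace("Line 500 of", "Line 500 (revised) of")

full_copy = len(edited)                      # cost of storing the whole file again
changes_only = delta_size(original, edited)  # cost of storing just the change

print(f"full copy: {full_copy} characters, diff: {changes_only} characters")
```

The diff is a small fraction of the full copy here, which is consistent with the point that a long history of small edits costs less disk space than one might expect.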

I strongly recommend that anyone who is involved in coding or managing documents of some form investigate versioning software carefully. With the likes of Mercurial, Git and Subversion, be prepared to think and get your head around it.

It’s worth it when you do.