Diversity in Revision Control

Posted: Tue, 13 December 2005

Josselin Mouette has thrown down the gauntlet -- Subversion is the One True Revision Control System and we should all just Stop Worrying and Learn To Love The Handcuffs. In particular:

[...] just use subversion, and if you feel it's missing features, please contribute them. By standardizing on subversion, we lower the barrier for people wanting to contribute, instead of increasing it by asking them to install and understand new tools.

Quick thought experiment, Josselin: replace "subversion" with "Java". Or "Perl", or "C", or even "Python" or "Ruby". Or any other language. It doesn't matter which one. Even if the language you substitute is a good one, and one you happen to like, it's still a fairly obviously braindead idea -- that there can possibly be One True Programming Language for all problems, and that there can never be any improvement that can't be bolted into the existing OTPL.

Revision control isn't so different from programming languages. Different systems provide different advantages and disadvantages. There's also improvement over time, both incremental and revolutionary -- machine code to symbolic to procedural to object oriented -- and the revolutionary kind can never take place within the established framework.

I happen to really like the distributed model -- it reduces (though, naturally, cannot eliminate) the division between the Blessed and the Peons in a project (those with direct commit access vs. the rest of us). You can get disconnected operation for (almost) free.

Also, despite what your article asserts, the "barrier to entry" is, ironically, lower with a good distributed RCS than with Subversion. The barrier isn't what you think, though -- sure, using an unfamiliar RCS is troublesome, but it's a lot less hassle than learning a new language, and people do that on occasion to hack on a particularly interesting problem. The real barrier to entry is forcing anyone who isn't blessed with direct access to The One And Only Repository to use inferior development support tools.

"An example!" I hear you cry. Gladly.

This morning (literally) I got enough of a handle on the internals of Ruby on Rails to actually track down a bug that was irritating the crap out of me. Being a good developer, I wanted to use revision control to track my changes. In the All Subversion, All The Time "utopia", that would involve getting an anonymous checkout, making a copy, hacking on the copy, then (when all was done) running diff -urN between the original and the copy, and sending it all upstream with a manually-created log message (assuming I could remember everything I'd done). I'd also need to manually remember (or encode in a directory name or something) the source of my tree, so upstream can merge it reliably (and compare it against other equivalent work that may have been done in the meantime).
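To make the copy-and-diff dance concrete, here is a rough Python sketch of it. Everything in it is an assumption for illustration: the file names (app.rb, NOTES), the directory names, and the make_patch helper are all made up, and difflib only approximates what diff -urN does for text files. The thing to notice is what the resulting patch contains: changed lines and nothing else. No log message, no record of where the tree came from.

```python
import difflib
import shutil
import tempfile
from pathlib import Path


def make_patch(original: Path, modified: Path) -> str:
    """Return a unified diff of every text file that differs between two
    trees, roughly what `diff -urN original modified` would report."""
    names = sorted(
        {p.relative_to(original).as_posix() for p in original.rglob("*") if p.is_file()}
        | {p.relative_to(modified).as_posix() for p in modified.rglob("*") if p.is_file()}
    )
    chunks = []
    for name in names:
        old, new = original / name, modified / name
        # A file missing on one side is treated as empty, like diff's -N flag.
        old_lines = old.read_text().splitlines(keepends=True) if old.is_file() else []
        new_lines = new.read_text().splitlines(keepends=True) if new.is_file() else []
        chunks.extend(
            difflib.unified_diff(old_lines, new_lines, f"orig/{name}", f"hacked/{name}")
        )
    return "".join(chunks)


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "orig"  # stands in for the anonymous checkout
        checkout.mkdir()
        (checkout / "app.rb").write_text("def greet\n  puts 'helo'\nend\n")

        hacked = Path(tmp) / "hacked"  # the copy that actually gets hacked on
        shutil.copytree(checkout, hacked)
        (hacked / "app.rb").write_text("def greet\n  puts 'hello'\nend\n")
        (hacked / "NOTES").write_text("fixed typo\n")

        return make_patch(checkout, hacked)


patch = demo()
print(patch)
```

Whatever history led to those changes lives only in my head, and the patch has no idea which upstream revision it was made against.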

To do it better, I could do an anonymous checkout, and then import it all into a new Subversion repository (or directory) of my very own, and hack on it in there. Benefit: revision control of my changes. But it's non-trivial to create a new repo and import everything into it, and there's still no automatic record made of the upstream source of my new repo.

Wanting to make a second change is even worse -- I either have to make a whole new copy of my source tree (disk is cheap, but this is getting silly) or just keep forging ahead with changing my existing tree -- so this second set of changes can end up being dependent on the first (which is Not So Great if upstream don't merge my first patch). In the "new repo" scenario, I can branch from the root of my new repo and continue hacking on the newly created branch, which is a good start, but it's still icky. And good luck having the system track new commits from upstream while I'm working, too.

If the project were using a DRCS, in contrast, the problem would reduce to making a branch into your local archive, hacking (committing at will), updating with new upstream commits as you go along, and then publishing your changes to a public place for reintegration (or, in some cases, sending your changeset via e-mail).

Oh, and before anyone starts screaming "if anyone can branch, it will encourage forking!", consider which is more likely to result in a permanent fork (rather than a temporary independent line of development): a local set of changes in a tarball or local Subversion repo, both of which are hard for anyone else to collaborate on without major pain; or a branch in a simple-to-publish (or push) designed-to-be-shared DRCS archive, which automatically keeps all of your interesting metadata intact and coherent across archives, merges, time, and space (and possibly even dimensions).

This utopia doesn't really exist, and is hampered by everybody having their own Pet DRCS, but the problem, I'm fairly sure, is transient. My main point isn't even so much against the choice of Final Ultimate Solution, or even the problem space, but more about the attitude that everyone should get behind Any One System and stay there. I can't imagine the world of pain we'd be in if Josselin had posted his message a few years ago (but with s/subversion/CVS/i, naturally) and everyone had followed his advice -- can you imagine per-file changesets with no effective merge capability for the rest of eternity? <shudder />


Brane Dump
