Thursday, June 19, 2008

Random notes on four version control systems

From 1996 to 2002, CVS was the only version control system I used. For that matter, it was almost the only system that anybody used, at least in the Unix world. (Well, a few nutty holdouts still used raw RCS...) In odd corners of the Internet, there were rumblings about other systems, but none of them had much mindshare or seemed worth the switching costs.

But somehow, over the past five years, the version control universe seems to have opened up dramatically. CVS's competitors got more mature, and projects actually switched to them: Linux went to BitKeeper and git, KDE went to Subversion, etc. I'd occasionally use these systems to check out and build source code, but I still didn't use them for actual development. Then, in 2006, I got a real job, where we use Perforce with some local wrapper scripts. More recently, I converted some of my personal codebases to Subversion and git, and started using them for real. So over the past two years, I've learned more about version control systems than I did in the previous eight.

Conclusions:

  • There's no reason for anyone, anywhere, to use CVS for new projects. Subversion is stable; it has a strict superset of CVS's abilities; it requires almost no change in your mindset if you're coming from CVS; it runs on every platform that you're likely to have on your workstation or laptop; you can get hosting from a wide variety of providers; and its frontends are roughly as good as CVS frontends. (I use KDE, and kdesvn is only slightly less polished than Cervisia.)
  • CVS to SVN import is pretty good, but not perfect. It will introduce some spurious entries into revision logs, for reasons that I find obscure, and CVS tags won't be seamless.
  • Perforce is a very nice version control system, and has done a lot to change my mind about using proprietary software development infrastructure tools. I particularly like Perforce's ability to prepare a "changelist" from a subset of files modified in your local copy, and perform actions at changelist granularity (e.g., mailing out the CL for review). My major annoyance going from Perforce to Subversion was the fact that svn commit submits all your modified files to the repository, unless you select a list of files manually; and you can't prepare that submission list in advance. (BTW, gvn is probably going to fix a lot of these problems.)
  • The ability to do really lightweight local branches and commits in git is pretty awesome. git will probably be my weapon of choice for solo projects in the future.
  • git seems to have two sweet spots: (a) solo development, where you use it as a sort of glorified Filesystem of Forking Paths, and the unstructured nature of its repositories doesn't matter so much; and (b) hugely distributed development, like the Linux kernel, where speed matters a lot and tending a centralized repository might be intractable anyway. The in-between space where most collaborative projects live --- where you want a medium-sized, canonical, gatekeeper-controlled central repository with a well-defined tip-of-trunk to which everybody syncs and submits --- seems better suited to Subversion or Perforce. (Yes, you can build that workflow style with git, but you have to do the intellectual labor of setting up that workflow, whereas with Subversion or Perforce the wheels come pre-greased.)
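
To make the changelist idea from the Perforce bullet a bit more concrete, here is a rough Python sketch of the concept: modified files are grouped into named changelists ahead of time, and submission happens one changelist at a time. This is only a toy model, not Perforce's actual data model or API; the Workspace class and its method names are hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical local working copy (a toy model, not any real VCS API)."""
    modified: set[str] = field(default_factory=set)
    changelists: dict[str, set[str]] = field(default_factory=dict)

    def edit(self, path: str) -> None:
        # Record that a file has local modifications.
        self.modified.add(path)

    def add_to_changelist(self, name: str, paths: list[str]) -> None:
        # Only files that are actually modified can join a changelist.
        unknown = set(paths) - self.modified
        if unknown:
            raise ValueError(f"not modified: {sorted(unknown)}")
        # Assumption: a file belongs to at most one changelist at a time,
        # so pull it out of any changelist it was in before.
        for files in self.changelists.values():
            files.difference_update(paths)
        self.changelists.setdefault(name, set()).update(paths)

    def submit(self, name: str) -> list[str]:
        # Submit exactly the files in one changelist; every other
        # modified file stays pending in the working copy.
        files = self.changelists.pop(name)
        self.modified -= files
        return sorted(files)
```

For example, after editing a.c, b.c, and notes.txt, you could put a.c and b.c into a changelist called "fix-parser" and submit just that; notes.txt stays modified locally and untouched by the submission.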

Lastly, I've seen some debates about the merits of fully distributed versioning systems like git vs. centralized systems like Subversion or Perforce. I find this analysis fairly insightful. Workflow matters more than your VCS. Think through your desired workflow, and then build a system that supports it; your VCS will be only one part of that workflow system.

At work, we have an automated process for auditing commits, and an awesome code review tool to support that process. After about a year and a half, I'm now addicted to that process. I would find it deeply disturbing to be building "solid" software (as opposed to personal hacks) in an environment where people can commit to the repository without code review. It also seems retrograde to have people perform code review by mailing around patchfiles and having developers eyeball them, or manually apply them to their local trees. And it's obviously not scalable to have a small pool of "committers" through whom all patches must be funneled.

Yet, as far as I can tell, most open source projects run this way: a carefully tended tree with a small number of dedicated committers, who eyeball and then commit patches from the wider community. Craziness.

My ideal repository workflow system would probably be two-tiered:

  • The central repository is where you keep the canonical tree. From the trunk of this tree, you build release binaries, run a centralized continuous build/test farm, etc. The repository is divided hierarchically into modules. All developers can commit directly to this tree; however, each committed patch must be code reviewed and approved in advance by an owner of the corresponding module.
  • Individual developers keep distributed repositories, which work like git. They can manage their own branches, commit at will, etc. When a developer's ready to submit a patch to the central repository, (s)he prepares a changelist in his/her local repository, then submits it as a patch to the central repository. The central repository keeps the patch, but doesn't permit it to be merged into the trunk until it's reviewed and approved.
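
The gatekeeping rule in the two tiers above can be sketched in a few lines of Python. This is a minimal model under my own assumptions, not a description of any existing tool: the Patch and CentralRepo classes, their method names, and the choice to treat a single owner approval as sufficient are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    """A change submitted from a developer's local repository (hypothetical)."""
    author: str
    module: str
    description: str
    approved_by: set[str] = field(default_factory=set)


class CentralRepo:
    """Toy central repository: holds patches until a module owner approves."""

    def __init__(self, owners: dict[str, set[str]]):
        self.owners = owners              # module name -> set of owner names
        self.pending: list[Patch] = []    # submitted, not yet merged
        self.trunk: list[Patch] = []      # merged, in order

    def submit(self, patch: Patch) -> None:
        # Any developer may submit; the patch is kept but not merged.
        if patch.module not in self.owners:
            raise KeyError(f"unknown module: {patch.module}")
        self.pending.append(patch)

    def approve(self, patch: Patch, reviewer: str) -> None:
        # Only an owner of the patch's module may approve it.
        if reviewer not in self.owners[patch.module]:
            raise PermissionError(f"{reviewer} does not own {patch.module}")
        patch.approved_by.add(reviewer)

    def merge(self, patch: Patch) -> None:
        # Assumption: one owner approval is enough to unlock the merge.
        if not patch.approved_by:
            raise PermissionError("patch has not been reviewed and approved")
        self.pending.remove(patch)
        self.trunk.append(patch)
```

With owners set to {"net": {"alice"}}, a patch from bob against "net" sits in pending after submit, merge raises PermissionError until alice approves it, and only then does it land on the trunk.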

Today, you can build an approximation of the above with a Subversion + git workflow. You can buy Subversion hosting from any number of providers, and you can run git on your own workstation without getting anybody else's permission or hosting resources. I suppose the missing piece is the workflow support system, but I doubt the open source community will leave this void unfilled forever.
