Thursday, June 19, 2008
Random notes on four version control systems
From 1996 to 2002, CVS was the only version control system I used. For that matter, it was almost the only system that anybody used, at least in the Unix world. (Well, a few nutty holdouts still used raw RCS...) In odd corners of the Internet, there were rumblings about other systems, but none of them had much mindshare or seemed worth the switching costs.
But somehow, over the past five years, the version control universe seems to have opened up dramatically. CVS's competitors got more mature, and projects actually switched to them: Linux went to BitKeeper and git, KDE went to Subversion, etc. I'd occasionally use these systems to check out and build source code, but I still didn't use them for actual development. Then, in 2006, I got a real job, where we use Perforce with some local wrapper scripts. More recently, I converted some of my personal codebases to Subversion and git, and started using them for real. So over the past two years, I've learned more about version control systems than I did in the previous eight.
Conclusions:
- There's no reason for anyone, anywhere, to use CVS for new projects. Subversion is stable; it has a strict superset of CVS's abilities; it requires almost no change in your mindset if you're coming from CVS; it runs on every platform that you're likely to have on your workstation or laptop; you can get hosting from a wide variety of providers; and its frontends are roughly as good as CVS frontends. (I use KDE, and kdesvn is only slightly less polished than Cervisia.)
- CVS to SVN import is pretty good, but not perfect. It will introduce some spurious entries into revision logs, for reasons that I find obscure, and CVS tags won't be seamless.
- Perforce is a very nice version control system, and has done a lot to change my mind about using proprietary software development infrastructure tools. I particularly like Perforce's ability to prepare a "changelist" from a subset of files modified in your local copy, and perform actions at changelist granularity (e.g., mailing out the CL for review). My major annoyance going from Perforce to Subversion was the fact that svn commit submits all your modified files to the repository, unless you select a list of files manually; and you can't prepare that submission list in advance. (BTW, gvn is probably going to fix a lot of these problems.)
- The ability to do really lightweight local branches and commits in git is pretty awesome. git will probably be my weapon of choice for solo projects in the future.
- git seems to have two sweet spots: (a) solo development, where you use it as a sort of glorified Filesystem of Forking Paths, and the unstructured nature of its repositories doesn't matter so much; and (b) hugely distributed development, like the Linux kernel, where speed matters a lot and a centralized repository might be intractable to manage anyway. The in-between space where most collaborative projects live --- where you want a medium-sized, canonical, gatekeeper-controlled central repository with a well-defined tip-of-trunk to which everybody syncs and submits --- seems better suited to Subversion or Perforce. (Yes, you can build that workflow style with git, but you have to do the intellectual labor of setting up that workflow, whereas with Subversion or Perforce the wheels come pre-greased.)
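For readers who haven't tried it, here is a minimal sketch of what a lightweight local branch looks like in practice. It runs in a scratch repository; the file name, branch name, and identity are all made up for illustration, and nothing here touches a server.

```shell
# Sketch: a throwaway local branch in a scratch repository.
# notes.txt, "experiment", and the identity below are invented for this example.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
# The default branch name depends on local git configuration.
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Creating and switching to a branch is a purely local operation.
git checkout -q -b experiment
echo "risky idea" >> notes.txt
git commit -q -a -m "Try a risky idea"

# Back on the original branch, the experiment can be merged or thrown away.
git checkout -q "$main_branch"
git merge -q experiment
git branch -d experiment
```

If the experiment had gone badly, the merge could be skipped and the branch discarded with git branch -D instead.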
Lastly, I've seen some debates about the merits of fully distributed versioning systems like git vs. centralized systems like Subversion or Perforce. I find this analysis fairly insightful. Workflow matters more than your VCS. Think through your desired workflow, and then build a system that supports it; your VCS will be only one part of that workflow system.
At work, we have an automated process for auditing commits, and an awesome code review tool to support that process. After about a year and a half, I'm now addicted to that process. I would find it deeply disturbing to be building "solid" software (as opposed to personal hacks) in an environment where people can commit to the repository without code review. It also seems retrograde to have people perform code review by mailing around patchfiles and having developers eyeball them, or manually apply them to their local trees. And it's obviously not scalable to have a small pool of "committers" through whom all patches must be funneled.
Yet, as far as I can tell, most open source projects run this way: a carefully tended tree with a small number of dedicated committers, who eyeball and then commit patches from the wider community. Craziness.
My ideal repository workflow system would probably be two-tiered:
- The central repository is where you keep the canonical tree. From the trunk of this tree, you build release binaries, run a centralized continuous build/test farm, etc. The repository is divided hierarchically into modules. All developers can commit directly to this tree; however, each committed patch must be code reviewed and approved in advance by an owner of the corresponding module.
- Individual developers keep distributed repositories, which work like git. They can manage their own branches, commit at will, etc. When a developer's ready to submit a patch to the central repository, (s)he prepares a changelist in his/her local repository, then submits it as a patch to the central repository. The central repository keeps the patch, but doesn't permit it to be merged into the trunk until it's reviewed and approved.
Today, you can build an approximation of the above with a Subversion + git workflow. You can buy Subversion hosting from any number of providers, and you can run git on your own workstation without getting anybody else's permission or hosting resources. I suppose the missing piece is the workflow support system, but I doubt the open source community will leave this void unfilled forever.
Labels: programming
Monday, May 19, 2008
SML hacking tip: installing on Ubuntu x86_64 by manual transfer from i386
Note: Narrowly targeted Google-food. Skip if you do not program in Standard ML.
Ubuntu x86_64 does not have smlnj (the Standard ML of New Jersey (SML/NJ) distribution); nor does SML/NJ currently build out-of-the-box on Ubuntu x86_64. I elide here a long, dull story about trying to build it --- using 32-bit libraries, etc. --- and cut to the chase: ultimately, I installed it on a 32-bit Ubuntu box, and then manually transferred the files over.
In general, I almost never install software on my Linux box unless it's either (a) completely managed by my packaging system or (b) can be removed by simply rm -Rf /some/directory. Fortunately, SML/NJ more or less satisfies (b).
Assuming you have both a 32-bit Ubuntu install and a 64-bit Ubuntu install available, the transfer process is straightforward. The only trickiness is that there's no simple way to get all SML/NJ packages and libraries at once, so you'll have to apt-get several times if you want a "batteries included" installation.
Steps (unless otherwise noted, perform all these commands on your 32-bit machine):
- sudo apt-get install smlnj
- Optionally, do apt-cache search smlnj and install as many of the resulting lib*-smlnj libraries as you want.
- Optionally, install the following extra packages:
  - ml-lpt ("language processing tools"; includes ml-ulex and ml-antlr)
  - ml-yacc (ML-Yacc)
  (apt-cache search ml yields a huge fusillade of results because "ml" is a common substring. Couldn't they at least use sml-* or better yet standard-ml-*? Blech.)
- cd /usr/lib
- tar -czvf smlnj.tar.gz smlnj
- Copy the tarball smlnj.tar.gz to your target machine of choice and untar it under /usr/lib
- pushd /usr/bin && tar -czvf ~/smlnj-scripts.tar.gz `dpkg -L smlnj smlnj-runtime ml-lpt ml-yacc |cut -b 10-` && popd
- Copy the tarball smlnj-scripts.tar.gz to your target machine, and untar it under a binary directory on your PATH. (I have $HOME/bin in my PATH for user-local executables, so I put it there.)
As far as I can tell, everything works.
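One caveat about the dpkg -L ... | cut -b 10- step: dpkg -L lists every path a package installs, not only those under /usr/bin, so cutting a fixed 9-byte prefix also produces mangled names for files that live elsewhere (tar will complain that it cannot find them). A stricter filter might look like the sketch below. The function name usr_bin_entries is hypothetical, and this is an alternative I'm suggesting, not the command used above.

```shell
# Hypothetical helper: read `dpkg -L`-style output on stdin and print only
# the entries that sit directly under /usr/bin, relative to that directory.
usr_bin_entries() {
    sed -n 's|^/usr/bin/\([^/][^/]*\)$|\1|p'
}

# Intended use on the 32-bit machine (not run here):
#   cd /usr/bin && tar -czvf ~/smlnj-scripts.tar.gz \
#       $(dpkg -L smlnj smlnj-runtime ml-lpt ml-yacc | usr_bin_entries)
```

Paths outside /usr/bin, and the /usr/bin directory entry itself, are simply dropped, so tar only sees names that exist relative to the current directory.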
Labels: programming, standard-ml
Saturday, May 17, 2008
SML hacking tip: fix uncaught exception BadAnchor
Note: Narrowly targeted Google-food. Skip if you do not program in Standard ML.
If you use the Standard ML of New Jersey (smlnj) compilation manager (CM), then you will sometimes get error messages like this:
[bad plugin name: anchor $y-ext.cm not defined]
uncaught exception BadAnchor
raised at: ../cm/paths/srcpath.sml:436.16-436.25
../cm/util/safeio.sml:41.55
../cm/util/safeio.sml:41.55
../cm/util/safeio.sml:41.55
../cm/parse/parse.sml:502.47
/usr/lib/smlnj/bin/sml: Fatal error -- Uncaught exception BadAnchor with 0
raised at ../cm/paths/srcpath.sml:436.16-436.25
make: *** [default] Error 1
This is CM's way of telling you that it encountered an entry in your *.cm file with an extension that it does not understand. In this case, it's a file with a .y extension.
The fix is to correct the file extension (if it was a typo), or perhaps to update to a version of CM that understands the file extension you're using.
In my case, I had been hacking with an ages-old version of SML/NJ, and recently updated to the version in Ubuntu 8.04 (a.k.a. Hardy). Apparently non-ancient versions of SML/NJ expect ML-Yacc files to be named with the extension .grm instead of .y. I renamed my Yacc file, updated sources.cm, and all was well.
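The rename itself is mechanical. Here is a sketch in a scratch directory, assuming a grammar file called parser.y and a sources.cm that lists it on a line of its own; both file names and the contents of sources.cm are stand-ins, not taken from a real project.

```shell
# Sketch with made-up file names: rename an ML-Yacc grammar from .y to .grm
# and point the CM description file at the new name.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-ins for a real project (parser.y and this sources.cm are hypothetical).
: > parser.y
printf 'Group is\n  $/basis.cm\n  parser.y\n  main.sml\n' > sources.cm

mv parser.y parser.grm
# Rewrite only occurrences of "parser.y" at the end of a line.
sed 's/parser\.y$/parser.grm/' sources.cm > sources.cm.new
mv sources.cm.new sources.cm
```

Writing to a temporary file and moving it into place avoids relying on sed's in-place flag, which differs between implementations.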
Labels: programming, standard-ml
Wednesday, May 17, 2006
SML hacking tip: Turn off polyEqual warnings
Note: Narrowly targeted Google-food. Skip if you do not program in Standard ML.
Recent versions of Standard ML of New Jersey (SML/NJ) print a message "Warning: calling polyEqual" when you write code that uses polymorphic equality. Here's how to turn it off:
sml -Ccontrol.poly-eq-warn=false
This works in CM mode, as in:
sml -Ccontrol.poly-eq-warn=false -m sources.cm
I am posting this here as Google-food because I just spent an hour Googling and grepping around in SML/NJ sources trying to figure this out.
If you're using SML/NJ interactively, then you can also type
Control.polyEqWarn := false
into the read-eval-print loop. However, this doesn't work when you're trying to invoke the SML Compilation Manager in "make" mode (-m), because Control is not present in the default linkage environment. (And the SML/NJ documentation does not specify how to add it; or, at least, I haven't figured it out yet.)
More generally, all control flags (all bool refs in Control) can be toggled at the command line. This is documented in the command-line section of the SML Compilation Manager manual. There, we learn that -C can be used to set control parameters. You can get a listing of all control parameters using -S, as follows:
sml -S
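If you'd rather not retype the flag, a small shell wrapper is one option. This is only a sketch: the function name sml_quiet and the SML override variable are my own inventions, and the only thing passed to sml is the -C setting shown above.

```shell
# Hypothetical wrapper: always pass the polyEqual-warning switch to sml.
# SML may be set to point at a different binary; it defaults to "sml".
sml_quiet() {
    "${SML:-sml}" -Ccontrol.poly-eq-warn=false "$@"
}

# Example (not run here): sml_quiet -m sources.cm
```

Dropping the function into a shell startup file, or calling it from a Makefile rule, keeps the setting in one place.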
Finally, I just want to remark in passing that unlike, say, "match redundant" or "match nonexhaustive", use of polymorphic equality is purely a performance problem, not a probable logic error. It's therefore highly questionable design to enable a warning message for polymorphic equality by default. The default setting should have been off, but available as a profiling/debugging option.
Labels: programming, standard-ml
