Monday, November 30, 2009

On the changing nature of software ownership; with a digression on horniness and the troubling interdependence of human beings

Sometime in the late 1990s, I realized that "ownership" of software had become meaningless.

Well, OK, I didn't grasp the full implications at the time. But the seed was planted when I was downloading the third or fourth or tenth macro virus protection patch for Microsoft Word 95 or Word 97. Maybe it was for my family's computer, or a friend's; or maybe it was for mine. I don't remember exactly.

However, I do remember being troubled by the implications of what I was doing. Ownership carries connotations of permanence, and the reassurance of physical possession: You own this, so no one can take this away from you. And I could indeed purchase a disc, containing a piece of software, which no one could take away from me. But this software had bugs, which made it vulnerable to security exploits; and I still needed to open files created by other people. This, in turn, implied that I needed to continue downloading updates for as long as I used the software. Even if I saved all updates available at any given time onto another disc, I would still be vulnerable to new exploits discovered later unless I returned to the mothership for more.

Therefore, unless I was willing to live a completely isolated existence, never accepting anyone else's data, I would always be dependent on the software's manufacturer.

And how long would they continue to provide updates? The answer was clearly "not forever". A few years later, a more precise answer arrived in the form of Microsoft's Support Lifecycle Policy, which guarantees security patches for a certain number of years, depending on the product type.

So, when I "bought" software, I was really paying for an implicit, limited-time support contract with the manufacturer. Without this ongoing relationship, my software would rot to the point of uselessness, as surely as a tomato on a blighted vine.

And the turn of the century also brought a new and particularly blighted vine for your software to rot on: the broadband Internet connection. Whereas viruses once spread through casual sharing of floppy disks and the occasional file share on your local network, a broadband connection exposes your machine, continuously, to the cleverest and most malicious hackers in the entire world. And so it is that today, a vulnerable machine on a non-firewalled network becomes infected within minutes of being connected to the Internet.

Lest the Mac or Free Software weenies (n.b. I am both) start gloating about the inferiority of Microsoft software, I hasten to point out that every nontrivial software system has a similar update schedule. How often must Ubuntu users run sudo apt-get update? How often does Firefox issue a point release? How often do Mac or iWhatever users get told to run Apple Software Update?

Therefore, software updates are an inescapable, relentless fact of Internet-connected life.

And security exploits are just the most obvious and unavoidable source of bit rot. Consider what happens when the organization — company or community, formal or informal — behind your favorite software decides to move on to the next version of a data format or protocol, and you decline to upgrade. New web pages stop being viewable; you stop being able to watch videos, play music, read documents, chat with friends, or really do anything involving other people.

Of course, you can choose to live like a virtual hermit, never sharing anything with anyone. But this comes with the same drawbacks as being a hermit in the physical world. Gains from trade and specialization of labor aren't just abstractions. Buying a jug of milk at the grocery store takes far less effort than raising and milking a cow. In some cases the difference isn't even a matter of effort but of possibility: it is impossible for an individual to manufacture an airplane, or even something as humble as an aluminum can, without relying on a massive infrastructure of manufacturing, transport, and trade provided by society. And likewise it is impossible for you to singlehandedly fork the Firefox source code, in any nontrivial way, if Mozilla takes it in a direction you don't like. (You might be able to publish an initial fork, but there's no way you'd be able to keep up with Mozilla; eventually they'd be adding critical features faster than you, and your fork would fall into the dustbin of history.)

And then there's the fact that your hardware will wear out or go obsolete, and your software will need to be ported to new hardware.

In short, your software lives in a continuously changing ecosystem of data. And you are inescapably dependent on other people to evolve that software in response to environmental changes in that ecosystem.

Therefore, while you may still "own" your software in some formal sense, that ownership means little in any practical sense. What matters is the organization that maintains your software, and your ongoing relationship to them. You're not (merely) paying for access to a passive bundle of data; you're actually making a calculated bet on the future behavior of a group of people.

At this point I'll show my colors and claim that, in the long run, open source projects with open governance structures offer more credible maintenance promises than proprietary systems. But that's a subject for a whole other set of posts. In the meantime, I offer two final nuggets, in the way of provoking thought.

First, yes, this post is a stealth followup to my previous post on ChromeOS punditry.

Second, I offer the following excerpt from Neal Stephenson's Cryptonomicon, concerning autistic World War II cryptographer and Naval xylophonist Lawrence Waterhouse:

Waterhouse has been chewing his way through exotic Nip code systems at the rate of about one a week, but after he sees Mary Smith in the parlor of Mrs. McTeague's boarding house, his production rate drops to near zero. Arguably, it goes negative, for sometimes when he reads the morning newspaper, its plaintext scrambles into gibberish before his eyes, and he is unable to extract any useful information.

Despite his and Turing's agreements about whether the human brain is a Turing machine, he has to admit that Turing wouldn't have too much trouble writing a set of instructions to simulate the brain functions of Lawrence Pritchard Waterhouse.

Waterhouse seeks happiness. He achieves it by breaking Nip code systems and playing the pipe organ. But since pipe organs are in short supply, his happiness level ends up being totally dependent on breaking codes.

He cannot break codes (hence, cannot be happy) unless his mind is clear . . . Clarity of mind (Cm) is affected by any number of factors, but by far the most important is horniness, which might be designated by σ, for obvious anatomical reasons that Waterhouse finds amusing at this stage of his emotional development.

Horniness begins at zero at time t = t0 (immediately following ejaculation) and increases from there as a linear function of time:

σ ∝ (t - t0)

The only way to drop it back to zero is to arrange another ejaculation.

. . .

Now, when he was at Pearl Harbor, he discovered something that, in retrospect, should have been profoundly disquieting. Namely, that ejaculations obtained in a whorehouse (i.e., provided by the ministrations of an actual human female) seemed to drop σ below the level that Waterhouse could achieve through executing a Manual Override. In other words, the post-ejaculatory horniness level was not always equal to zero, as the naive theory propounded above assumes, but to some other quantity dependent on whether the ejaculation was induced by Self or Other: σ = σself after masturbation but σ = σother upon leaving a whorehouse, where σself > σother, an inequality to which Waterhouse's notable successes in breaking certain Nip naval codes at Station Hypo were directly attributable, in that the many convenient whorehouses nearby made it possible for him to go somewhat longer between ejaculations.

. . .

If he had thought about this, it would have bothered him, because σself > σother has troubling implications . . . If it weren't for this inequality, then Waterhouse could function as a totally self-contained and independent unit. But σself > σother implies that he is, in the long run, dependent on other human beings for his mental clarity, and, therefore, his happiness. What a pain in the ass!
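Stephenson's pseudo-mathematics is concrete enough to sketch in code. Here is a toy Python rendering of the refined model — all constants are invented for illustration, since the novel supplies none:

```python
# Waterhouse's model: horniness (sigma) grows linearly with time since
# the last reset, and resets to a baseline that depends on whether the
# reset was induced by Self or Other. Constants are made up.

K = 1.0            # growth rate: sigma is proportional to (t - t0)
SIGMA_SELF = 0.4   # residual sigma after a Manual Override
SIGMA_OTHER = 0.1  # residual sigma after a visit to the whorehouse

def sigma(t, t0, baseline):
    """Horniness at time t, given a reset at time t0 to `baseline`."""
    return baseline + K * (t - t0)

def clarity_of_mind(t, t0, baseline):
    """Cm falls as sigma rises; codebreaking requires high Cm."""
    return 1.0 / (1.0 + sigma(t, t0, baseline))

# Because SIGMA_SELF > SIGMA_OTHER, clarity recovers further after an
# Other-induced reset -- hence Waterhouse's dependence on other people.
assert clarity_of_mind(0, 0, SIGMA_OTHER) > clarity_of_mind(0, 0, SIGMA_SELF)
```

The single inequality in the last line is the entire point of the passage: the self-contained unit is strictly worse off.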

I leave the connective tissue between the above passage and the rest of this post as an exercise for the reader.

Tuesday, November 24, 2009

On "tricks" and science

Are people really claiming that the word "trick", when used by climate scientists to describe a data analysis technique via a mailing list for other climate scientists, indicates some nefarious conspiracy of deception? Why yes, they are, and Acephalous has the rebuttal:

Global warming skeptics are attacking climate scientist Phil Jones for encouraging trickery in an email recently stolen off the webmail server at the University of East Anglia in which he wrote:

I've just completed Mike's Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) amd from 1961 for Keith's to hide the decline.

Over at RealClimate, the skeptical response to the word "trick" is to treat it as a colloquial:

Trick:
“a cunning or deceitful action or device; ‘he played a trick on me’; ‘he pulled a fast one and got away with it’”
“something designed to fool or swindle”
“flim-flam: deceive somebody; ‘We tricked the teacher into thinking that class would be cancelled next week’”

. . .

Schmidt obliges:

. . . It's mostly used in mathematics, for instance in decomposing partial fractions, or deciding whether a number is divisible by 9 etc.etc.etc.

The skeptics' rejoinder:

This is nonsense. Both are examples of teaching or explaining concepts to lay people. The first intentionally places “tricks” in quotations marks to emphasize its non-technical use.

The problem with nonspecialists reading the private correspondence of experts is that their ignorance transforms all the technical points into nefarious inkblots. To continue with the example above, skeptical nonspecialists encounter the word "trick" and ask for clarification. Schmidt provides evidence that the word is innocuous, but because nonspecialists can interpret neither the context of the original nor that of the further examples, they redouble their efforts: now the rhetorical situation in which the word "trick" is uttered matters; now the appearance of quotation marks matters, etc. They are convincing themselves that those black blobs represent what they insist they represent, and when experts inform them that those are not Rorschach blots to be subjectively interpreted—that they are, in fact, statements written in a language that skeptics simply do not understand—the nonspecialists look over them again and declare that it could be a butterfly, or maybe a bat.

For the programmers out there, this is a little like finding a comment in some random piece of program source code

// Hack: just disassemble the whole tree for now

and concluding that the author of the software is attempting to "hack" your password.

The word "trick", like the word "hack", is a term of art with an esoteric meaning different from its lay meaning. And no, "trick" does not denote either a deception or a way of "teaching concepts to lay people". See, for example, the tricks on Terence Tao's blog. Can anybody seriously believe that this amplification trick is "a way of explaining math to laypeople"? Well, I guess they can, if they're shooting their mouths off without having the least fucking clue what they're talking about.

So, OK, what does trick mean? Well, C. Shalizi's review of Tao's book gives a reasonable definition:

Tao’s third theme is tricks: patterns of establishing results that replicate across many situations, but in which any one result is too small to be a theorem in its own right, while the general pattern is too vague. These are an important part of how math actually gets done, but by their nature they tend not to have a recognized place in the curriculum, getting passed down by oral tradition, or by being absorbed by those who are lucky enough not only to run across a paper using the trick, but also to guess that it will generalize. There are numerous tricks throughout the book, and one of the nicest chapters, 1.9, expounds a family of tricks for improving inequalities, which Tao calls amplification.

To be more precise, the climate scientists in question do not seem to be using the word "trick" in its narrowest mathematical sense, but rather in the more general sense of a useful technique — though, again, usually one too small to be publishable on its own [0] [1].
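To make the mathematical sense concrete, here is the divisibility-by-9 trick Schmidt alludes to, sketched in Python. It works because 10 ≡ 1 (mod 9), so every decimal digit contributes its own value modulo 9; it's genuinely useful, and far too small to be a theorem:

```python
def divisible_by_9(n):
    """The classic digit-sum trick: n is divisible by 9 iff the sum of
    its decimal digits is. Since 10 = 9 + 1, each digit contributes its
    own value mod 9, so digit sums preserve divisibility by 9."""
    n = abs(n)
    while n > 9:  # repeatedly replace n with its digit sum
        n = sum(int(d) for d in str(n))
    return n in (0, 9)
```

Nothing here is deceitful; like "Mike's Nature trick", it's simply a compact technique passed along among people who already know what it's for.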


Incidentally, this sort of irresponsible misreading shows how little most climate change "skeptics" on the Internet know about math or science. Why the heck would I lend credence to people who don't know jack about scientific practice, and cannot be bothered to learn, and still feel comfortable dismissing thousands of working scientists as frauds? And especially why would I believe them over scientific professionals who have dedicated their lives to studying the data?

I mean, yesterday I got a comment on my previous post on climate change denialism saying:

Models that can exhibit large errors tend to exhibit them in the direction that the modellers would prefer. That's why real sciences use double-blind testing.

Anyone who doesn't know what results climate "scientists" are looking for isn't paying attention.

"Real" sciences use "double-blind testing"? O RLY? No wai! Blind testing, of either the single- or double- variety, is irrelevant to most experiments in the natural sciences. I would like to know what "double-blind" even means for a microarray assay; are we not informing the microscopic dots of DNA whether we've hybridized them or not? Are we getting them to sign little microscopic release forms and giving them little microscopic placebo pills?

If researchers in the natural sciences had to single-blind every experiment they ran, they'd never get anything done. Natural science labwork is often incredibly laborious, and grad students do not have time to, like, close their eyes and spin a little roulette wheel of test tubes every time they pipette a drop of reagent. The usual way to weed out observer bias is by (1) designing experimental methods that are relatively robust to observer influence; (2) repeating your experiments; and (3) describing your procedure in enough detail for other trained people to reproduce the result. Blind experimentation is reserved for certain types of experiments where observer or subject bias is especially dangerous or probable.

As for the email the commenter links to, I find it hard to see anything suspicious about it. Here's an excerpt:

The Soon & Baliunas paper couldn't have cleared a 'legitimate' peer review process anywhere. That leaves only one possibility--that the peer-review process at Climate Research has been hijacked by a few skeptics on the editorial board.

. . .

[In quoted reply:] I looked briefly at the paper last night and it is appalling - worst word I can think of today without the mood pepper appearing on the email ! . . . The phrasing of the questions at the start of the paper determine the answer they get. They have no idea what multiproxy averaging does. By their logic, I could argue 1998 wasn't the warmest year globally, because it wasn't the warmest everywhere. With their LIA being 1300-1900 and their MWP 800-1300, there appears (at my quick first reading) no discussion of synchroneity of the cool/warm periods. Even with the instrumental record, the early and late 20th century warming periods are only significant locally at between 10-20% of grid boxes.

I'll freely admit that I don't know what the jargon means, but "they have no idea what X does" is not a phrasing I'd ever have wanted to read in a review of a paper of mine. This is not prima facie evidence of anything but that some journal published a bad paper.

And subversion of publication venues by cranks is actually an ever-present danger in the sciences. Do climate change denialists really believe that concern over a journal's publishing trash is evidence of a groupthink conspiracy? No doubt they'll be taking up the heroic cause of M. S. El Naschie next.


[0] The notion of "trick" seems loosely related to what the C.S. community calls a "pearl", except that pearls are perhaps more rigidly described, and computer scientists publish them in some venues regardless (e.g. ICFP). Truthfully, the "unpublishability" of tricks and other semi-formal knowledge seems like a flaw in scientific publishing, albeit one that blogs and other Internet-based venues may be correcting.

[1] It's not even unheard-of for laypeople to use the word "trick" in roughly this sense. See: Clemenza's recipe from The Godfather. I look forward to climate skeptics' exegesis of the diabolical agenda behind Clemenza's spaghetti sauce.

Sunday, November 22, 2009

Pro tip for ChromeOS punditry

Full disclosure: I work for Google, although this blog reflects my personal opinions only.

Please don't pontificate about ChromeOS until you grok the long-term implications of Web Storage, Native Client, Open3D/WebGL, Courgette, and a large local disk cache. Oh, and, of course, Moore's Law.

And if you can't work out the implications, at least talk to someone who can, before you hit the "Post" button. You might still think that ChromeOS is a bad idea, but at least you'll be critical in a more clueful way.

Tuesday, November 03, 2009

Software patents have tangible costs for innovation, and for you

I have a friend who's been working extremely hard on a small software startup for the past few years. He and his partner developed a genuinely innovative, original technology which solves a useful problem for end-users and probably has significant commercial value. The technology has been integrated into a website that is awesomely functional and even fun to use. (I'd point you there, except that I'm going to discuss legal matters shortly and I think it's better not to identify the parties by name.)

His startup recently got sued for patent infringement by a company that independently developed a product that performs a vaguely similar function. This other company's product is much less sophisticated, and their user-facing site is an ugly, user-hostile pile of crap. The term "search arbitrage" would be a kind word to apply to this other company's product. And there is absolutely no sense in which my friend's work builds on any of this other company's technology.

Now, my friend and his partner have consulted multiple IP lawyers and they've said, "Yep, the law is probably on your side." They have also said, "You're still screwed." The trial would take forever, the legal fees would be ruinous, and in the meantime nobody will invest in a company which has a litigation cloud hanging over it.

So, this sucks for my friend and his partner. More importantly, this sucks for you, because, having seen the product, I am 100% convinced that you, or someone you know, would love to have this technology acquired and integrated into a major site that you use.

This sort of story is not at all uncommon in the software industry. I've been meaning to tell it for a couple of weeks now, because it made me think of this post by Tim Lee about "libertarian political philosopher" Richard Epstein's bold claim in an amicus brief* that:

The credible threat of a published patent’s right to exclude acts like a beacon in the dark, drawing to itself all those interested in the patented subject matter. This beacon effect motivates those diverse actors to interact with one another and with the patentee, starting conversations among the relevant parties.

In response, Tim writes:

There’s nothing beacon-like about software patents. Software companies do not use patents as a mechanism for finding technologies or business partners. Patents tend to be written in unintelligible legalese, they’re not well indexed, and they issue years after they’re filed. They’re completely irrelevant to the day-to-day process of product development in the software industry. I’ve never met a software developer who regards the patent database as a useful source of information about software inventions, nor can I think of an example of a software company (Intellectual Ventures doesn’t count) that uses patents as a central part of its product-development strategy.

Completely true, except that Tim does not go nearly far enough. At any software company with competent legal counsel, developers are instructed in the strongest possible terms never, ever to look at a patent, because the tiniest amount of documented influence could be used as ammunition in a lawsuit. The only time a sane software developer reads a patent is when your company's lawyers specifically ask you to help them prove you're not infringing on one. If you ever get wind that there's a patent even vaguely related to your work, you stick your fingers in your ears and run in the other direction. In short, software patents facilitate "conversation" about as well as poison gas bombs do.

One thing that I find extremely frustrating about many legal scholars' and economists' approach to patents is that they make two false assumptions. The first assumption is that transaction costs are acceptable, or can be made so with some modest reforms. The second assumption is that patent litigation is reasonably "precise"; i.e., if you don't infringe on something then you'll be able to build useful technology and bring it to market relatively unhindered. As my friend's story shows, both of these assumptions are laughably false. I mean, just black-is-white, up-is-down, slavery-is-freedom, we-have-always-been-at-war-with-Eastasia false.

The end result is that our patent system encourages "land grab" behavior which could practically serve as the dictionary definition of rent-seeking. The closest analogy is to a conquistador planting a flag on a random outcropping of rock at the tip of some peninsula, and then saying "I claim all this land for Spain", and then the entire Western hemisphere allegedly becomes the property of the Spanish crown. This is a theory of property that's light-years away from any Lockean notion of mixing your labor with the land or any Smithian notion of promoting economic efficiency. And yet it's the state of the law for software patents. Your business plan can literally be to build a half-assed implementation of some straightforward idea (or, in the case of Intellectual Ventures, don't build it at all), file a patent, and subsequently sue the pants off anybody who comes anywhere near the turf you've claimed. And if they do come near your turf, regardless of how much of their own sweat and blood they put into their independent invention, the legal system's going to go off under them like a land mine.

It is hard to think of a more effective mechanism for discouraging innovation in software. I mean, I suppose you could plant a plastic explosive rigged to a random number generator under the seats of every software developer, and that would be slightly worse.


* To be fair, the amicus brief is not completely Epstein's work; it is the sworn work of one Dr. Ananda Chakrabarty, coauthored with lawyer F. Scott Kieff. I don't really know how these things work, but I assume that Epstein agrees with the argument laid out even if he's not the lone progenitor of it.