Wednesday, June 09, 2010

On the verbosity of Java generics and related type systems

So, recently T. B. Lee tweeted:

Java's generics syntax feels really clumsy. Are there other (strongly typed) languages that do it better?

I replied as best I could in 140 characters while riding a crowded Muni bus home after work, but I mangled the explanation, so I think I should rectify this.

First, it is worth comparing Java generics to other languages that have parametric polymorphism. For example, consider ML (and more generally the Hindley-Milner family of languages). In ML, you don't need to write down types most of the time, because of type inference. Whereas in Java you might write:

<K, V> LinkedList<K> keysAsList(HashMap<K, V> aMap) {
  LinkedList<K> result = new LinkedList<K>();
  for (K key : aMap.keySet()) {
    result.add(key);
  }
  return result;
}

in OCaml you can write:

let keysAsList aMap =
  Hashtbl.fold (fun key _ rest -> (k::rest)) aMap [];;

Notice that although we had to annotate all the Java variables with types, there's not a single type annotation in the OCaml code (Hashtbl.fold is the name of a function qualified by its module name; it is not a type annotation). But OCaml is statically typed nevertheless.

So, one might ask, what gives? Why can't you just add type inference to Java?

Well, the short answer is that typechecking Java generics is a fundamentally harder problem. ML has only parametric polymorphism; Java has both parametric polymorphism and subtype polymorphism (i.e. the object-oriented kind). It is perhaps not obvious why this makes things hard until you learn that B. C. Pierce proved in 1992 that bounded quantification in F (pronounced "F-sub") — a formalization of the most straightforward and general combination of subtyping and parametric polymorphism — is undecidable.

In other words, in F it is possible to write programs for which the type checker would not terminate. This is generally held to be a bad thing (n.b. I disagree with the prevailing opinion, but that's a discussion for another day), so over the next decade or so there followed several papers by Pierce and others attempting to isolate calculi weaker than F with type systems that were both usable and decidable. The most practically relevant outcome of this work was Featherweight Java, which provided the formal foundation for (most of) Java generics. C#, Scala, etc. build on this line of work, although in Scala's case fairly indirectly.

What does all this have to do with type inference? Well, nothing directly. But for any given level of type system expressiveness, full type inference is at least as hard as type checking. And the type checking problem for Java with generics already lives close to the undecidability ceiling (in fact, the decidability of Java generics with wildcards is, AFAIK, still an open problem; proving this sort of stuff used to be a hot research subject but I think everyone's gotten bored of object calculi and moved on). Oh, and I should mention that many of the best minds in academic language design have thrown themselves at the parametric polymorphism + subtyping + inference problem at one time or another, and come up empty. Now, none of this is hard proof that much richer type inference for object-oriented languages with generics is impossible, but it all hints strongly in this direction; at a minimum any such system is likely to be exceptionally intricate and difficult to prove sound.

So, basically, I believe that it is unlikely that anyone will come up with a fundamentally more concise type system for a programming style that combines (1) objects, (2) generics, and (3) static typing. At best, people will fiddle around on the margins — using different punctuation, for example, to denote type parameters, or making other parts of the language more terse to compensate.

That said, although general type inference seems hopeless, there are clearly some things that could make Java's generics more syntactically lightweight in certain common cases. For example, it would be trivial to infer the type of a variable at any declaration site with an initialization expression. C# and Scala appear to do some of this.