Friday, February 13, 2009

Type Inference

I'm currently teaching CSC 212 - The Practice of Computer Science at UVic this term, and one of the topics we've touched on in the course is that of type inference -- the process by which a statically typed language can deduce the types of expressions without programmer annotation.

First some motivation from personal experience: there truly are moments in every programmers life when they learn about some new technique, concept, or abstraction and they stop, turn into Neo in the Matrix and go "woah". One of the first of them for me was when I learned about inheritance and polymorphism. Another was when I learned about type inference. Until you've used it you really can't appreciate how useful it is. It's yet another argument in favour of static typing, and it is really, really cool. It is without a doubt one of the coolest features I've seen in functional languages (though it is not limited to FP languages -- more on this in a second).

So what is it? Let's look at an example. In Java or C++ you might write a function with a header like:

Vector<somereallylongclassname> myMethodName (SomeOtherClassName foo)

And I don't know about you, but typing "SomeReallyLongClassName" and "SomeOtherClassName" is a pain in the ass. C/C++ alleviates this a bit with typedefs, but you still have to type extra characters to specify types of arguments (and Java doesn't have the typedef keyword). Wouldn't it be nice if the compiler could figure this out for us and save us some keystrokes? And of course it would be, and that's where type inference comes in.

In a language which supports type inference, the types of identifiers in your source code are deduced by the compiler depending upon the context in which the identifier is used. The trivial example I usually show students is the following function declaration in SML (a functional language designed by Robin Milner who is oftentimes credited as the guy who came up with the notion of type inference):

fun foo x = x + 3;

When the above code is compiled, the compiler deduces the type signature of the function foo() to be "int -> int", or (in English) a function which takes an integer and returns an integer. Note however, there is no syntactic evidence of type annotations -- nowhere did I write "int x" or something similar. The compiler figures out the type of x by the "x + 3" expression in foo()'s definition. Since 3 is an int, and the + operator is one which takes two arguments of the same type it infers that x must therefore be an int (and additionally since the + operator returns a value of the same type as its operands it also deduces that foo must return an int as well).

This is magic.

It's also the best of both worlds: all the type safety of static typing, and the "clean" or "uncluttered" code of dynamic typing.

One thing I've been thinking about lately though is how type inference would play out in a OO language. One of the reasons the algorithm works well in a language like ML is because functions can neither be overloaded or overrided (though they can be rebound or redefined). So for example, if I have a function declaration like:

fun foo x = bar (x);

There's no trouble in figuring out what which function bar() refers to (since there's only one). In an OO language like Java you could have a situation like:

Foo f = new SubclassofFoo();
f.someMethod(x, y, z);

and suddenly the problem becomes more complex -- is someMethod() a part of the Foo class or SubclassOfFoo? Or is it defined in some other parent class to Foo? And since we can overload methods in Java we also need to know the types of x, y, and z to figure out which version of someMethod we need.

Now imagine we lived in the C++ world and we bring in multiple inheritance and suddenly the world gets even more complicated as now someMethod() could be in one of many parent classes.

I suppose this is about the time I put on my snooty elitist academic hat and say that this is another example of why we should all be coding in functional languages, but I'll save that for another blog entry. :)

At any rate, type inference is a staple feature in many functional languages: SML, Haskell and CAL all use it. The Gem Cutter visual programming environment I'm using in my thesis work is a great tool as well for learning about type inference -- as you make or break connections between functions, type inference is applied and the types of inputs and outputs change accordingly. This allows newer programmers to more visually see the cause and effect nature of the algorithm.