Type erasure in Haskell?

Tags: types, haskell

I was reading a lecture note on Haskell when I came across this paragraph:

This “not caring” is what the “parametric” in parametric polymorphism means. All Haskell functions must be parametric in their type parameters; the functions must not care or make decisions based on the choices for these parameters. A function can't do one thing when a is Int and a different thing when a is Bool. Haskell simply provides no facility for writing such an operation. This property of a language is called parametricity.

There are many deep and profound consequences of parametricity. One consequence is something called type erasure. Because a running Haskell program can never make decisions based on type information, all the type information can be dropped during compilation. Despite how important types are when writing Haskell code, they are completely irrelevant when running Haskell code. This property gives Haskell a huge speed boost when compared to other languages, such as Python, that need to keep types around at runtime. (Type erasure is not the only thing that makes Haskell faster, but Haskell is sometimes clocked at 20x faster than Python.)

What I don't understand is how "all Haskell functions" are parametric. Aren't types explicit/static in Haskell? Also, I don't really understand how type erasure improves runtime performance.

Sorry if these questions are really basic, I'm new to Haskell.

EDIT:

One more question: why does the author say that "Despite how important types are when writing Haskell code, they are completely irrelevant when running Haskell code"?

asked Sep 28 '15 by Liumx31


4 Answers

Others have already answered, but perhaps some examples can help.

Python, for instance, retains type information until runtime:

>>> def f(x):
...   if type(x)==type(0):
...     return (x+1,x)
...   else:
...     return (x,x)
... 
>>> f("hello")
('hello', 'hello')
>>> f(10)
(11, 10)

The function above, given any argument x, returns the pair (x,x), except when x is of type int. The function tests for that type at runtime, and if x is found to be an int, it behaves in a special way, returning (x+1, x) instead.

To make this possible, the Python runtime must keep track of types. That is, when we do

>>> x = 5

Python cannot just store the byte representation of 5 in memory. It also needs to mark that representation with an int type tag, so that when we do type(x) the tag can be recovered.

Further, before doing any operation such as x+1, Python needs to check the type tag to ensure we are really working with ints. If x is, for instance, a string, Python will raise an exception.

Statically checked languages such as Java do not need such checks at runtime. For instance, when we run

SomeClass x = new SomeClass(42);
x.foo();

the compiler has already checked at compile time that there is indeed a method foo for x, so there's no need to do that again. This can improve performance, in principle. (Actually, the JVM does some runtime checks at class load time, but let's ignore those for the sake of simplicity.)

In spite of the above, Java has to store type tags like Python does, since it has an analogue of type(-):

if (x instanceof SomeClass) { ...

Hence, Java allows one to write functions which can behave "specially" on some types.

// This is a "generic" function, using a type parameter A.
// (B is some arbitrary class; assume it has an int-valued get()
// and a constructor taking an int. The casts are unchecked and
// would draw compiler warnings.)
<A> A foo(A x) {
   if (x instanceof B) {
      B b = (B) x;
      return (A) new B(b.get() + 1);  // type-specific behaviour
   } else {
      return x;                       // uniform behaviour
   }
}

The above function foo() just returns its argument, except when it's of type B, for which a new object is created instead. This is a consequence of using instanceof, which requires every object to carry a tag at runtime.

To be honest, such a tag is already present to be able to implement virtual methods, so it does not cost anything more. Yet, the presence of instanceof makes it possible to cause the above non-uniform behaviour on types -- some types can be handled differently.

Haskell, instead, has no such type/instanceof operator. A parametric Haskell function having type

foo :: a -> (a,a)

must behave in the same way at all types. There's no way to cause some "special" behaviour. Concretely, foo x must return (x,x), and we can see this just by looking at the type annotation above. To stress the point, there's no need to look at the code (!!) to prove such a property. This is what parametricity ensures from the type above.
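As a minimal sketch of what this rules out (the definitions below are just illustrations):

-- The parametric implementation: one definition, uniform at every type a.
foo :: a -> (a, a)
foo x = (x, x)

-- A type-dispatching version cannot even be written. With a bare type
-- variable 'a' there is no equality test, no arithmetic, no instanceof:
-- foo x = if x == 0 then (x + 1, x) else (x, x)
-- ^ rejected at compile time: no Eq or Num instance for 'a'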

answered Dec 19 '22 by chi


Implementations of dynamically typed languages typically need to store type information with each value in memory. This isn't too bad for Lisp-like languages that have just a few types and can reasonably identify them with a few tag bits (although such limited types lead to other efficiency issues). It's much worse for a language with lots of types.

Haskell lets you carry type information to runtime, but it forces you to be explicit about it, so you can pay attention to the cost. For example, adding a Typeable a context to a type gives values of that type access, at runtime, to a representation of the type a. More subtly, typeclass instance dictionaries are usually specialized away at compile time, but in sufficiently polymorphic or complex cases they may survive to runtime.

In a compiler like the apparently-abandoned JHC, and one likely possibility for the as-yet-barely-started compiler THC, this could lead to some type information leaking to runtime in the form of pointer tagging. But these situations are fairly easy to identify and only rarely cause serious performance problems.
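A minimal sketch of that opt-in mechanism, using the standard Data.Typeable module (the describe name is made up for illustration):

import Data.Typeable (Typeable, typeOf)

-- Without any constraint, 'a' is invisible at runtime. Adding Typeable
-- explicitly carries a runtime representation of the type along:
describe :: Typeable a => a -> String
describe x = "a value of type " ++ show (typeOf x)

main :: IO ()
main = do
  putStrLn (describe (42 :: Int))  -- prints: a value of type Int
  putStrLn (describe "hello")     -- prints: a value of type [Char]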

answered Dec 19 '22 by dfeuer


What I don't understand is how are "all Haskell functions" parametric?

It doesn't say all Haskell functions are parametric, it says:

All Haskell functions must be parametric in their type parameters.

A Haskell function need not have any type parameters.
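For example (bump and dup are hypothetical names):

-- Monomorphic: no type parameters at all, so parametricity imposes
-- nothing; this function is free to do Int-specific work.
bump :: Int -> Int
bump n = n + 1

-- Polymorphic: one type parameter 'a', in which it must be parametric,
-- so all it can do is shuffle its argument around.
dup :: a -> (a, a)
dup x = (x, x)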

One more question: why does the author say that "Despite how important types are when writing Haskell code, they are completely irrelevant when running Haskell code"?

Unlike a dynamically typed language where you need to check at run time if (for example) two things are numbers before trying to add them together, your running Haskell program knows that if you're trying to add them together, then they must be numbers because the compiler made sure of it beforehand.

Aren't types explicit/static in Haskell?

Types in Haskell can often be inferred, in which case they don't need to be explicit. But you're right that they're static, and that is actually why they don't matter at run time, because static means that the compiler makes sure everything has the type that it should before your program ever executes.
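A small example of inference at work (the double name is made up):

-- No annotation written: GHC infers double :: Num a => a -> a.
double x = x + x

-- An ill-typed use is rejected before the program ever runs:
-- bad = double "hello"   -- compile-time error, not a runtime check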

answered Dec 19 '22 by gdejohn


Types can be erased in Haskell because the type of an expression is either known at compile time (like True) or its type does not matter at runtime (like []).

There's a caveat to this, though: it assumes that all values have some kind of uniform representation. Most Haskell implementations use pointers for everything, so the actual type of what a pointer points to doesn't matter (except to the garbage collector), but you could imagine a Haskell implementation that uses a non-uniform representation, and then some type information would have to be kept.
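To make the uniform-representation point concrete, here's a sketch (len is just a hand-rolled version of the standard length):

-- One compiled code path serves [Int], [String], [Maybe Bool], ...:
-- len only follows the list's spine and never inspects an element,
-- which works because every element is represented as a pointer.
len :: [a] -> Int
len []       = 0
len (_ : xs) = 1 + len xs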

answered Dec 19 '22 by augustss