I have been studying Haskell in my spare time for a couple of months now. I'm wondering how Haskell performs on the current stock hardware, in regards to the memory sub-system (L1, L2, L3 cache). Can someone please point me to any report/study on how cache friendly is Haskell because of its Lazy evaluation/call-by-need? Is there a way for us to get the info as to how many data cache misses and instruction cache misses happened and see if this is due to the lazy evaluation nature of the language?
Thanks.
Lazy evaluation is a method to evaluate a Haskell program. It means that expressions are not evaluated when they are bound to variables, but their evaluation is deferred until their results are needed by other computations.
Haskell is a lazy language, meaning that it employs lazy evaluation. Before explaining lazy evaluation, let's first explain strict evaluation, which most readers will likely be more familiar with. Strict evaluation means that arguments to functions are evaluated prior to being passed to the functions.
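A minimal sketch of the difference (the function name `firstOnly` is made up for illustration): under strict evaluation the `error` argument would be evaluated, and crash, before the function is even entered; under call-by-need it is simply discarded unevaluated.

```haskell
-- Under strict evaluation, the second argument would be evaluated
-- (and crash) before firstOnly runs; under call-by-need it is never
-- forced, so demo evaluates to 42.
firstOnly :: a -> b -> a
firstOnly x _ = x

demo :: Int
demo = firstOnly 42 (error "never evaluated under laziness")
```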
Lazy evaluation is not always better. Its performance benefits can be significant, but it is usually not hard to avoid most unnecessary evaluation in eager languages; laziness makes the avoidance automatic and complete, yet unnecessary evaluation is rarely a major problem in practice.
The benefits of lazy evaluation include:
- the ability to define control-flow structures as abstractions instead of primitives, and
- the ability to define potentially infinite data structures, which allows for a more straightforward implementation of some algorithms.
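Both benefits can be sketched in a few lines (the names `fibs` and `myIf` are illustrative, not standard library functions):

```haskell
-- An infinite list of Fibonacci numbers: call-by-need means only the
-- demanded prefix is ever evaluated.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- A control structure defined as an ordinary function: this only works
-- because the branch that is not taken is never evaluated.
myIf :: Bool -> a -> a -> a
myIf True  t _ = t
myIf False _ e = e
```

For example, `take 8 fibs` forces only the first eight list cells, and `myIf True x (error "boom")` returns `x` without touching the error.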
This is primarily a question of runtime value representation.
Most types in Haskell are "lifted", i.e., they may contain bottom. In practice, this means that even an Int is represented by a pointer, either to a machine integer or to a computation (which could diverge, i.e., be bottom).
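A rough sketch of what "lifted" looks like at the GHC level, using the unboxed primitives from GHC.Exts (the helper names `boxed` and `addBoxed` are mine):

```haskell
{-# LANGUAGE MagicHash #-}
-- GHC's Int is literally a box around a raw machine integer:
--   data Int = I# Int#
import GHC.Exts (Int (I#), Int#, (+#))

-- I# boxes an unlifted machine integer into a lifted Int.
boxed :: Int
boxed = I# 3#

-- Unboxed arithmetic works directly on machine words: no pointer
-- indirection, and no possibility of a thunk or bottom.
addBoxed :: Int -> Int -> Int
addBoxed (I# x) (I# y) = I# (x +# y)

-- A lifted value may be bottom without crashing the program,
-- as long as it is never demanded.
harmless :: Int
harmless = undefined
```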
Here is a quick primer on boxed vs. lifted (vs. primitive). E.g., Array# is not lifted, as it cannot be bottom, but it is boxed, as it may contain lifted values.
So what does that have to do with non-strict evaluation? A non-evaluated computation is called a "thunk". Having a linked list of pointers to Ints is far worse for cache locality than having a compact array of machine integers, and so is chasing pointers to thunks.
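The layout contrast can be made concrete with the array package that ships with GHC: a UArray Int Int stores machine integers contiguously, while a [Int] is a linked list of pointers to (possibly thunked) boxed Ints. The names `compact` and `sumCompact` are illustrative:

```haskell
-- Data.Array.Unboxed comes from the "array" boot package bundled with GHC.
import Data.Array.Unboxed (UArray, elems, listArray, (!))

-- Five machine integers in one contiguous block: no per-element
-- pointers, no per-element thunks.
compact :: UArray Int Int
compact = listArray (0, 4) [10, 20, 30, 40, 50]

-- Summing the unboxed array walks contiguous memory; summing the
-- equivalent list would chase a pointer per element.
sumCompact :: Int
sumCompact = sum (elems compact)
```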
That is why there has been much research and engineering effort on demand analysis: if the value of a computation will be needed anyway, the computation can be made strict.
AFAIK, "boxity" analysis is part of demand analysis. Anyway, GHC will try to get rid of pointers where possible.
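A hand-written illustration of what demand analysis automates (the function names are mine): forcing the accumulator at every step prevents a chain of (+) thunks from building up on the heap, and after strictness/boxity analysis GHC can often keep such an accumulator as a raw machine integer.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Lazy left fold: builds the thunk ((...(0+1)+2)+...) before forcing it.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold from Data.List: the accumulator is forced each step.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- The same strictness written out with a bang pattern; this is the
-- kind of strictness GHC's demand analysis tries to infer on its own.
sumBang :: [Int] -> Int
sumBang = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs
```

All three compute the same result; they differ only in how much intermediate heap they allocate before the answer is demanded.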
But unfortunately I don't have any empirical data or studies on this.
EDIT: Some more reading on strictness/boxity analysis:
- GHC -fstrictness
- Demand analyzer in GHC
- SO answer by Richard Eisenberg, giving some more clarification on "levity" (lifted/unlifted), bottom and undefined
Edit 2: Found a paper from 2002: Nethercote & Mycroft, "The cache behaviour of large lazy functional programs on stock hardware".