Modern day CPUs are optimised so that access and modification of the same place in memory (temporal locality), as well as consecutive places in memory (spatial locality) are extremely fast operations.
Now, since Haskell is a purely immutable language, you naturally can't overwrite existing memory blocks, potentially making things like foldl
much slower than a for
loop with a continuously accessed result variable would be in C.
Does Haskell do anything internally to mitigate this performance loss? And in general, what are its properties regarding locality?
The general rule is that for "vanilla" Haskell programming you get very little (if any) control over memory layout and memory locality.
However, there do exist a number of more advanced features that allow such control, and libraries that expose friendly abstractions on top of these. The vector
library is probably the most popular of the latter. This library provides several fixed-size array types, two of which (Data.Vector.Unboxed
and Data.Vector.Storable
) give you data locality by representing vectors and their contents as contiguous memory arrays. Data.Vector.Unboxed
even contains a simple automatic "structure of arrays" transformation—an unboxed vector of pairs will be represented as a pair of unboxed vectors, one for each of the pairs' components.
Another example is the JuicyPixels
library for image processing, which represents images in memory as contiguous bitmaps. This actually bottoms out to Data.Vector.Storable
, which exploits a standard facility (Foreign.Storable
) for translating user-defined Haskell data types to and from raw bytes.
But the general pattern is this: in Haskell, when you're interested in memory locality, you identify which data needs to benefit from it and bundle it together in a custom data type whose implementation was designed to provide locality and performance guarantees. Writing such a data type is an advanced undertaking, but most of the legwork has been done already in a reusable fashion (note for example that JuicyPixels
mostly just reuses vector
).
Note also that:
vector
provides stream fusion optimizations to eliminate intermediate arrays when you apply nested vector transformations. If you generate a vector from 0 to 1,000,000, filter out the even numbers, map the (^2)
function over that and sum the elements of the result, no array is ever allocated—the library has the smarts to rewrite that to an accumulator loop from 0 to 1,000,000. So a foldl
of a vector isn't necessarily slower than a for
loop—there might be no array at all!vector
provides mutable arrays as well. More generally, in Haskell you can overwrite existing memory if you really insist. It's just (a) not the default paradigm in the language, and therefore (b) a little bit clunky, but absolutely tractable if you just need it in a few performance-sensitive spots.So most of the time, the answer to "I want memory locality" is "use vector
."
Haskell is an extremely high-level language, and you're asking a question about an extremely low-level detail.
Overall, Haskell's performance is probably similar to any garbage-collected language like Java or C#. In particular, Haskell has mutable arrays, which will have performance similar to any other array. (You may need unboxed arrays to match C performance.)
For something like a fold, if the final result is something like a machine integer, that probably ends up in a processor register for the entire duration of the loop. So the final machine code is pretty much identical to “a continuously-accessed variable in C”. (If the result is a dictionary or something, then probably not. But that's the same as C as well.)
More generally, if locallity is something that matters to you, any garbage-collected language probably isn't your friend. But, again, you can use unboxed arrays to work around that.
All of this talks is great and all, but if you really want to know how fast a specific Haskell program is, benchmark it. It turns out well-written Haskell programs are usually quite fast. (Just like most compiled languages.)
Added: You can ask GHC to output partially-compiled code in Core format, which is lower-level than Haskell but higher-level than machine code. This lets you see what the compiler has decided to do (in particular, where stuff has been inlined, where abstractions have been removed, etc.) This can help you find out what the final code looks like, without having to go all the way down to machine code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With