I have found that in containers package crucial data structures such as Data.Map or Data.IntMap are implemented in pure Haskell.
Question: I wonder, wouldn't it be more efficient to implement them in C? I know ghc is very good but definitely cannot compete with optimized C code.
This is an interesting question, but I'm not sure that anyone is going to be able to offer any conclusive answer one way or the other (aside from designing and building an extensive test suite and generating a huge pile of numbers).
The assumption behind this question appears to be "C is always fast, Haskell is always slow, so wouldn't it be better to write the code in C rather than Haskell?" I'm not sure this assumption is factually accurate. (In my limited experience, it's not that Haskell is slow, it's that Haskell can be slow - or very fast - and it's awkward to predict what kind of speed you're going to get.)
Calling C via the FFI has an overhead. Haskell data structures are handled by the garbage collector; memory used via C has to be manually managed. You're looking at quite a bit of effort here, possibly for not as much benefit as you might think.
C data structures tend to be more efficient because of mutability. Most people don't want to work with mutable data structures in Haskell. (In some sense, it negates all the benefits of using Haskell in the first place, so why bother?) If you use immutable data structures in C, you might even find it goes slower than Haskell. (E.g., I'm told C is quite slow at dynamic memory allocation, which would be a problem for persistent data structures. The alternative is to copy stuff all over the place, which also isn't going to be fast.)
On top of that, GHC does clever optimisations like deforestation, which can sometimes result in containers completely disappearing at runtime. The C compiler could never do such a thing. And Haskell, being a lazy language, can sometimes completely skip work that you've asked it to do; that wouldn't work if containers were implemented in C.
In summary, it "looks like" implementing this stuff in C should "obviously" be far faster. In reality, I suspect the answer isn't nearly so clear-cut.
GHC's runtime is optimized for allocating immutable structures efficiently. It typically will beat the C runtime (malloc) at this task. As a result, C is mostly used to optimize algorithms, not data structures. An exception is either very low level data structures, or highly tuned mutable structures.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With