When choosing between Data.Map.Lazy
and Data.Map.Strict
, the docs tell us for the former:
API of this module is strict in the keys, but lazy in the values. If you need value-strict maps, use
Data.Map.Strict
instead.
and for the latter likewise:
API of this module is strict in both the keys and the values. If you need value-lazy maps, use
Data.Map.Lazy
instead.
How do more seasoned Haskellers than me tend to intuit this "need"? Use-case in point, in a run-and-done (ie. not daemon-like/long-running) command-line tool: readFile
ing a simple lines
-based custom config file where many (not all) lines define key:value pairs to be collected into a Map
. Once done, we rewrite many values in it depending on other values in it that were read later (thanks to immutability, in this process we create a new Map
and discard the initial incarnation).
(Although in practice this file likely won't often or ever reach even a 1000 lines, let's just assume for the sake of learning that for some users it will before long.)
Any given run of the tool will perhaps lookup some 20-100% of the (rewritten on load, although with lazy-eval I'm never quite sure "when really") key:value pairs, anywhere between once and dozens of times.
How do I reason about the differences between "value-strict" and "value-lazy" Data.Map
s here? What happens "under the hood", in terms of mainstream computing if you will?
Fundamentally, such hash-maps are of course about "storing once, looking up many times" --- but then, what in computing isn't, "fundamentally". And furthermore the whole concept of lazy-eval's thunks seems to boil down to this very principle, so why not always stay value-lazy?
How do I reason about the differences between "value-strict" and "value-lazy" Data.Maps here?
Value lazy is the normal in Haskell. This means that not just values, but thunks (i.e. recipes of how to compute the value) are stored. For example, lets say you extract the value from a line like this:
tail (dropUntil (==':') line)
Then a value-strict map would actually extract the value upon insert, while a lazy one would happily just remember how to get it. This is then also what you would get on a lookup
Here are some pros and cons:
line
).As always, there are no fixed measures like: "If your evaluated value needs less than 20 bytes and takes less than 30µs to compute, use strict, else use lazy."
Normally, you just go with one and when you notice extreme runtimes/memory usage you try the other.
Here's a small experiment that shows a difference betwen Data.Map.Lazy
and Data.Map.Strict
. This code exhausts the heap:
import Data.Foldable
import qualified Data.Map.Lazy as M
main :: IO ()
main = print $ foldl' (\kv i -> M.adjust (+i) 'a' kv)
(M.fromList [('a',0)])
(cycle [0])
(Better to compile with a small maximum heap, like ghc Main.hs -with-rtsopts="-M20m"
.)
The foldl'
keeps the map in WHNF as we iterate over the infinite list of zeros. However, thunks accumulate in the modified value until the heap is exhausted.
The same code with Data.Map.Strict
simply loops forever. In the strict variant, the values are in WHNF whenever the map is in WHNF.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With