Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data.Map: how do I tell if I "need value-strict maps"?

Tags:

haskell

When choosing between Data.Map.Lazy and Data.Map.Strict, the docs tell us for the former:

API of this module is strict in the keys, but lazy in the values. If you need value-strict maps, use Data.Map.Strict instead.

and for the latter likewise:

API of this module is strict in both the keys and the values. If you need value-lazy maps, use Data.Map.Lazy instead.

How do more seasoned Haskellers than me tend to intuit this "need"? Use-case in point, in a run-and-done (ie. not daemon-like/long-running) command-line tool: readFileing a simple lines-based custom config file where many (not all) lines define key:value pairs to be collected into a Map. Once done, we rewrite many values in it depending on other values in it that were read later (thanks to immutability, in this process we create a new Map and discard the initial incarnation).

(Although in practice this file likely won't often or ever reach even a 1000 lines, let's just assume for the sake of learning that for some users it will before long.)

Any given run of the tool will perhaps lookup some 20-100% of the (rewritten on load, although with lazy-eval I'm never quite sure "when really") key:value pairs, anywhere between once and dozens of times.

How do I reason about the differences between "value-strict" and "value-lazy" Data.Maps here? What happens "under the hood", in terms of mainstream computing if you will?

Fundamentally, such hash-maps are of course about "storing once, looking up many times" --- but then, what in computing isn't, "fundamentally". And furthermore the whole concept of lazy-eval's thunks seems to boil down to this very principle, so why not always stay value-lazy?

like image 448
metaleap Avatar asked Dec 23 '16 10:12

metaleap


2 Answers

How do I reason about the differences between "value-strict" and "value-lazy" Data.Maps here?

Value lazy is the normal in Haskell. This means that not just values, but thunks (i.e. recipes of how to compute the value) are stored. For example, lets say you extract the value from a line like this:

tail (dropUntil (==':') line)

Then a value-strict map would actually extract the value upon insert, while a lazy one would happily just remember how to get it. This is then also what you would get on a lookup

Here are some pros and cons:

  • lazy values may need more memory, not only for the thunk itself, but also for the data that are referenced there (here line).
  • strict values may need more memory. In our case this could be so when the string gets interpreted to yield some memory hungry structure like lists, JSON or XML.
  • using lazy values may need less CPU if your code doesn't need every value.
  • too deep nesting of thunks may cause stack-overflows when the value is finally needed.
  • there is also a semantic difference: in lazy mode, you may get away when the code to extract the value would fail (like the above one that fails if there isnt a ':' on the line) if you just need to look whether the key is present. In strict mode, your program crashes upon insert.

As always, there are no fixed measures like: "If your evaluated value needs less than 20 bytes and takes less than 30µs to compute, use strict, else use lazy."

Normally, you just go with one and when you notice extreme runtimes/memory usage you try the other.

like image 149
Ingo Avatar answered Oct 10 '22 01:10

Ingo


Here's a small experiment that shows a difference betwen Data.Map.Lazy and Data.Map.Strict. This code exhausts the heap:

import Data.Foldable
import qualified Data.Map.Lazy as M

main :: IO ()
main = print $ foldl' (\kv i -> M.adjust (+i) 'a' kv) 
                      (M.fromList [('a',0)])
                      (cycle [0])

(Better to compile with a small maximum heap, like ghc Main.hs -with-rtsopts="-M20m".)

The foldl' keeps the map in WHNF as we iterate over the infinite list of zeros. However, thunks accumulate in the modified value until the heap is exhausted.

The same code with Data.Map.Strict simply loops forever. In the strict variant, the values are in WHNF whenever the map is in WHNF.

like image 39
danidiaz Avatar answered Oct 10 '22 01:10

danidiaz