Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell: `Map (a,b) c` versus `Map a (Map b c)`?

Thinking of maps as representations of finite functions, a map of two or more variables can be given either in curried or uncurried form; that is, the types Map (a,b) c and Map a (Map b c) are isomorphic, or something close to it.

What practical considerations are there — efficiency, etc — for choosing between the two representations?

like image 412
PLL Avatar asked May 17 '13 15:05

PLL


2 Answers

The Ord instance of tuples uses lexicographic order, so Map (a, b) c is going to sort by a first anyway, so the overall order will be the same. Regarding practical considerations:

  • Because Data.Map is a binary search tree splitting at a key is comparable to a lookup, so getting a submap for a given a in the uncurried form won't be significantly more expensive than in the curried form.

  • The curried form may produce a less balanced tree overall, for the obvious reason of having multiple trees instead of just one.

  • The curried form will have a bit of extra overhead to store the nested maps.

  • The nested maps of the curried form representing "partial applications" can be shared if some a values produce the same result.

  • Similarly, "partial application" of the curried form gives you the existing inner map, while the uncurried form must construct a new map.

So the uncurried form is clearly better in general, but the curried form may be better if you expect to do "partial application" often and would benefit from sharing of Map b c values.

Note that some care will be necessary to ensure you actually benefit from that potential sharing; you'll need to explicitly define any shared inner maps and reuse the single value when constructing the full map.

Edit: Tikhon Jelvis points out in the comments that the memory overhead of the tuple constructors--which I did not think to account for--is not at all negligible. There is certainly some overhead to the curried form, but that overhead is proportional to how many distinct a values there are. The tuple constructor overhead in the uncurried form, on the other hand, is proportional to the total number of keys.

So if, on average, for any given value of a there are three or more distinct keys using it you'll probably save memory using the curried version. The concerns about unbalanced trees still apply, of course. The more I think about it, the more I suspect the curried form is unequivocally better except perhaps if your keys are very sparse and unevenly distributed.


Note that because arity of definitions does matter to GHC, the same care is required when defining functions if you want subexpressions to be shared; this is one reason you sometimes see functions defined in a style like this:

foo x = go
  where z = expensiveComputation x
        go y = doStuff y z
like image 163
C. A. McCann Avatar answered Sep 29 '22 04:09

C. A. McCann


Tuples are lazy in both elements, so the tuple version introduces a little extra laziness. Whether this is good or bad strongly depends on your usage. (In particular, comparisons may force the tuple elements, but only if there are lots of duplicate a values.)

Beyond that, I think it's going to depend on how many duplicates you have. If a is almost always different whenever b is, you're going to have a lot of small trees, so the tuple version might be better. On the other hand, if the opposite is true, the non-tuple version may save you a little time (not constantly recomparing a once you've found the appropriate subtree and you're looking for b).

I'm reminded of tries, and how they store common prefixes once. The non-tuple version seems to be a bit like that. A trie can be more efficient than a BST if there's lots of common prefixes, and less efficient if there aren't.

But the bottom line: benchmark it!! ;-)

like image 42
MathematicalOrchid Avatar answered Sep 29 '22 05:09

MathematicalOrchid