Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is the distinction between foldr and foldl important for maps and sets?

(This question applies more generally than to Haskell, but that's the language I'll use to state it.)

The distinction between foldl and foldr seems to depend on the fact that lists are ordered. That is, foldl and foldl collapse a list by applying a function to a starting value and the first or last element, then to the result of the first application and the second or second-to-last element, then the result of the second application to the third or third-to-last element, etc.

But the Data.Set and Data.Map libraries for Haskell define their own versions of foldl and foldr. For instance, for maps they are these:

foldr :: (a -> b -> b) -> b -> Map k a -> b
foldl :: (b -> a -> b) -> b -> Map k a -> b -- I've swapped `a` and `b`
  -- in the second type signature to clarify the correspondence.

Maps and sets aren't ordered. Should I expect performance differences between the versions of foldl and foldr defined for sets and maps, or does foldr f do exactly the same thing as foldl (flip f)?

like image 652
Jeffrey Benjamin Brown Avatar asked Dec 07 '18 23:12

Jeffrey Benjamin Brown


2 Answers

Actually, sets and maps are ordered. That's why all the set and map functions have the Ord constraint on the key type. They're automatically ordered by the natural ordering of the element type. So if your set contains {3, 5, 2, 1, 4}, then Haskell will see it (for folding purposes) in the order {1, 2, 3, 4, 5}.

But let's forget about that for a minute. Let's assume we're in a perfect world where the data truly is unordered. Even then, the difference between foldl and foldr is significant. Suppose I have the set as before: {3, 5, 2, 1, 4}, and I want to perform some operation .* on it.

foldl (.*) 0 mySet = ((((0 .* 3) .* 5) .* 2) .* 1) .* 4
foldr (.*) 0 mySet = 3 .* (5 .* (2 .* (1 .* (4 .* 0))))

So even if the operation happens to be associative, the initial element is placed on the opposite side with a foldl versus a foldr. In fact, some operations won't even work if implemented using the wrong fold. Consider toList, which is defined in Data.Foldable to work on any Foldable object (including lists, maps, and sets). One possible implementation is

toList :: Foldable t => t a -> [a]
toList = foldr (:) []

If we were to try doing a foldl

definitelyWrong :: Foldable t => t a -> [a]
definitelyWrong = foldl (:) []

Then it doesn't even compile.

wrongfold.hs:5:25: error:
    • Occurs check: cannot construct the infinite type: a ~ [a]
      Expected type: [a] -> [a] -> [a]
        Actual type: a -> [a] -> [a]

This is because the folds associate differently and can work even with accumulation operations which take two differently-typed arguments, which is also apparent from the type signatures of the two

foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b
foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b

Note how the first argument is not a -> a -> a but actually has two different types, which indicates that the ordering definitely does matter.

like image 174
Silvio Mayolo Avatar answered Nov 15 '22 08:11

Silvio Mayolo


Maps and sets aren't ordered. Should I expect performance differences between the versions of foldl and foldr defined for sets and maps

If you refers the source code of Data.Set or Data.Map, you will find that their elements are organized in binary tree:

data Map k a  = Bin !Size !k a !(Map k a) !(Map k a)
              | Tip 

data Set a    = Bin !Size !a !(Set a) !(Set a)
              | Tip

and the foldr of Set:

foldr f z = go z
  where
    go z' Tip           = z'
    go z' (Bin _ x l r) = go (f x (go z' r)) l

traverse the tree using depth-first search with order right, current, left, so when foldr (+) 0 apply to follow tree:

                   1
                  / \
                 4   2
                      \
                       3

gives,

4 + (1 + (2 + (3 + 0)))

and foldl

foldl f z = go z
  where
    go z' Tip           = z'
    go z' (Bin _ x l r) = go (f (go z' l) x) r

with order left, current, right when apply foldl (+) 0 to above tree, give:

((((0 + 4) + 1) + 2) + 3)

It show that foldr and foldl of Set equivalent to that apply to the list as:

foldr (+) 0 [4, 1, 2, 3] = 4 + (1 + (2 + (3 + 0)))
foldl (+) 0 [4, 1, 2, 3] = ((((0 + 4) + 1) + 2) + 3)

similar situation as Data.Map and don't repeat here.

Moreover, as we knew, foldr can apply to infinite list (but foldl cannot), for example:

take 10 $ foldr ((:) . sum) [] $ chunksOf 3 [1..] = [6,15,24,33,42,51,60,69,78,87]

(here chunksOf group the list like [[1,2,3], [4,5,6]...])

But how about when the tree has a path is infinite like:

                   1
                  / \
                 4   2
                      \
                       3
                        \
                         ...  <- infinite path

Does the foldr of Set behave as list as mentioned above? (I guess the answer is yes, you can check it for youself)

does foldr f do exactly the same thing as foldl (flip f)?

No, As the source code as shown above:

foldr          = ... go (f x (go z' r)) l 

and

foldl (flip f) = ... go (f x (go z' l)) r

The traversing order of tree is different, but the generic relation between foldr and foldl can be found in this post: Defining foldl in terms of foldr

like image 33
assembly.jc Avatar answered Nov 15 '22 07:11

assembly.jc