Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

when should I use as-patterns to identify common sub-expressions?

I'm wondering to what extent do I have to worry about micro-optimizing common sub-expressions in Haskell (using GHC) especially with regard to data destructuring.

For example, consider this code for merging two ordered lists:

merge :: Ord a => [a] -> [a] -> [a]
merge [] bs = bs
merge as [] = as
merge (a:as) (b:bs) =
  case compare a b of
    LT -> a : merge as (b:bs)  -- (1)
    EQ -> a : merge as bs
    GT -> b : merge (a:as) bs  -- (2)

Will GHC recognize that the (a:as) at (2) and (b:bs) at (1) are the same as the input parameters? Or, should I use "as" patterns, e.g.:

merge a'@(a:as) b'@(b:bs) =
  ...
  LT -> a : merge as b'
  ...
  GT -> b : merge a' bs

And in general when can GHC recognize common sub-expressions in in data constructions and when do I need to help it?

like image 325
ErikR Avatar asked Jul 22 '14 13:07

ErikR


People also ask

What is haskell pattern matching?

Overview. We use pattern matching in Haskell to simplify our codes by identifying specific types of expression. We can also use if-else as an alternative to pattern matching. Pattern matching can also be seen as a kind of dynamic polymorphism where, based on the parameter list, different methods can be executed.

What is a guard in Haskell?

Haskell guards are used to test the properties of an expression; it might look like an if-else statement from a beginner's view, but they function very differently. Haskell guards can be simpler and easier to read than pattern matching .

What does Just do in Haskell?

It represents "computations that could fail to return a value". Just like with the fmap example, this lets you do a whole bunch of computations without having to explicitly check for errors after each step.


Video Answer


1 Answers

It depends. If you compile without optimizations, there is definitely a penalty to not using the as-patterns, but if you compile the following code

module Test where


merge1 :: Ord a => [a] -> [a] -> [a]
merge1 [] bs = bs
merge1 as [] = as
merge1 (a:as) (b:bs) = case compare a b of
    LT -> a : merge1 as (b:bs)
    EQ -> a : merge1 as bs
    GT -> b : merge1 (a:as) bs


merge2 :: Ord a => [a] -> [a] -> [a]
merge2 [] bs = bs
merge2 as [] = as
merge2 xs@(a:as) ys@(b:bs) = case compare a b of
    LT -> a : merge2 as ys
    EQ -> a : merge2 as bs
    GT -> b : merge2 xs bs

with -O2 and -ddump-simpl, and after a lot of judicious editing, renaming of variables, stripping of metadata and annotations, and various other tricks you can arrive at something like

merge2_2 :: Ord a => a -> [a] -> [a] -> [a]
merge2_2 = \ ordInst x y z -> case z of _ {
      [] -> (x:y);
      (w:v) -> case compare ordInst x w of _ {
          LT -> x : merge2_1 ordInst y w v;
          EQ -> x : merge2   ordInst y   v;
          GT -> w : merge2_2 ordInst x y v
        }
    }

merge2_1 :: Ord a => [a] -> a -> [a] -> [a]
merge2_1 = \ ordInst x y z -> case x of _ {
      [] -> (y:z);
      (w:v) -> case compare ordInst w y of _ {
          LT -> w : merge2_1 ordInst v y z;
          EQ -> w : merge2   ordInst v   z;
          GT -> y : merge2_2 ordInst w v z
        }
    }

merge2 :: Ord a => [a] -> [a] -> [a]
merge2 = \ ordInst x y -> case x of wild {
      [] -> y;
      (w:v) -> case y of _ {
          [] -> wild;
          (z:zs) -> case compare ordInst w z of _ {
              LT -> w : merge2_1 ordInst v z zs;
              EQ -> w : merge2   ordInst v   zs;
              GT -> z : merge2_2 ordInst w v zs
            }
        }
    }


merge1_2 :: Ord a => a -> [a] -> [a] -> [a]
merge1_2 = \ ordInst x y z -> case z of _ {
      [] -> (x:y);
      (w:v) -> case compare ordInst x w of _ {
          LT -> x : merge1_1 ordInst y w v;
          EQ -> x : merge1   ordInst y   v;
          GT -> w : merge1_2 ordInst x y v
        }
    }

merge1_1 :: Ord a => [a] -> a -> [a] -> [a]
merge1_1 = \ ordInst x y z -> case x of _ {
      [] -> (y:z);
      (w:v) -> case compare ordInst w y of _ {
          LT -> w : merge1_1 ordInst v y z;
          EQ -> w : merge1   ordInst v   z;
          GT -> y : merge1_2 ordInst w v z
        }
    }

merge1 :: Ord a => [a] -> [a] -> [a]
merge1 = \ ordInst x y -> case x of wild {
      [] -> y;
      (w:v) -> case y of _ {
          [] -> wild;
          (z:zs) -> case compare ordInst w z of _ {
              LT -> w : merge1_1 ordInst v z zs;
              EQ -> w : merge1   ordInst v   zs;
              GT -> z : merge1_2 ordInst w v zs
            }
        }
    }

Which, after comparison with a diffing tool, tells me that the only differences between these two definitions is the numbers at the ends of the merge function names. So yes, after applying optimizations GHC can figure out these as-patterns for you, at least in the case of lists. However, I would still recommend using them because there are always edge cases, and if you do use them then you don't have to trust the compiler to do something complicated. It also means that even if optimizations aren't enabled, your code will still have that optimization in there. It may seem like a tiny change, but if this function ends up in an inner loop or the structures you're working on are very large it could have a significant performance impact. Additionally, I find that for constructors with a large number of fields it's often just more convenient to have a name to refer to "constructed" form.

Here's the full core dump if you're interested. Basically, I started by joining lines together to take up less room, then removed unnecessary noise in the type signatures, removed qualified module names, all of the [Something] annotations, renamed things like merge2_$smerge2 to merge2_2, removed all local type signatures, then started renaming using Sublime Text's wonderful multiple cursors feature. I started with the type names, renaming them all to just a, then renamed variable names to x, y, z, w, v, and zs (I'm creative, aren't I?), made all applications of the : operator infix, removed the Rec regions, and a few more bits of styling later I got the output you see above.

like image 156
bheklilr Avatar answered Dec 26 '22 13:12

bheklilr