I'm wondering to what extent do I have to worry about micro-optimizing common sub-expressions in Haskell (using GHC) especially with regard to data destructuring. For example, consider this code for merging two ordered lists: <pre class="prettyprint"><code>merge :: Ord a => [a] -> [a] -> [a] merge [] bs = bs merge as [] = as merge (a:as) (b:bs) = case compare a b of LT -> a : merge as (b:bs) -- (1) EQ -> a : merge as bs GT -> b : merge (a:as) bs -- (2) </code></pre> Will GHC recognize that the <code>(a:as)</code> at (2) and <code>(b:bs)</code> at (1) are the same as the input parameters? Or, should I use "as" patterns, e.g.: <pre class="prettyprint"><code>merge a'@(a:as) b'@(b:bs) = ... LT -> a : merge as b' ... GT -> b : merge a' bs </code></pre> And in general when can GHC recognize common sub-expressions in in data constructions and when do I need to help it?

It depends. If you compile without optimizations, there is definitely a penalty to not using the as-patterns, but if you compile the following code <pre class="prettyprint"><code>module Test where merge1 :: Ord a => [a] -> [a] -> [a] merge1 [] bs = bs merge1 as [] = as merge1 (a:as) (b:bs) = case compare a b of LT -> a : merge1 as (b:bs) EQ -> a : merge1 as bs GT -> b : merge1 (a:as) bs merge2 :: Ord a => [a] -> [a] -> [a] merge2 [] bs = bs merge2 as [] = as merge2 xs@(a:as) ys@(b:bs) = case compare a b of LT -> a : merge2 as ys EQ -> a : merge2 as bs GT -> b : merge2 xs bs </code></pre> with <code>-O2</code> and <code>-ddump-simpl</code>, and after a lot of judicious editing, renaming of variables, stripping of metadata and annotations, and various other tricks you can arrive at something like <pre class="prettyprint"><code>merge2_2 :: Ord a => a -> [a] -> [a] -> [a] merge2_2 = \ ordInst x y z -> case z of _ { [] -> (x:y); (w:v) -> case compare ordInst x w of _ { LT -> x : merge2_1 ordInst y w v; EQ -> x : merge2 ordInst y v; GT -> w : merge2_2 ordInst x y v } } merge2_1 :: Ord a => [a] -> a -> [a] -> [a] merge2_1 = \ ordInst x y z -> case x of _ { [] -> (y:z); (w:v) -> case compare ordInst w y of _ { LT -> w : merge2_1 ordInst v y z; EQ -> w : merge2 ordInst v z; GT -> y : merge2_2 ordInst w v z } } merge2 :: Ord a => [a] -> [a] -> [a] merge2 = \ ordInst x y -> case x of wild { [] -> y; (w:v) -> case y of _ { [] -> wild; (z:zs) -> case compare ordInst w z of _ { LT -> w : merge2_1 ordInst v z zs; EQ -> w : merge2 ordInst v zs; GT -> z : merge2_2 ordInst w v zs } } } merge1_2 :: Ord a => a -> [a] -> [a] -> [a] merge1_2 = \ ordInst x y z -> case z of _ { [] -> (x:y); (w:v) -> case compare ordInst x w of _ { LT -> x : merge1_1 ordInst y w v; EQ -> x : merge1 ordInst y v; GT -> w : merge1_2 ordInst x y v } } merge1_1 :: Ord a => [a] -> a -> [a] -> [a] merge1_1 = \ ordInst x y z -> case x of _ { [] -> (y:z); (w:v) -> case compare ordInst w y of _ { LT -> w : merge1_1 ordInst v y z; EQ -> w : merge1 ordInst v z; GT -> y : merge1_2 ordInst w v z } } merge1 :: Ord a => [a] -> [a] -> [a] merge1 = \ ordInst x y -> case x of wild { [] -> y; (w:v) -> case y of _ { [] -> wild; (z:zs) -> case compare ordInst w z of _ { LT -> w : merge1_1 ordInst v z zs; EQ -> w : merge1 ordInst v zs; GT -> z : merge1_2 ordInst w v zs } } } </code></pre> Which, after comparison with a diffing tool, tells me that the only differences between these two definitions is the numbers at the ends of the <code>merge</code> function names. So yes, after applying optimizations GHC can figure out these as-patterns for you, at least in the case of lists. However, I would still recommend using them because there are always edge cases, and if you do use them then you don't have to trust the compiler to do something complicated. It also means that even if optimizations aren't enabled, your code will still have that optimization in there. It may seem like a tiny change, but if this function ends up in an inner loop or the structures you're working on are very large it could have a significant performance impact. Additionally, I find that for constructors with a large number of fields it's often just more convenient to have a name to refer to "constructed" form. Here's the full core dump if you're interested. Basically, I started by joining lines together to take up less room, then removed unnecessary noise in the type signatures, removed qualified module names, all of the <code>[Something]</code> annotations, renamed things like <code>merge2_$smerge2</code> to <code>merge2_2</code>, removed all local type signatures, then started renaming using Sublime Text's wonderful multiple cursors feature. I started with the type names, renaming them all to just <code>a</code>, then renamed variable names to <code>x</code>, <code>y</code>, <code>z</code>, <code>w</code>, <code>v</code>, and <code>zs</code> (I'm creative, aren't I?), made all applications of the <code>:</code> operator infix, removed the <code>Rec</code> regions, and a few more bits of styling later I got the output you see above.

when should I use as-patterns to identify common sub-expressions?

Tags:

pattern-matching

haskell

ghc

I'm wondering to what extent do I have to worry about micro-optimizing common sub-expressions in Haskell (using GHC) especially with regard to data destructuring.

For example, consider this code for merging two ordered lists:

merge :: Ord a => [a] -> [a] -> [a]
merge [] bs = bs
merge as [] = as
merge (a:as) (b:bs) =
  case compare a b of
    LT -> a : merge as (b:bs)  -- (1)
    EQ -> a : merge as bs
    GT -> b : merge (a:as) bs  -- (2)

Will GHC recognize that the (a:as) at (2) and (b:bs) at (1) are the same as the input parameters? Or, should I use "as" patterns, e.g.:

merge a'@(a:as) b'@(b:bs) =
  ...
  LT -> a : merge as b'
  ...
  GT -> b : merge a' bs

And in general when can GHC recognize common sub-expressions in in data constructions and when do I need to help it?

325

asked Jul 22 '14 13:07

ErikR

Video Answer

1 Answers

It depends. If you compile without optimizations, there is definitely a penalty to not using the as-patterns, but if you compile the following code

module Test where


merge1 :: Ord a => [a] -> [a] -> [a]
merge1 [] bs = bs
merge1 as [] = as
merge1 (a:as) (b:bs) = case compare a b of
    LT -> a : merge1 as (b:bs)
    EQ -> a : merge1 as bs
    GT -> b : merge1 (a:as) bs


merge2 :: Ord a => [a] -> [a] -> [a]
merge2 [] bs = bs
merge2 as [] = as
merge2 xs@(a:as) ys@(b:bs) = case compare a b of
    LT -> a : merge2 as ys
    EQ -> a : merge2 as bs
    GT -> b : merge2 xs bs

with -O2 and -ddump-simpl, and after a lot of judicious editing, renaming of variables, stripping of metadata and annotations, and various other tricks you can arrive at something like

merge2_2 :: Ord a => a -> [a] -> [a] -> [a]
merge2_2 = \ ordInst x y z -> case z of _ {
      [] -> (x:y);
      (w:v) -> case compare ordInst x w of _ {
          LT -> x : merge2_1 ordInst y w v;
          EQ -> x : merge2   ordInst y   v;
          GT -> w : merge2_2 ordInst x y v
        }
    }

merge2_1 :: Ord a => [a] -> a -> [a] -> [a]
merge2_1 = \ ordInst x y z -> case x of _ {
      [] -> (y:z);
      (w:v) -> case compare ordInst w y of _ {
          LT -> w : merge2_1 ordInst v y z;
          EQ -> w : merge2   ordInst v   z;
          GT -> y : merge2_2 ordInst w v z
        }
    }

merge2 :: Ord a => [a] -> [a] -> [a]
merge2 = \ ordInst x y -> case x of wild {
      [] -> y;
      (w:v) -> case y of _ {
          [] -> wild;
          (z:zs) -> case compare ordInst w z of _ {
              LT -> w : merge2_1 ordInst v z zs;
              EQ -> w : merge2   ordInst v   zs;
              GT -> z : merge2_2 ordInst w v zs
            }
        }
    }


merge1_2 :: Ord a => a -> [a] -> [a] -> [a]
merge1_2 = \ ordInst x y z -> case z of _ {
      [] -> (x:y);
      (w:v) -> case compare ordInst x w of _ {
          LT -> x : merge1_1 ordInst y w v;
          EQ -> x : merge1   ordInst y   v;
          GT -> w : merge1_2 ordInst x y v
        }
    }

merge1_1 :: Ord a => [a] -> a -> [a] -> [a]
merge1_1 = \ ordInst x y z -> case x of _ {
      [] -> (y:z);
      (w:v) -> case compare ordInst w y of _ {
          LT -> w : merge1_1 ordInst v y z;
          EQ -> w : merge1   ordInst v   z;
          GT -> y : merge1_2 ordInst w v z
        }
    }

merge1 :: Ord a => [a] -> [a] -> [a]
merge1 = \ ordInst x y -> case x of wild {
      [] -> y;
      (w:v) -> case y of _ {
          [] -> wild;
          (z:zs) -> case compare ordInst w z of _ {
              LT -> w : merge1_1 ordInst v z zs;
              EQ -> w : merge1   ordInst v   zs;
              GT -> z : merge1_2 ordInst w v zs
            }
        }
    }

Which, after comparison with a diffing tool, tells me that the only differences between these two definitions is the numbers at the ends of the merge function names. So yes, after applying optimizations GHC can figure out these as-patterns for you, at least in the case of lists. However, I would still recommend using them because there are always edge cases, and if you do use them then you don't have to trust the compiler to do something complicated. It also means that even if optimizations aren't enabled, your code will still have that optimization in there. It may seem like a tiny change, but if this function ends up in an inner loop or the structures you're working on are very large it could have a significant performance impact. Additionally, I find that for constructors with a large number of fields it's often just more convenient to have a name to refer to "constructed" form.

Here's the full core dump if you're interested. Basically, I started by joining lines together to take up less room, then removed unnecessary noise in the type signatures, removed qualified module names, all of the [Something] annotations, renamed things like merge2_$smerge2 to merge2_2, removed all local type signatures, then started renaming using Sublime Text's wonderful multiple cursors feature. I started with the type names, renaming them all to just a, then renamed variable names to x, y, z, w, v, and zs (I'm creative, aren't I?), made all applications of the : operator infix, removed the Rec regions, and a few more bits of styling later I got the output you see above.

156

answered Dec 26 '22 13:12

bheklilr

Related questions
                            
                                Polymorphic constraint
                            
                                ghc-7.6 class instances for dependent types
                            
                                Is there a way of deriving Binary instances for Vinyl record types using Derive and Template Haskell or otherwise
                            
                                Standalone deriving declaration in Template Haskell quotation
                            
                                Does Haskell allow a let expression for multiple pattern matchings?
                            
                                Template Haskell error when using "deriving"
                            
                                Build balanced binary tree with foldr
                            
                                yesod-bin is missing an (unknown) dependency?
                            
                                Closed type families don’t work as expected
                            
                                Equality function for pair components
                            
                                How is callCC implemented in strict functional languages?
                            
                                Division in Haskell, still don't get it
                            
                                Haskell Persistent Joins with Esqueleto
                            
                                Differences between SCons and Shake
                            
                                Is there a standard name or implementation of the "purely applicative Either"?
                            
                                Profiling an executable with cabal
                            
                                Why do some of threepenny-gui FRP combinators operate on a MonadIO monad instead of being pure?
                            
                                Is this a GHC bug?
                            
                                Different behavior of cabal repl for library vs. executable
                            
                                When should Haskell functions take tuples, rather than multiple arguments?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With