
Point-free version worsens performance

Well, it turns out that I have this function defined in my program code:

st_zipOp :: (a -> a -> a) -> Stream a -> Stream a -> Stream a
st_zipOp f xs ys = St.foldr (\x r -> st_map (f x) r) xs ys

It does what it appears to do: it zips (applying the operator several times, yes) two values of type Stream a, which is a list-like type, using an inner operator on the element type a. The definition is pretty straightforward.
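
For intuition, assuming St.foldr takes its arguments in the same order as foldr on lists, the fold runs over ys with xs as the base, wrapping one st_map around xs per element of ys (writing the stream informally as a sequence y1, y2, ..., yn):

st_zipOp f xs ys
  = st_map (f y1) (st_map (f y2) (... (st_map (f yn) xs) ...))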

Once I had defined the function this way, I tried this other version:

st_zipOp :: (a -> a -> a) -> Stream a -> Stream a -> Stream a
st_zipOp = St.foldr . (st_map .)

As far as I know, this is exactly the same definition as above. It is just a point-free version of the previous definition.
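
The equivalence can be checked by equational reasoning:

st_zipOp f
  = (St.foldr . (st_map .)) f
  = St.foldr ((st_map .) f)             -- definition of (.)
  = St.foldr (st_map . f)               -- left section of (.)
  = St.foldr (\x -> st_map (f x))       -- definition of (.)
  = St.foldr (\x r -> st_map (f x) r)   -- eta-expansion

which, applied to xs and ys, is exactly the original definition.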

However, I wanted to check if there was any performance change, and I found that, indeed, the point-free version made the program run slightly worse (both in memory and time).

Why is this happening? If it is necessary, I can write a test program that reproduces this behavior.

I am compiling with -O2 if that makes a difference.

Simple test case

I wrote the following code, trying to reproduce the behavior explained above. I used lists this time, and the change in performance was less noticeable, but still present. This is the code:

opEvery :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery f xs ys = foldr (\x r -> map (f x) r) xs ys

opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery' = foldr . (map .)

main :: IO ()
main = print $ sum $ opEvery (+) [1..n] [1..n]
 where
  n :: Integer
  n = 5000
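
To make the semantics concrete, here is how opEvery reduces on small inputs (a worked example, not part of the original program):

opEvery (+) [10, 20] [1, 2, 3]
  = foldr (\x r -> map (x +) r) [10, 20] [1, 2, 3]
  = map (1 +) (map (2 +) (map (3 +) [10, 20]))
  = map (1 +) (map (2 +) [13, 23])
  = map (1 +) [15, 25]
  = [16, 26]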

The profiling results using opEvery (explicit arguments version):

total time  =        2.91 secs   (2906 ticks @ 1000 us, 1 processor)
total alloc = 1,300,813,124 bytes  (excludes profiling overheads)

The profiling results using opEvery' (point free version):

total time  =        3.24 secs   (3242 ticks @ 1000 us, 1 processor)
total alloc = 1,300,933,160 bytes  (excludes profiling overheads)

However, I expected both versions to be equivalent (in all senses).

asked Jan 30 '13 by Daniel Díaz


1 Answer

For the simple test case, both versions yield the same core when compiled with optimisations but without profiling.
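
(The dumps below are GHC's Core output, which the -ddump-simpl flag prints; something like ghc -O2 -ddump-simpl Main.hs for the plain build and ghc -O2 -prof -fprof-auto -ddump-simpl Main.hs for the profiled build should reproduce very similar output, assuming the test program is in Main.hs.)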

When compiling with profiling enabled (-prof -fprof-auto), the pointful version gets inlined, resulting in the main part being

Rec {
Main.main_go [Occ=LoopBreaker]
  :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer]
[GblId, Arity=1, Str=DmdType S]
Main.main_go =
  \ (ds_asR :: [GHC.Integer.Type.Integer]) ->
    case ds_asR of _ {
      [] -> xs_r1L8;
      : y_asW ys_asX ->
        let {
          r_aeN [Dmd=Just S] :: [GHC.Integer.Type.Integer]
          [LclId, Str=DmdType]
          r_aeN = Main.main_go ys_asX } in
        scctick<opEvery.\>
        GHC.Base.map
          @ GHC.Integer.Type.Integer
          @ GHC.Integer.Type.Integer
          (GHC.Integer.Type.plusInteger y_asW)
          r_aeN
    }
end Rec }

(you get something better without profiling).

When compiling the pointfree version with profiling enabled, opEvery' is not inlined, and you get

Main.opEvery'
  :: forall a_aeW.
     (a_aeW -> a_aeW -> a_aeW) -> [a_aeW] -> [a_aeW] -> [a_aeW]
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 80 60}]
Main.opEvery' =
  \ (@ a_c) ->
    tick<opEvery'>
    \ (x_ass :: a_c -> a_c -> a_c) ->
      scc<opEvery'>
      GHC.Base.foldr
        @ a_c
        @ [a_c]
        (\ (x1_XsN :: a_c) -> GHC.Base.map @ a_c @ a_c (x_ass x1_XsN))

Main.main4 :: [GHC.Integer.Type.Integer]
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
Main.main4 =
  scc<main>
  Main.opEvery'
    @ GHC.Integer.Type.Integer
    GHC.Integer.Type.plusInteger
    Main.main7
    Main.main5

When you add an {-# INLINABLE opEvery' #-} pragma, it can be inlined even when compiling for profiling, giving

Rec {
Main.main_go [Occ=LoopBreaker]
  :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer]
[GblId, Arity=1, Str=DmdType S]
Main.main_go =
  \ (ds_asz :: [GHC.Integer.Type.Integer]) ->
    case ds_asz of _ {
      [] -> lvl_r1KU;
      : y_asE ys_asF ->
        GHC.Base.map
          @ GHC.Integer.Type.Integer
          @ GHC.Integer.Type.Integer
          (GHC.Integer.Type.plusInteger y_asE)
          (Main.main_go ys_asF)
    }
end Rec }

which is even a bit faster than the pragma-less pointful version, since it doesn't need to tick the counters.
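
Concretely, the pragma just sits next to the definition:

opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery' = foldr . (map .)
{-# INLINABLE opEvery' #-}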

It is likely that a similar effect occurred for the Stream case.

The takeaway:

  • Profiling inhibits optimisations. Code that is equivalent without profiling may not be with profiling support.
  • Never measure performance using code that was compiled for profiling or without optimisations (see the benchmarking sketch after this list).
  • Profiling can help you find out where the time is spent in your code, but occasionally enabling profiling can entirely alter the behaviour of the code; anything relying heavily on rewrite-rule optimisations and/or inlining is a candidate for that to happen. It cannot tell you how fast your code is.
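
As a sketch of the second point: benchmark a build with optimisations on and profiling off, for instance with the criterion library (criterion is not used in the original post; it is just one common choice):

import Criterion.Main (bench, defaultMain, nf)

opEvery :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery f xs ys = foldr (\x r -> map (f x) r) xs ys

opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery' = foldr . (map .)

-- Build with: ghc -O2 Bench.hs   (no -prof, no -fprof-auto)
main :: IO ()
main = defaultMain
  [ bench "explicit"   $ nf (\n -> sum (opEvery  (+) [1..n] [1..n])) (5000 :: Integer)
  , bench "point-free" $ nf (\n -> sum (opEvery' (+) [1..n] [1..n])) (5000 :: Integer)
  ]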

answered by Daniel Fischer