Style vs Performance Using Vectors

Tags:

lambda

haskell

pointfree

Here's the code:

{-# LANGUAGE FlexibleContexts #-}

import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V

{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+) -- or U.zipWith, it doesn't make a difference

main = do
    let iters = 100
        dim = 221184
        y = U.replicate dim 0 :: U.Vector Int64
    let ans = iterate ((f y)) y !! iters
    putStr $ (show $ U.sum ans)

I compiled with ghc 7.6.2 and -O2, and it took 1.7 seconds to run.

I tried several different versions of f:

1. f x = U.zipWith (+) x
2. f x = (U.zipWith (+) x) . id
3. f x y = U.zipWith (+) x y

Version 1 runs as slowly as the original, while versions 2 and 3 run in under 0.09 seconds (and INLINING f doesn't change anything).

I also noticed that if I make f polymorphic (with any of the three signatures above), even with a "fast" definition (i.e. 2 or 3), it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though I'm explicitly giving the types for the Vector type and element type.

I'm also interested in adding integers modulo q:

newtype Zq q i = Zq {unZq :: i}

As when adding Int64s, if I write a function with every type specified,

h :: U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64)

I get an order of magnitude better performance than if I leave any polymorphism in the signature:

h :: (Modulus q) => U.Vector (Zq q Int64) -> U.Vector (Zq q Int64) -> U.Vector (Zq q Int64)

But I should at least be able to remove the specific phantom type! It should be compiled out, since I'm dealing with a newtype.

Here are my questions:

1. Where is the slowdown coming from?
2. What is going on in versions 2 and 3 of f that affects performance in any way? It seems like a bug to me that (what amounts to) coding style can affect performance like this. Are there other examples outside of Vector where partially applying a function or other stylistic choices affect performance?
3. Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, both, or phantom type)? I know polymorphism makes code slower, but this is ridiculous. Is there a hack around it?

EDIT 1

I filed an issue with the Vector library page. I found a GHC issue (http://ghc.haskell.org/trac/ghc/ticket/8099) relating to this problem.

EDIT2

I rewrote the question after gaining some insight from @kqr's answer. Below is the original for reference.

--------------ORIGINAL QUESTION--------------------

Here's the code:

{-# LANGUAGE FlexibleContexts #-}

import Control.DeepSeq
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V

{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+)

main = do
    let iters = 100
        dim = 221184
        y = U.replicate dim 0 :: U.Vector Int64
    let ans = iterate ((f y)) y !! iters
    putStr $ (show $ U.sum ans)

I compiled with ghc 7.6.2 and -O2, and it took 1.7 seconds to run.

I tried several different versions of f:

1. f x = U.zipWith (+) x
2. f x = (U.zipWith (+) x) . U.force
3. f x = (U.zipWith (+) x) . Control.DeepSeq.force
4. f x = (U.zipWith (+) x) . (\z -> z `seq` z)
5. f x = (U.zipWith (+) x) . id
6. f x y = U.zipWith (+) x y

Version 1 runs as slowly as the original, version 2 runs in 0.111 seconds, and versions 3-6 run in under 0.09 seconds (and INLINING f doesn't change anything).

So the order-of-magnitude slowdown appears to be due to laziness since force helped, but I'm not sure where the laziness is coming from. Unboxed types aren't allowed to be lazy, right?

I tried writing a strict version of iterate, thinking the vector itself must be lazy:

{-# INLINE iterate' #-}
iterate' :: (NFData a) => (a -> a) -> a -> [a]
iterate' f x =  x `seq` x : iterate' f (f x)

but with the point-free version of f, this didn't help at all.

I also noticed something else, which could be just a coincidence and red herring: If I make f polymorphic (with any of the three signatures above), even with a "fast" definition, it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though everything should be inferred nicely.

Here are my questions:

1. Where is the slowdown coming from?
2. Why does composing with force help, but using a strict iterate doesn't?
3. Why is U.force worse than DeepSeq.force? I have no idea what U.force is supposed to do, but it sounds a lot like DeepSeq.force, and seems to have a similar effect.
4. Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, or both)?
5. Why are versions 5 and 6, neither of which should have any strictness implications at all, just as fast as a strict function?

As @kqr pointed out, the problem doesn't seem to be strictness. So something about the way I write the function is causing the generic zipWith to be used rather than the Unboxed-specific version. Is this just a fluke between GHC and the Vector library, or is there something more general that can be said here?

257

asked Nov 06 '13 04:11

crockeea

1 Answer

While I don't have the definitive answer you want, there are two things that might help you along.

The first thing is that x `seq` x is, both semantically and computationally, the same thing as just x. The wiki says about seq:

A common misconception regarding seq is that seq x "evaluates" x. Well, sort of. seq doesn't evaluate anything just by virtue of existing in the source file, all it does is introduce an artificial data dependency of one value on another: when the result of seq is evaluated, the first argument must also (sort of; see below) be evaluated.

As an example, suppose x :: Integer, then seq x b behaves essentially like if x == 0 then b else b – unconditionally equal to b, but forcing x along the way. In particular, the expression x `seq` x is completely redundant, and always has exactly the same effect as just writing x.

What the first paragraph says is that writing seq a b doesn't mean that a will magically get evaluated this instant, it means that a will get evaluated as soon as b needs to be evaluated. This might occur later in the program, or maybe never at all. When you view it in that light, it is obvious that seq x x is a redundancy, because all it says is, "evaluate x as soon as x needs to be evaluated." Which of course is what would happen anyway if you had just written x.
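To make the point about seq concrete, here is a small sketch that uses only base. The names pair and x are made up for illustration. It shows that seq a b does nothing until b itself is demanded, and that x `seq` x gives the same value as plain x.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Building the pair forces nothing; the seq sits unevaluated in the first slot.
pair :: (Int, Int)
pair = (undefined `seq` 1, 2)

main :: IO ()
main = do
  -- The second component never demands the first, so undefined is never touched.
  print (snd pair)                      -- prints 2
  -- Demanding the first component finally runs the seq, and with it the error.
  r <- try (evaluate (fst pair)) :: IO (Either SomeException Int)
  putStrLn (either (const "forcing fst hit undefined") show r)
  -- x `seq` x is just x: both sides of this tuple are the same value.
  let x = sum [1 .. 10 :: Int]
  print (x `seq` x, x)                  -- prints (55,55)
```

The second line of output is the message from the Left branch, because evaluate forces the first component and that is the moment the seq fires.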

This has two implications for you:

1. (Retracted, see the correction just below.) Your "strict" iterate' function isn't really any stricter than it would be without the seq. In fact, I have a hard time imagining how the iterate function could become any stricter than it already is. You can't make the tail of the list strict, because it is infinite. The main thing you can do is force the "accumulator", f x, but doing so doesn't give any significant performance increase on my system.[1]

   Scratch that. Your strict iterate' does exactly the same thing as my bang pattern version. See the comments.
2. Writing (\z -> z `seq` z) does not give you a strict identity function, which I assume is what you were going for. In fact, the common identity function is as strict as you'll get – it will evaluate its result as soon as it is needed.
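As a rough illustration of the correction in item 1, the sketch below compares Prelude's iterate with the asker's iterate'. The NFData constraint is dropped because seq does not need it. The helper boom and the names lazyLen and strictLen are hypothetical, introduced only for this demonstration. It uses only base.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- The asker's definition: x is forced to WHNF before its cons cell is returned.
iterate' :: (a -> a) -> a -> [a]
iterate' f x = x `seq` (x : iterate' f (f x))

-- A step function whose result blows up if anything ever evaluates it.
boom :: Int -> Int
boom _ = error "element was forced"

-- length only walks the spine, so the lazy iterate never evaluates boom.
lazyLen :: Int
lazyLen = length (take 3 (iterate boom 0))

-- Reaching the second cons cell of iterate' forces (boom 0), which throws.
strictLen :: Int
strictLen = length (take 3 (iterate' boom 0))

main :: IO ()
main = do
  print lazyLen                         -- prints 3
  r <- try (evaluate strictLen) :: IO (Either SomeException Int)
  putStrLn (either (const "iterate' forced an element") show r)
```

So the seq in iterate' does make a difference: it forces each element as the list is walked. What it cannot do is change which zipWith code GHC generates, which is consistent with it not helping the timings in the question.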

However, I peeked at the core GHC generates for

U.zipWith (+) y

and

U.zipWith (+) y . id

and there is only one big difference that my untrained eye can spot. The first one uses just a plain Data.Vector.Generic.zipWith (here's where your polymorphism coincidence might come into play – if GHC chooses a generic zipWith it will of course perform as if the code was polymorphic!) while the latter has exploded this single function call into almost 90 lines of state monad code and unpacked machine types.

The state monad code looks almost like the loops and destructive updates you would write in an imperative language, so I assume it's tailored pretty well to the machine it's running on. If I wasn't in such a hurry, I would take a longer look to see more exactly how it works and why GHC suddenly decided it needed a tight loop. I have attached the generated core as much for myself as anyone else who wants to take a look.[2]
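For anyone who wants to repeat the comparison, here is a minimal sketch of how the two shapes could be put side by side and their Core dumped. It assumes the vector package is installed. The names slowAdd and fastAdd are made up for illustration. The flags -ddump-simpl, -dsuppress-all and -ddump-to-file are standard GHC options, but whether a current GHC still shows the difference seen with 7.6.2 is not something I have checked.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all -ddump-to-file #-}
module Main (main) where

import Data.Int (Int64)
import qualified Data.Vector.Unboxed as U

-- One argument named, zipWith partially applied (the slow shape in the question).
{-# NOINLINE slowAdd #-}
slowAdd :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
slowAdd x = U.zipWith (+) x

-- Both arguments named, zipWith fully applied (the fast shape in the question).
{-# NOINLINE fastAdd #-}
fastAdd :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
fastAdd x y = U.zipWith (+) x y

main :: IO ()
main = do
  let y = U.replicate 8 1 :: U.Vector Int64
  -- Both definitions compute the same result; only the generated Core may differ.
  print (U.sum (slowAdd y y), U.sum (fastAdd y y))
```

Compiling this writes the simplified Core to a .dump-simpl file next to the source, where the bodies of slowAdd and fastAdd can be compared. One possible explanation, which the thread does not confirm, is the documented rule that GHC only inlines a function when it is applied to as many arguments as appear on the left-hand side of its definition, so a partially applied call may be left as an opaque call to the generic function.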

[1]: Forcing the accumulator along the way: (This is what you already do, I misunderstood the code!)

{-# LANGUAGE BangPatterns #-}
iterate' f !x = x : iterate' f (f x)

[2]: What core U.zipWith (+) y . id gets translated into.

104

answered Oct 12 '22 22:10

kqr
