Why does introducing associated types kill my performance?

In my kdtree project I just replaced a depth counter from being Int-based to an explicit Key a based on the type a in KDTree v a. This is the diff.
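For readers without the diff open, the shape of the change is roughly the following. This is only a sketch: the class name KDCompare is taken from the Core dumps further down, and the method is hypothetical.

{-# LANGUAGE TypeFamilies #-}

-- Before the change the splitting dimension was a plain Int (essentially
-- the depth modulo the number of axes); after the change it is an
-- associated data family of the point type a.
class KDCompare a where
  data Key a :: *
  -- hypothetical method, standing in for the real per-dimension operations
  dimDistance :: Key a -> a -> a -> Double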

Now, while I believe this should be a type-level change only, my benchmarks show a sharp drop in performance:

Before:

benchmarking nr/kdtree_nr 
mean: 60.19084 us, lb 59.87414 us, ub 60.57270 us, ci 0.950
std dev: 1.777527 us, lb 1.494657 us, ub 2.120168 us, ci 0.950

After:

benchmarking nr/kdtree_nr 
mean: 556.9518 us, lb 554.0586 us, ub 560.6128 us, ci 0.950 
std dev: 16.70620 us, lb 13.58185 us, ub 20.63450 us, ci 0.950

Before I dive into Core ... does anyone have any idea what's going on here?

Edit 1

As proposed by Thomas (and userxyz), I replaced data Key a :: * with type Key a :: * and changed the implementation accordingly. This hasn't had any significant impact on the result:

benchmarking nr/kdtree_nr
mean: 538.2789 us, lb 537.5128 us, ub 539.4408 us, ci 0.950
std dev: 4.745118 us, lb 3.454081 us, ub 6.969091 us, ci 0.950
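For reference, the Edit 1 variant looks roughly like this (again a sketch with hypothetical names for the key type and the method, not the actual diff):

{-# LANGUAGE TypeFamilies, FlexibleInstances #-}

import Linear.V3 (V3(..))

-- Edit 1 in sketch form: an associated type synonym family instead of a
-- data family, so an instance can reuse an ordinary enumeration as its key.
class KDCompare a where
  type Key a :: *
  dimDistance :: Key a -> a -> a -> Double   -- hypothetical method

-- hypothetical axis type and instance, only to show the shape of the change
data Axis3 = AxisX | AxisY | AxisZ

instance KDCompare (V3 Double) where
  type Key (V3 Double) = Axis3
  dimDistance k (V3 ax ay az) (V3 bx by bz) = case k of
    AxisX -> ax - bx
    AxisY -> ay - by
    AxisZ -> az - bz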

Edit 2

I just had a quick look at the Core output. Apparently the change prevents functions that depend on the class from being specialized, right?

Before:

lvl20 :: KDTree Vector (V3 Double) -> [V3 Double]
lvl20 =
  \ (w4 :: KDTree Vector (V3 Double)) ->
    $wpointsAround $fKDCompareV3_$s$fKDCompareV3 lvl2 lvl4 nrRadius q w4

After:

lvl18 :: KDTree Vector (V3 Double) -> [V3 Double]
lvl18 =
  \ (w4 :: KDTree Vector (V3 Double)) ->
    $wpointsAround $dKDCompare lvl1 lvl3 nrRadius q w4

Small Update to Edit 2: Going crazy with INLINE pragmas doesn't change a thing here.

Edit 3

I quickly implemented what userxyz suggested: http://lpaste.net/104457. I've been down this road before and can't make it work:

src/Data/KDTree.hs:48:49:
    Could not deduce (k ~ KeyV3)
    from the context (Real a, Floating a)
      bound by the instance declaration at src/Data/KDTree.hs:45:10-49
    or from (Key k)
      bound by the type signature for
                 dimDistance :: Key k => k -> V3 a -> V3 a -> Double
      at src/Data/KDTree.hs:47:3-13
      ‘k’ is a rigid type variable bound by
          the type signature for
            dimDistance :: Key k => k -> V3 a -> V3 a -> Double
          at src/Data/KDTree.hs:47:3
    Relevant bindings include
      k :: k (bound at src/Data/KDTree.hs:47:15)
      dimDistance :: k -> V3 a -> V3 a -> Double
        (bound at src/Data/KDTree.hs:47:3)
    In the pattern: V3X
    In a case alternative: V3X -> ax - bx
    In the second argument of ‘($)’, namely
      ‘case k of {
         V3X -> ax - bx
         V3Y -> ay - by
         V3Z -> az - bz }’
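Reading the error back into code (the lpaste is not reproduced here, and the realToFrac wrapper is only a guess at the application of ($) the error mentions), the method, defined inside a (Real a, Floating a) instance, promises to work for any k with a Key constraint, while its body commits to the concrete KeyV3 constructors:

-- Reconstructed from the type error above, not copied from the lpaste.
dimDistance :: Key k => k -> V3 a -> V3 a -> Double
dimDistance k (V3 ax ay az) (V3 bx by bz) = realToFrac $
  case k of          -- k is a rigid, caller-chosen type variable here,
    V3X -> ax - bx   -- so GHC cannot deduce (k ~ KeyV3)
    V3Y -> ay - by
    V3Z -> az - bz

The usual way out is to pin the key type to the instance, for example with an associated type (Key (V3 a) = KeyV3) or a functional dependency, so that k is no longer universally quantified inside the instance.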

Edit 4

Hmm ... I think I just "solved" the problem by throwing SPECIALIZE pragmas at the functions. This in effect causes everything to be inlined and removes the explicit dictionary passing.

I am not too happy with that solution as this means I have to put a big "please specialize your calls to achieve decent performance" warning in the docs.
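For reference, the workaround boils down to pragmas of this shape (the signature of pointsAround is assumed from the Core dump above, not taken from the source):

-- Ask GHC to emit a monomorphic copy of the overloaded function for the
-- benchmark's concrete types; the dictionary argument disappears and the
-- class methods can be inlined into it.
{-# SPECIALIZE pointsAround
      :: Double -> V3 Double -> KDTree Vector (V3 Double) -> [V3 Double] #-}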

asked May 21 '14 by fho

1 Answer

By sheer chance I just stumbled upon this question: Transitivity of Auto-Specialization in GHC

There the OP quotes the GHC 7.6 docs (emphasis mine):

[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.

Consequently, I have removed all (hard) INLINE and SPECIALIZE pragmas and replaced them with INLINABLE pragmas where appropriate (i.e. on each function that is used in the benchmark suite). With that I get even better times than with INLINE pragmas on all functions.
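A minimal sketch of the effect the quoted paragraph describes (not code from the kdtree package):

module Tree (nearest) where

-- INLINABLE keeps the unfolding of the overloaded function in the interface
-- file, so a downstream module compiled with -O can specialise it for the
-- concrete types at which it calls it, with no SPECIALIZE pragma and no
-- hard INLINE needed here.
nearest :: Ord a => a -> [a] -> a
nearest q = minimum . filter (>= q)   -- stand-in for the real traversal
{-# INLINABLE nearest #-}

A benchmark module that imports nearest and calls it at Double then gets its own Double-specialised copy, which is exactly what the quoted docs promise.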

Quintessence: let the compiler do its work, but give it a hint sometimes.

answered by fho