Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summing a large list of numbers is too slow

Tags:

haskell

ghc

Task: "Sum the first 15,000,000 even numbers."

Haskell:

nats = [1..] :: [Int]
evens = filter even nats :: [Int]

MySum:: Int
MySum= sum $ take 15000000 evens

...but MySum takes ages. More precisely, about 10-20 times slower than C/C++.

Many times I've found, that a Haskell solution coded naturally is something like 10 times slower than C. I expected that GHC was a very neatly optimizing compiler and task such this don't seem that tough.

So, one would expect something like 1.5-2x slower than C. Where is the problem?

Can this be solved better?

This is the C code I'm comparing it with:

long long sum = 0;
int n = 0, i = 1;

for (;;) {

  if (i % 2 == 0) {
    sum += i;
    n++;
  }

  if (n == 15000000)
    break;

  i++;
}

Edit 1: I really know, that it can be computed in O(1). Please, resist.

Edit 2: I really know, that evens are [2,4..] but the function even could be something else O(1) and need to be implemented as a function.

like image 442
Cartesius00 Avatar asked Nov 15 '12 14:11

Cartesius00


People also ask

Is python sum slow?

sum is implemented in C inside the Python interpreter, while your for loop has to be interpreted, it's normal that it's slower. In CPython built-in functions are much faster than the pure-python translation.

Is sum faster than for loop?

A key difference between R and many other languages is a topic known as vectorization. When you wrote the total function, we mentioned that R already has sum to do this; sum is much faster than the interpreted for loop because sum is coded in C to work with a vector of numbers.

Are python Builtins faster?

Use the Built-In FunctionsMany of Python's built-in functions are written in C, which makes them much faster than a pure python solution. Take a very simple task of summing a lot of numbers. We could loop through each number, summing as we go.

How to add a list of numbers together python?

You can now use Python's built-in function sum() to add multiple numeric values together. This function provides an efficient, readable, and Pythonic way to solve summation problems in your code.


4 Answers

Lists are not loops

So don't be surprised if using lists as a loop replacement, you get slower code if the loop body is small.

nats = [1..] :: [Int]
evens = filter even nats :: [Int]

dumbSum :: Int
dumbSum = sum $ take 15000000 evens

sum is not a "good consumer", so GHC is not (yet) able to eliminate the intermediate lists completely.

If you compile with optimisations (and don't export nat), GHC is smart enough to fuse the filter with the enumeration,

Rec {
Main.main_go [Occ=LoopBreaker]
  :: GHC.Prim.Int# -> GHC.Prim.Int# -> [GHC.Types.Int]
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType L]
Main.main_go =
  \ (x_aV2 :: GHC.Prim.Int#) ->
    let {
      r_au7 :: GHC.Prim.Int# -> [GHC.Types.Int]
      [LclId, Str=DmdType]
      r_au7 =
        case x_aV2 of wild_Xl {
          __DEFAULT -> Main.main_go (GHC.Prim.+# wild_Xl 1);
          9223372036854775807 -> n_r1RR
        } } in
    case GHC.Prim.remInt# x_aV2 2 of _ {
      __DEFAULT -> r_au7;
      0 ->
        let {
          wild_atm :: GHC.Types.Int
          [LclId, Str=DmdType m]
          wild_atm = GHC.Types.I# x_aV2 } in
        let {
          lvl_s1Rp :: [GHC.Types.Int]
          [LclId]
          lvl_s1Rp =
            GHC.Types.:
              @ GHC.Types.Int wild_atm (GHC.Types.[] @ GHC.Types.Int) } in
        \ (m_aUL :: GHC.Prim.Int#) ->
          case GHC.Prim.<=# m_aUL 1 of _ {
            GHC.Types.False ->
              GHC.Types.: @ GHC.Types.Int wild_atm (r_au7 (GHC.Prim.-# m_aUL 1));
            GHC.Types.True -> lvl_s1Rp
          }
    }
end Rec }

but that's as far as GHC's fusion takes it. You are left with boxing Ints and constructing list cells. If you give it a loop, like you give it to the C compiler,

module Main where

import Data.Bits

main :: IO ()
main = print dumbSum

dumbSum :: Int
dumbSum = go 0 0 1
  where
    go :: Int -> Int -> Int -> Int
    go sm ct n
        | ct >= 15000000 = sm
        | n .&. 1 == 0   = go (sm + n) (ct+1) (n+1)
        | otherwise      = go sm ct (n+1)

you get the approximate relation of running times between the C and the Haskell version you expected.

This sort of algorithm is not what GHC has been taught to optimise well, there are bigger fish to fry elsewhere before the limited manpower is put into these optimisations.

like image 184
Daniel Fischer Avatar answered Sep 21 '22 12:09

Daniel Fischer


The problem why list fusion can't work here is actually rather subtle. Say we define the right RULE to fuse the list away:

import GHC.Base
sum2 :: Num a => [a] -> a
sum2 = sum
{-# NOINLINE [1] sum2 #-}
{-# RULES "sum" forall (f :: forall b. (a->b->b)->b->b).
                sum2 (build f) = f (+) 0 #-}

(The short explanation is that we define sum2 as an alias of sum, which we forbid GHC to inline early, so the RULE has a chance to fire before sum2 gets eliminated. Then we look for sum2 directly next to the list-builder build (see definition) and replace it by direct arithmetic.)

This has mixed success, as it yields the following Core:

Main.$wgo =
  \ (w_s1T4 :: GHC.Prim.Int#) ->
    case GHC.Prim.remInt# w_s1T4 2 of _ {
      __DEFAULT ->
        case w_s1T4 of wild_Xg {
          __DEFAULT -> Main.$wgo (GHC.Prim.+# wild_Xg 1);
          15000000 -> 0
        };
      0 ->
        case w_s1T4 of wild_Xg {
          __DEFAULT ->
            case Main.$wgo (GHC.Prim.+# wild_Xg 1) of ww_s1T7 { __DEFAULT ->
            GHC.Prim.+# wild_Xg ww_s1T7
            };
          15000000 -> 15000000
        }
    }

Which is nice, completely fused code - with the sole problem that we have a call to $wgo in a non-tail-call position. This means that we aren't looking at a loop, but actually at a deeply recursive function, with predictable program results:

Stack space overflow: current size 8388608 bytes.

The root problem here is that the Prelude's list fusion can only fuse right folds, and computing the sum as a right fold directly causes the excessive stack consumption. The obvious fix would be to use a fusion framework that can actually deal with left folds, such as Duncan's stream-fusion package, which actually implements sum fusion.

Another solution would be to hack around it - and implement the left fold using a right fold:

main = print $ foldr (\x c -> c . (+x)) id [2,4..15000000] 0

This actually produces close-to-perfect code with current versions of GHC. On the other hand, this is generally not a good idea as it relies on GHC being smart enough to eliminate the partially applied functions. Already adding a filter into the chain will break that particular optimization.

like image 40
Peter Wortmann Avatar answered Sep 17 '22 12:09

Peter Wortmann


Sum first 15,000,000 even numbers:

{-# LANGUAGE BangPatterns #-}

g :: Integer    -- 15000000*15000001 = 225000015000000
g = go 1 0 0
  where
    go i !a c  | c == 15000000 = a       
    go i !a c  | even i = go (i+1) (a+i) (c+1)
    go i !a c           = go (i+1) a c

ought to be the fastest.

like image 24
Will Ness Avatar answered Sep 18 '22 12:09

Will Ness


If you want to be sure to traverse the list only once, you can write the traversal explicitly:

nats = [1..] :: [Int]

requiredOfX :: Int -> Bool -- this way you can write a different requirement
requiredOfX x = even x

dumbSum :: Int
dumbSum = dumbSum' 0 0 nats
  where dumbSum' acc 15000000 _ = acc
        dumbSum' acc count (x:xs)
          | requiredOfX x = dumbSum' (acc + x) (count + 1) xs
          | otherwise     = dumbSum' acc (count + 1) xs
like image 24
amindfv Avatar answered Sep 21 '22 12:09

amindfv