Why is it faster to sum a Data.Sequence by divide-and-conquer, even with no parallelism?

Question

Note: This answer doesn't actually answer the question. It only restates the question in a different way. The precise reason why Data.Sequence.foldr slows down as the sequence is getting bigger is still unknown.

Your code

seqFoldr :: S.Seq Int -> Int
seqFoldr = F.foldr (+) 0

has non-linear performance depending on the length of the sequence. Take a look at this benchmark:

./seq-customized +RTS -s -A128M

[Length] [Performance of function seqFoldr]
25000:   mean: 1.096352 ms, lb 1.083301 ms, ub 1.121152 ms, ci 0.950
50000:   mean: 2.542133 ms, lb 2.514076 ms, ub 2.583209 ms, ci 0.950
100000:  mean: 6.068437 ms, lb 5.951889 ms, ub 6.237442 ms, ci 0.950
200000:  mean: 14.41332 ms, lb 13.95552 ms, ub 15.21217 ms, ci 0.950

Using the line with 25000 as a base gives us the following table:

[Length] [Performance of function seqFoldr]
1x:      mean: 1.00 = 1*1.00
2x:      mean: 2.32 = 2*1.16
4x:      mean: 5.54 = 4*1.39
8x:      mean: 13.15 = 8*1.64

In the above table, the non-linearity is demonstrated by the series 1.00, 1.16, 1.39, 1.64.

See also http://haskell.org/haskellwiki/Performance#Data.Sequence_vs._lists

Assuming the initial length of Seq xs is 100000 and n is 32, your code

seqSplit n xs =
let (a, b) = S.splitAt (S.length xs `div` n) xs
    sa = seqFoldr a
    sb = seqSplit (n-1) b
in  sa + sb

will be passing somewhat shorter Seqs to the function seqFoldr. The successive lengths of the Seqs passed from the above code to function seqFoldr look like:

(length xs)/n = (length a)
--------------------------
100000/32 = 3125
(100000-3125)/31 = 3125
(100000-2*3125)/30 = 3125
...
(100000-30*3125)/2 = 3125

Based on the first part of my answer (where we saw that the performance was non-linear), [32 calls to seqFoldr with Seq of length 3125] will execute faster than [1 call to seqFoldr with a single Seq of length 32*3125=100000].

Thus, the answer to your question is: Because foldr on Data.Sequence is slower as the sequence is getting larger.

Try use foldr' instead of foldr. I bet it is by lazy behavior of foldr which leads to allocate thunk for each data in sequence and evaluate on the end.

Edit:

So using foldr' halves taken time in mine case but still slower even then foldl'. Which means there is some complexity issue in Data.Sequence.fold* implementation.

benchmarking seqFoldr
collecting 100 samples, 1 iterations each, in estimated 2.516484 s
bootstrapping with 100000 resamples
mean: 24.93222 ms, lb 24.72772 ms, ub 25.15255 ms, ci 0.950
std dev: 1.081204 ms, lb 938.4503 us, ub 1.332666 ms, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 0.999%
variance is unaffected by outliers

benchmarking seqFoldr'
collecting 100 samples, 1 iterations each, in estimated 902.7004 ms
bootstrapping with 100000 resamples
mean: 11.05375 ms, lb 10.68481 ms, ub 11.42519 ms, ci 0.950
std dev: 1.895777 ms, lb 1.685334 ms, ub 2.410870 ms, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 1.000%
variance is unaffected by outliers

benchmarking seqFoldl'
collecting 100 samples, 1 iterations each, in estimated 862.4077 ms
bootstrapping with 100000 resamples
mean: 10.35651 ms, lb 9.947395 ms, ub 10.73637 ms, ci 0.950
std dev: 2.011693 ms, lb 1.875869 ms, ub 2.131425 ms, ci 0.950
variance introduced by outliers: 1.000%
variance is unaffected by outliers

Why is it faster to sum a Data.Sequence by divide-and-conquer, even with no parallelism?

Tags:

performance

haskell

Edit:

Recent Activity

Donate For Us

Why is it faster to sum a Data.Sequence by divide-and-conquer, even with no parallelism?

Tags:

performance

haskell

Edit:

Related questions

Recent Activity

Donate For Us