Note: This answer doesn't actually answer the question. It only restates the question in a different way. The precise reason why Data.Sequence.foldr
slows down as the sequence is getting bigger is still unknown.
Your code
seqFoldr :: S.Seq Int -> Int
seqFoldr = F.foldr (+) 0
has non-linear performance depending on the length of the sequence. Take a look at this benchmark:
./seq-customized +RTS -s -A128M
[Length] [Performance of function seqFoldr]
25000: mean: 1.096352 ms, lb 1.083301 ms, ub 1.121152 ms, ci 0.950
50000: mean: 2.542133 ms, lb 2.514076 ms, ub 2.583209 ms, ci 0.950
100000: mean: 6.068437 ms, lb 5.951889 ms, ub 6.237442 ms, ci 0.950
200000: mean: 14.41332 ms, lb 13.95552 ms, ub 15.21217 ms, ci 0.950
Using the line with 25000 as a base gives us the following table:
[Length] [Performance of function seqFoldr]
1x: mean: 1.00 = 1*1.00
2x: mean: 2.32 = 2*1.16
4x: mean: 5.54 = 4*1.39
8x: mean: 13.15 = 8*1.64
In the above table, the non-linearity is demonstrated by the series 1.00, 1.16, 1.39, 1.64.
See also http://haskell.org/haskellwiki/Performance#Data.Sequence_vs._lists
Assuming the initial length of Seq xs
is 100000 and n
is 32, your code
seqSplit n xs =
let (a, b) = S.splitAt (S.length xs `div` n) xs
sa = seqFoldr a
sb = seqSplit (n-1) b
in sa + sb
will be passing somewhat shorter Seqs to the function seqFoldr
. The successive lengths of the Seqs passed from the above code to function seqFoldr
look like:
(length xs)/n = (length a)
--------------------------
100000/32 = 3125
(100000-3125)/31 = 3125
(100000-2*3125)/30 = 3125
...
(100000-30*3125)/2 = 3125
Based on the first part of my answer (where we saw that the performance was non-linear), [32 calls to seqFoldr
with Seq of length 3125] will execute faster than [1 call to seqFoldr
with a single Seq of length 32*3125=100000].
Thus, the answer to your question is: Because foldr
on Data.Sequence
is slower as the sequence is getting larger.
Try use foldr'
instead of foldr
. I bet it is by lazy behavior of foldr
which leads to allocate thunk for each data in sequence and evaluate on the end.
So using foldr'
halves taken time in mine case but still slower even then foldl'
. Which means there is some complexity issue in Data.Sequence.fold*
implementation.
benchmarking seqFoldr
collecting 100 samples, 1 iterations each, in estimated 2.516484 s
bootstrapping with 100000 resamples
mean: 24.93222 ms, lb 24.72772 ms, ub 25.15255 ms, ci 0.950
std dev: 1.081204 ms, lb 938.4503 us, ub 1.332666 ms, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 0.999%
variance is unaffected by outliers
benchmarking seqFoldr'
collecting 100 samples, 1 iterations each, in estimated 902.7004 ms
bootstrapping with 100000 resamples
mean: 11.05375 ms, lb 10.68481 ms, ub 11.42519 ms, ci 0.950
std dev: 1.895777 ms, lb 1.685334 ms, ub 2.410870 ms, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 1.000%
variance is unaffected by outliers
benchmarking seqFoldl'
collecting 100 samples, 1 iterations each, in estimated 862.4077 ms
bootstrapping with 100000 resamples
mean: 10.35651 ms, lb 9.947395 ms, ub 10.73637 ms, ci 0.950
std dev: 2.011693 ms, lb 1.875869 ms, ub 2.131425 ms, ci 0.950
variance introduced by outliers: 1.000%
variance is unaffected by outliers
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With