
Conduit pipeline has strange execution times

I'm testing the performance of this recursive Haskell function, which repeatedly sums the first 100000000 integers of an infinite list (using a Conduit pipeline) and prints the elapsed time of each execution:

import Conduit
import Data.Time.Clock

-- Sum the first 100,000,000 integers of an infinite list with a Conduit
-- pipeline, printing the elapsed wall-clock time of each iteration.
evaluate_listC :: Int -> IO ()
evaluate_listC 0 = return ()
evaluate_listC i = do
    startTime <- getCurrentTime
    print $ runConduitPure $ yieldMany [1..] .| takeC 100000000 .| sumC
    endTime <- getCurrentTime
    print $ diffUTCTime endTime startTime
    evaluate_listC (i-1)

main :: IO ()
main = evaluate_listC 10

Compiling the code (with the -O flag), running it, and iterating the function 10 times, I get the following execution times:

38.2066878s
4.3696857s
1.3367605s
0.9950032s
0.9399968s
0.9039936s
0.9079987s
0.9119587s
0.9090151s
0.8749654s

Why does the first iteration (and also the second) take more time while the following ones are incredibly faster?

asked Nov 06 '22 by mennetorelli

1 Answer

As I mentioned in my comment, I can't reproduce these slow numbers myself, but I'm pretty sure I know what's going on. If you provide additional detail that lets me reproduce the issue, I can update the answer.

Most likely, the list [1..] (or perhaps some larger expression involving this list) is being "lifted" to the top level as a constant applicative form (CAF). As the list is generated during that first iteration, it's kept around as a "permanent" heap object for future iterations.
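
In effect, it's as if the list had been written as an explicit top-level definition. Here's a hand-written sketch of that transformation (the name bigList is my own; GHC performs the lifting internally, so this is only an illustration):

import Conduit
import Data.Time.Clock

-- Illustrative only: the list as an explicit top-level CAF. Once the first
-- iteration forces the first 100,000,000 cells of bigList, they remain live
-- for the rest of the program, just like the lifted [1..].
bigList :: [Integer]
bigList = [1..]

evaluate_listC :: Int -> IO ()
evaluate_listC 0 = return ()
evaluate_listC i = do
    startTime <- getCurrentTime
    print $ runConduitPure $ yieldMany bigList .| takeC 100000000 .| sumC
    endTime <- getCurrentTime
    print $ diffUTCTime endTime startTime
    evaluate_listC (i-1)

main :: IO ()
main = evaluate_listC 10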

The first iteration takes a long time in part because it's allocating and generating the list, though because of GHC's "bump allocator", allocation is extremely fast, and actually generating the list probably only takes a few seconds. Most of the time is likely spent garbage collecting. GC time scales with the size of the "important" stuff that needs to be rescued (copied) from the bump allocator, and you're building a big, permanent list here.
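
One rough way to check this (my own sketch, not part of the original program) is to read the RTS's GC statistics around each iteration with GHC.Stats; this assumes the binary is built with -rtsopts and run with +RTS -T so the statistics are collected:

import Conduit
import Control.Monad (replicateM_)
import GHC.Stats (getRTSStats, gc_elapsed_ns, copied_bytes)
import Text.Printf (printf)

-- Per-iteration GC accounting (assumes the program was built with -rtsopts
-- and is run with "+RTS -T"; otherwise getRTSStats will fail).
timedIteration :: IO ()
timedIteration = do
    before <- getRTSStats
    print $ runConduitPure $ yieldMany [1..] .| takeC 100000000 .| sumC
    after <- getRTSStats
    printf "GC time: %.3fs, bytes copied during GC: %d\n"
           (fromIntegral (gc_elapsed_ns after - gc_elapsed_ns before) / 1e9 :: Double)
           (copied_bytes after - copied_bytes before)

main :: IO ()
main = replicateM_ 10 timedIteration

If the explanation above is right, the first iteration should account for most of the GC time and copied bytes, with much smaller numbers for the later iterations.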

Later iterations are much faster because they can run the Conduit summation over the existing list. This probably involves some allocation for intermediate results, but most of them don't stick around, so there's much less to GC and the iterations are fast.

The reason the second and third iterations are a little slower than the later ones has to do with GHC's generational garbage collector. Initially, both the permanent big list and the semi-permanent heap objects (those needed only briefly or for the current iteration) are copied out of the bump allocator together. Subsequent garbage collections re-copy the same permanent list while letting expired semi-permanent objects be collected. Eventually, the list gets promoted to the next generation, while all the non-list objects stay in the first generation.

Once the permanent list and semi-permanent "other objects" have been fully separated into different generations, the list no longer needs to be recopied during GC of the first generation, and iteration times settle down to around a second.

answered Nov 13 '22 by K. A. Buhr