Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this simple Haskell program so slow? [duplicate]

In this trivial program to print all the numbers from 1 to 10000000, a Haskell version and a C version, why is the Haskell one so slow and what commands help to learn how to improve the Haskell program's performance?

Below is a report containing all the details necessary to reproduce my exciting event, the sources are printed when making the report including the source of the Makefile:

$ make -B report
cat Foo.hs
import Data.Foldable
main = traverse_ print [1..10000000]
cat Fooc.c
#include <stdio.h>

int main()
{
    for (int n = 0; n < 10000000; ++n)
    {
        printf("%d\n", n+1);
    }
}
ghc -O3 Foo.hs -o Foo
time ./Foo | tail -n1
3.45user 0.03system 0:03.49elapsed 99%CPU (0avgtext+0avgdata 4092maxresident)k
0inputs+0outputs (0major+290minor)pagefaults 0swaps
10000000
cc -O3    Fooc.c   -o Fooc
time ./Fooc | tail -n1
0.63user 0.02system 0:00.66elapsed 99%CPU (0avgtext+0avgdata 1468maxresident)k
0inputs+0outputs (0major+63minor)pagefaults 0swaps
10000000
cat Makefile
.PHONY: printFoo printFooc printMakefile
printFoo: Foo.hs
    cat $^

printFooc: Fooc.c
    cat $^

printMakefile: Makefile
    cat $^

Fooc: CFLAGS=-O3
Fooc: Fooc.c
Foo: Foo.hs
    ghc -O3 $^ -o $@

.PHONY: timeFoo timeFooc
timeFoo: Foo
    time ./$^ | tail -n1
timeFooc: Fooc
    time ./$^ | tail -n1

.PHONY: report
report: printFoo printFooc timeFoo timeFooc printMakefile
like image 507
codeshot Avatar asked Mar 13 '26 02:03

codeshot


1 Answers

On my system your Haskell code takes about 3.2 seconds.

N.B. your C code takes...

time ./fooc | tail -n1
ld: warning: directory not found for option '-L/opt/local/lib'
10000000
./fooc  0.92s user 0.03s system 33% cpu 2.863 total
tail -n1  2.85s user 0.01s system 99% cpu 2.865 total

So do note the difference of time a | b and what that means vs time (a | b).

Haskell is slow in part because (some of this is hypothesis)

  1. By default print and the underlying putStrLn use String which is a linked list of characters.
  2. UTF encoding
  3. RTS differences

For 1, the packed variant using Text doesn't perform much differently, perhaps due to issue 2.

For 2, the ByteString variant (packed bytes instead of characters) is more representative of what your C program is doing:

-- Using functions from the Relude package
main = traverse_ putBSLn (show <$> [(1::Int)..10000000])

Results in

10000000
./foo  1.55s user 0.08s system 56% cpu 2.904 total

So the CPU time is closer to your C program, leading me to hypothesize that this difference is largely about the unnecessary UTF8 handling built into the routines you use by default in Haskell's prelude.

Dead-ends:

  • I tried NoBuffering and large BlockBuffering with no luck.
  • Constructing a large bytestring and printing with a single call wasn't any better (lazy or strict bytestrings).
  • Printing via Text instead of String gave only the barest improvement.
  • Rendering directly to ByteString instead of by packing values showed into Strings. This could be a win, I expect, if done well.

EDIT: I can't believe I forgot about Builder, which is an optimized way to build bytestrings and, in some cases, it fuses well to reduce allocations. Builder is the underpinning of the above example I already showed but using it directly allows some manual optimization.

{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Builder
import System.IO (stdout)
import Data.Foldable

main :: IO ()
main = do
    traverse_ (hPutBuilder stdout . (<>"\n") . intDec) [(1::Int)..10000000]

Performing at:

./foo  1.05s user 0.13s system 38% cpu 3.048 total
tail -n1  3.02s user 0.01s system 99% cpu 3.047 total

And indeed this is more efficient than the prior, many separate, calls to hPut because as hPutBuilder says:

This function is more efficient than hPut . toLazyByteString because in many cases no buffer allocation has to be done. Moreover, the results of several executions of short Builders are concatenated in the Handles buffer, therefore avoiding unnecessary buffer flushes.

So I should add: 4. Haskell was slow in this case because sometimes computations do not fuse and you end up with excess allocation, which isn't free.

like image 165
Thomas M. DuBuisson Avatar answered Mar 15 '26 15:03

Thomas M. DuBuisson