Why is this Haskell program so much slower than an equivalent Python one?

As part of a programming challenge, I need to read, from stdin, a sequence of space-separated integers (on a single line), and print the sum of those integers to stdout. The sequence in question can contain as many as 10,000,000 integers.

I have two solutions for this: one written in Haskell (foo.hs), and an equivalent one written in Python 2 (foo.py). Unfortunately, the (compiled) Haskell program is consistently slower than the Python program, and I'm at a loss to explain the discrepancy in performance between the two; see the Benchmark section below. If anything, I would have expected Haskell to have the upper hand...

What am I doing wrong? How can I account for this discrepancy? Is there an easy way of speeding up my Haskell code?

(For information, I'm using a mid-2010 MacBook Pro with 8 GB RAM, GHC 7.8.4, and Python 2.7.9.)

`foo.hs`

    main = print . sum =<< getIntList

    getIntList :: IO [Int]
    getIntList = fmap (map read . words) getLine

(compiled with ghc -O2 foo.hs)

`foo.py`

    ns = map(int, raw_input().split())
    print sum(ns)

Benchmark

In the following, test.txt consists of a single line of 10 million space-separated integers.

    # Haskell
    $ time ./foo < test.txt
    1679257

    real    0m36.704s
    user    0m35.932s
    sys     0m0.632s

    # Python
    $ time python foo.py < test.txt
    1679257

    real    0m7.916s
    user    0m7.756s
    sys     0m0.151s

asked Mar 21 '15 18:03

jub0bs

1 Answer

read is slow. For bulk parsing, use bytestring or text primitives, or attoparsec.

I did some benchmarking. Your original version ran in 23.9 s on my computer. The version below ran in 0.35 s:

    import qualified Data.ByteString.Char8 as B
    import Control.Applicative
    import Data.Maybe
    import Data.List
    import Data.Char

    main = print . sum =<< getIntList

    getIntList :: IO [Int]
    getIntList =
        map (fst . fromJust . B.readInt) . B.words <$> B.readFile "test.txt"

By specializing the parser to your test.txt file, I could get the runtime down to 0.26 s:

    getIntList :: IO [Int]
    getIntList =
        unfoldr (B.readInt . B.dropWhile (==' ')) <$> B.readFile "test.txt"
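
The snippets above read test.txt directly, whereas the question asks for input from stdin. A minimal sketch of the same ByteString approach adapted to stdin might look like the following. The names parseInts and sumInts are made up for illustration, the strict fold (foldl') is an addition that is not part of the timings quoted above, and this variant has not been benchmarked here.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)
import Data.List (foldl', unfoldr)

-- Hypothetical helper: parse every whitespace-separated integer in the input.
-- B.readInt returns Nothing once no integer can be read, which ends the unfold.
parseInts :: B.ByteString -> [Int]
parseInts = unfoldr (B.readInt . B.dropWhile isSpace)

-- Hypothetical helper: sum with a strict left fold, so the running total
-- is evaluated as it goes rather than accumulated lazily.
sumInts :: B.ByteString -> Int
sumInts = foldl' (+) 0 . parseInts

main :: IO ()
main = print . sumInts =<< B.getContents
```

Compiled with ghc -O2 as before, it would be run as ./foo < test.txt. Using isSpace instead of (==' ') also skips the trailing newline, at the cost of being slightly less specialized to the test file.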

answered Oct 14 '22 07:10

András Kovács

Tags: performance, python, io, haskell