Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python faster than compiled Haskell?

I have a simple script written in both Python and Haskell. It reads a file with 1,000,000 newline separated integers, parses that file into a list of integers, quick sorts it and then writes it to a different file sorted. This file has the same format as the unsorted one. Simple.

Here is Haskell:

quicksort :: Ord a => [a] -> [a] quicksort []     = [] quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)     where         lesser  = filter (< p) xs         greater = filter (>= p) xs  main = do     file <- readFile "data"     let un = lines file     let f = map (\x -> read x::Int ) un     let done = quicksort f     writeFile "sorted" (unlines (map show done)) 

And here is Python:

def qs(ar):     if len(ar) == 0:         return ar      p = ar[0]     return qs([i for i in ar if i < p]) + [p] + qs([i for i in ar if i > p])   def read_file(fn):     f = open(fn)     data = f.read()     f.close()     return data  def write_file(fn, data):     f = open('sorted', 'w')     f.write(data)     f.close()   def main():     data = read_file('data')      lines = data.split('\n')     lines = [int(l) for l in lines]      done = qs(lines)     done = [str(l) for l in done]      write_file('sorted', "\n".join(done))  if __name__ == '__main__':     main() 

Very simple. Now I compile the Haskell code with

$ ghc -O2 --make quick.hs 

And I time those two with:

$ time ./quick $ time python qs.py 

Results:

Haskell:

real    0m10.820s user    0m10.656s sys 0m0.154s 

Python:

real    0m9.888s user    0m9.669s sys 0m0.203s 

How can Python possibly be faster than native code Haskell?

Thanks

EDIT:

  • Python version: 2.7.1
  • GHC version: 7.0.4
  • Mac OSX, 10.7.3
  • 2.4GHz Intel Core i5

List generated by

from random import shuffle a = [str(a) for a in xrange(0, 1000*1000)] shuffle(a) s = "\n".join(a) f = open('data', 'w') f.write(s) f.close() 

So all numbers are unique.

like image 986
Honza Pokorny Avatar asked Apr 27 '12 20:04

Honza Pokorny


2 Answers

The Original Haskell Code

There are two issues with the Haskell version:

  • You're using string IO, which builds linked lists of characters
  • You're using a non-quicksort that looks like quicksort.

This program takes 18.7 seconds to run on my Intel Core2 2.5 GHz laptop. (GHC 7.4 using -O2)

Daniel's ByteString Version

This is much improved, but notice it still uses the inefficient built-in merge sort.

His version takes 8.1 seconds (and doesn't handle negative numbers, but that's more of a non-issue for this exploration).

Note

From here on this answer uses the following packages: Vector, attoparsec, text and vector-algorithms. Also notice that kindall's version using timsort takes 2.8 seconds on my machine (edit: and 2 seconds using pypy).

A Text Version

I ripped off Daniel's version, translated it to Text (so it handles various encodings) and added better sorting using a mutable Vector in an ST monad:

import Data.Attoparsec.Text.Lazy import qualified Data.Text.Lazy as T import qualified Data.Text.Lazy.IO as TIO import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Algorithms.Intro as I import Control.Applicative import Control.Monad.ST import System.Environment (getArgs)  parser = many (decimal <* char '\n')  main = do     numbers <- TIO.readFile =<< fmap head getArgs     case parse parser numbers of         Done t r | T.null t -> writeFile "sorted" . unlines                                                   . map show . vsort $ r         x -> error $ Prelude.take 40 (show x)  vsort :: [Int] -> [Int] vsort l = runST $ do         let v = V.fromList l         m <- V.unsafeThaw v         I.sort m         v' <- V.unsafeFreeze m         return (V.toList v') 

This runs in 4 seconds (and also doesn't handle negatives)

Return to the Bytestring

So now we know we can make a more general program that's faster, what about making the ASCii -only version fast? No problem!

import qualified Data.ByteString.Lazy.Char8 as BS import Data.Attoparsec.ByteString.Lazy (parse,  Result(..)) import Data.Attoparsec.ByteString.Char8 (decimal, char) import Control.Applicative ((<*), many) import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Algorithms.Intro as I import Control.Monad.ST   parser = many (decimal <* char '\n')  main = do     numbers <- BS.readFile "rands"     case parse parser numbers of         Done t r | BS.null t -> writeFile "sorted" . unlines                                                    . map show . vsort $ r  vsort :: [Int] -> [Int] vsort l = runST $ do         let v = V.fromList l         m <- V.unsafeThaw v         I.sort m         v' <- V.unsafeFreeze m         return (V.toList v') 

This runs in 2.3 seconds.

Producing a Test File

Just in case anyone's curious, my test file was produced by:

import Control.Monad.CryptoRandom import Crypto.Random main = do   g <- newGenIO :: IO SystemRandom   let rs = Prelude.take (2^20) (map abs (crandoms g) :: [Int])   writeFile "rands" (unlines $ map show rs) 

If you're wondering why vsort isn't packaged in some easier form on Hackage... so am I.

like image 115
Thomas M. DuBuisson Avatar answered Sep 28 '22 16:09

Thomas M. DuBuisson


In short, don't use read. Replace read with a function like this:

import Numeric  fastRead :: String -> Int fastRead s = case readDec s of [(n, "")] -> n 

I get a pretty fair speedup:

~/programming% time ./test.slow ./test.slow  9.82s user 0.06s system 99% cpu 9.901 total ~/programming% time ./test.fast ./test.fast  6.99s user 0.05s system 99% cpu 7.064 total ~/programming% time ./test.bytestring ./test.bytestring  4.94s user 0.06s system 99% cpu 5.026 total 

Just for fun, the above results include a version that uses ByteString (and hence fails the "ready for the 21st century" test by totally ignoring the problem of file encodings) for ULTIMATE BARE-METAL SPEED. It also has a few other differences; for example, it ships out to the standard library's sort function. The full code is below.

import qualified Data.ByteString as BS import Data.Attoparsec.ByteString.Char8 import Control.Applicative import Data.List  parser = many (decimal <* char '\n')  reallyParse p bs = case parse p bs of     Partial f -> f BS.empty     v -> v  main = do     numbers <- BS.readFile "data"     case reallyParse parser numbers of         Done t r | BS.null t -> writeFile "sorted" . unlines . map show . sort $ r 
like image 24
Daniel Wagner Avatar answered Sep 28 '22 16:09

Daniel Wagner