I have a simple script written in both Python and Haskell. It reads a file with 1,000,000 newline separated integers, parses that file into a list of integers, quick sorts it and then writes it to a different file sorted. This file has the same format as the unsorted one. Simple.
Here is Haskell:
quicksort :: Ord a => [a] -> [a] quicksort [] = [] quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater) where lesser = filter (< p) xs greater = filter (>= p) xs main = do file <- readFile "data" let un = lines file let f = map (\x -> read x::Int ) un let done = quicksort f writeFile "sorted" (unlines (map show done))
And here is Python:
def qs(ar): if len(ar) == 0: return ar p = ar[0] return qs([i for i in ar if i < p]) + [p] + qs([i for i in ar if i > p]) def read_file(fn): f = open(fn) data = f.read() f.close() return data def write_file(fn, data): f = open('sorted', 'w') f.write(data) f.close() def main(): data = read_file('data') lines = data.split('\n') lines = [int(l) for l in lines] done = qs(lines) done = [str(l) for l in done] write_file('sorted', "\n".join(done)) if __name__ == '__main__': main()
Very simple. Now I compile the Haskell code with
$ ghc -O2 --make quick.hs
And I time those two with:
$ time ./quick $ time python qs.py
Results:
Haskell:
real 0m10.820s user 0m10.656s sys 0m0.154s
Python:
real 0m9.888s user 0m9.669s sys 0m0.203s
How can Python possibly be faster than native code Haskell?
Thanks
EDIT:
List generated by
from random import shuffle a = [str(a) for a in xrange(0, 1000*1000)] shuffle(a) s = "\n".join(a) f = open('data', 'w') f.write(s) f.close()
So all numbers are unique.
The Original Haskell Code
There are two issues with the Haskell version:
This program takes 18.7 seconds to run on my Intel Core2 2.5 GHz laptop. (GHC 7.4 using -O2)
Daniel's ByteString Version
This is much improved, but notice it still uses the inefficient built-in merge sort.
His version takes 8.1 seconds (and doesn't handle negative numbers, but that's more of a non-issue for this exploration).
Note
From here on this answer uses the following packages: Vector
, attoparsec
, text
and vector-algorithms
. Also notice that kindall's version using timsort takes 2.8 seconds on my machine (edit: and 2 seconds using pypy).
A Text Version
I ripped off Daniel's version, translated it to Text (so it handles various encodings) and added better sorting using a mutable Vector
in an ST monad:
import Data.Attoparsec.Text.Lazy import qualified Data.Text.Lazy as T import qualified Data.Text.Lazy.IO as TIO import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Algorithms.Intro as I import Control.Applicative import Control.Monad.ST import System.Environment (getArgs) parser = many (decimal <* char '\n') main = do numbers <- TIO.readFile =<< fmap head getArgs case parse parser numbers of Done t r | T.null t -> writeFile "sorted" . unlines . map show . vsort $ r x -> error $ Prelude.take 40 (show x) vsort :: [Int] -> [Int] vsort l = runST $ do let v = V.fromList l m <- V.unsafeThaw v I.sort m v' <- V.unsafeFreeze m return (V.toList v')
This runs in 4 seconds (and also doesn't handle negatives)
Return to the Bytestring
So now we know we can make a more general program that's faster, what about making the ASCii -only version fast? No problem!
import qualified Data.ByteString.Lazy.Char8 as BS import Data.Attoparsec.ByteString.Lazy (parse, Result(..)) import Data.Attoparsec.ByteString.Char8 (decimal, char) import Control.Applicative ((<*), many) import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Algorithms.Intro as I import Control.Monad.ST parser = many (decimal <* char '\n') main = do numbers <- BS.readFile "rands" case parse parser numbers of Done t r | BS.null t -> writeFile "sorted" . unlines . map show . vsort $ r vsort :: [Int] -> [Int] vsort l = runST $ do let v = V.fromList l m <- V.unsafeThaw v I.sort m v' <- V.unsafeFreeze m return (V.toList v')
This runs in 2.3 seconds.
Producing a Test File
Just in case anyone's curious, my test file was produced by:
import Control.Monad.CryptoRandom import Crypto.Random main = do g <- newGenIO :: IO SystemRandom let rs = Prelude.take (2^20) (map abs (crandoms g) :: [Int]) writeFile "rands" (unlines $ map show rs)
If you're wondering why vsort
isn't packaged in some easier form on Hackage... so am I.
In short, don't use read
. Replace read
with a function like this:
import Numeric fastRead :: String -> Int fastRead s = case readDec s of [(n, "")] -> n
I get a pretty fair speedup:
~/programming% time ./test.slow ./test.slow 9.82s user 0.06s system 99% cpu 9.901 total ~/programming% time ./test.fast ./test.fast 6.99s user 0.05s system 99% cpu 7.064 total ~/programming% time ./test.bytestring ./test.bytestring 4.94s user 0.06s system 99% cpu 5.026 total
Just for fun, the above results include a version that uses ByteString
(and hence fails the "ready for the 21st century" test by totally ignoring the problem of file encodings) for ULTIMATE BARE-METAL SPEED. It also has a few other differences; for example, it ships out to the standard library's sort function. The full code is below.
import qualified Data.ByteString as BS import Data.Attoparsec.ByteString.Char8 import Control.Applicative import Data.List parser = many (decimal <* char '\n') reallyParse p bs = case parse p bs of Partial f -> f BS.empty v -> v main = do numbers <- BS.readFile "data" case reallyParse parser numbers of Done t r | BS.null t -> writeFile "sorted" . unlines . map show . sort $ r
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With