Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I speed up my Haskell program (to the level of Python)

Tags:

haskell

I have the following toy program which cyclic shifts a vector and adds it to itself (under a mod). It does that for different shifts and high number of iterations (compared to the size of the vector). Program works, but its dog slow. I am still learning Haskell, so my question is: am I doing something wrong?

import Data.List (foldl')
import qualified Data.Sequence as Seq
import Data.Sequence (index, zipWith, Seq, (><), (<|), (|>))

seqSize = 100
numShifts = 10000 

cycleShift :: Integer -> Seq a -> Seq a
cycleShift s l = Seq.drop (fromInteger s) l >< Seq.take (fromInteger s) l

modAdd :: Seq Integer -> Seq Integer -> Seq Integer 
modAdd s t = Seq.zipWith (\ a b -> (a + b) `mod` 10^16) s t

step :: Seq Integer -> Integer -> Seq Integer
step l shift = modAdd l (cycleShift shift l)

allshifts = [i `mod` seqSize |i <- [1..numShifts]]
start = Seq.fromList (1 : [0 | i <- [1..(seqSize - 1)]])
end = foldl' step start allshifts

main :: IO ()
main = print (Seq.index end 0)

The same program in Python

seq_size = 100 
num_shifts = 10000

S = [i % seq_size for i in xrange(1, num_shifts + 1)]
ssums = [1] + [0 for i in range(seq_size - 1)]

for s in S: 
    shift = ssums[s:] + ssums[:s]  
    ssums = [(ssums[i] + shift[i]) % 10**16 for i in range(seq_size)]  

print ssums[0]

Here are the timings. Haskell: real 0m5.596s Python: real 0m0.551s

Python is not known for it's speed and yet is x10 times faster ?!?

like image 799
Rado Avatar asked Aug 09 '11 22:08

Rado


2 Answers

How are you running it?

I get 1.6 seconds for the Haskell version. (Compiled with ghc.exe -O2 seq.hs.)

Also, is there a reason you're using Seq? If I change it to use lists, I get 0.3 seconds execution time.

Here it is with lists:

import Data.List (foldl')

seqSize = 100
numShifts = 10000 

cycleShift s l = drop (fromInteger s) l ++ take (fromInteger s) l

modAdd s t = zipWith (\ a b -> (a + b) `mod` 10^16) s t

step l shift = modAdd l (cycleShift shift l)

allshifts = [i `mod` seqSize |i <- [1..numShifts]]
start = (1 : [0 | i <- [1..(seqSize - 1)]])
end = foldl' step start allshifts

main :: IO ()
main = print (end !! 0)
like image 121
porges Avatar answered Sep 28 '22 09:09

porges


  1. Use plain lists. They are heavily optimized. Using Data.Vector is even faster.
  2. Use rem instead of mod
  3. Avoid unnecessary work. (see cycleShift. Before, you splitted the list twice)
  4. Use Int instead of Integer if your calculation may not exceed the bounds. The former is a hardware int, while the later is arbitrary precision, but emulated via software.

Result: 3.6 secs to 0.5 secs. More is probably possible.

Code:

import Data.List (foldl')
import Data.Tuple

seqSize, numShifts :: Int
seqSize = 100

numShifts = 10000 

cycleShift :: Int -> [a] -> [a]
cycleShift s = uncurry (++) . swap . splitAt s

modAdd :: [Int] -> [Int] -> [Int]
modAdd = zipWith (\ a b -> (a + b) `rem` 10^16)

step :: [Int] -> Int -> [Int]
step l shift = modAdd l (cycleShift shift l)

allshifts = map (`rem` seqSize) [1..numShifts]
start = 1 : replicate (seqSize - 1) 0
end = foldl' step start allshifts

main :: IO ()
main = print (head end)

Edit

It gets even faster by using Data.Vector. I get around 0.4 sec on my machine using this code:

import Data.List (foldl')
import Data.Tuple

import Data.Vector (Vector)
import qualified Data.Vector as V

seqSize, numShifts :: Int
seqSize = 100

numShifts = 10000 

cycleShift :: Int -> Vector a -> Vector a
cycleShift s = uncurry (V.++) . swap . V.splitAt s

modAdd :: Vector Int -> Vector Int -> Vector Int
modAdd = V.zipWith (\ a b -> (a + b) `rem` 10^16)

step :: Vector Int -> Int -> Vector Int
step l shift = modAdd l (cycleShift shift l)

allshifts = map (`rem` seqSize) [1..numShifts]
start = 1 `V.cons` V.replicate (seqSize - 1) 0
end = foldl' step start allshifts

main :: IO ()
main = print (V.head end)

Edit 2

Using Data.Vector.Unboxed (Just change the imports and fix up the signatures), the runtime drops down to 0.074 secs. But the results are only correct, if an Int has 64 bit. It may also be that fast using Int64 though.

like image 26
fuz Avatar answered Sep 28 '22 09:09

fuz