Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is haskell performing worse than java

I wanted to fool a bit around with random numbers, if the random generator in haskell is distributed uniformly or not, thus I wrote the following program after a few tries (with lists being generated leading to stack overflow).

module Main where

import System.Environment (getArgs)
import Control.Applicative ((<$>))
import System.Random (randomRIO)

main :: IO ()
main = do nn <- map read <$> getArgs :: IO [Int]
          let n = if null nn then 10000 else head nn
          m <- loop n 0 (randomRIO (0,1))
          putStrLn $ "True: "++show (m//n::Double) ++", False "++show ((n-m)//n :: Double)
          return ()

loop :: Int -> Int -> IO Double -> IO Int
loop n acc x | n<0       = return acc
             | otherwise = do x' <- round <$> x
                              let x'' = (x' + acc) in x'' `seq` loop (n-1) x'' x

(//) :: (Integral a, Fractional b) => a -> a -> b
x // y = fromIntegral x / fromIntegral y

as I got it to work ok-ish I decided to write another version - in java (which I am not very good in), and expected haskell to beat it, but the java program ran about half the time compared to the haskell version

import java.util.Random;

public class MonteCarlo {
    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]);
        Random r = new Random();
        int acc = 0;
        for (int i=0; i<=n; i++) {
            acc += Math.round(r.nextDouble());
        }
        System.out.println("True: "+(double) acc/n+", False: "+(double)(n-acc)/n);
    }
}

I tried to look at the profile of the haskell version - which told me that most of the work was done in the loop - no wonder! I tried to look at the core, but I really don't know enough to understand that. I figure that the java version, may be using more than one core - as the system used more than 100%, when I timed it.

I guess one could improve the code using unboxed Doubles/Ints, but again my knowledge of hakell is not up to that.

like image 685
epsilonhalbe Avatar asked Jan 16 '14 00:01

epsilonhalbe


2 Answers

This is way too big to leave as a comment, but it's not really an answer, just some data. I took the main loop and tested it with criterion. Then a few variations.

import Control.Applicative
import System.Random

import Criterion.Main

import Data.List


iters :: Int
iters = 10000

loop1 :: IO Int
loop1 = go iters 0
  where
    go n acc | n < 0 = return acc
             | otherwise = do
                 x <- randomRIO (0, 1 :: Double)
                 let acc' = acc + round x
                 acc' `seq` go (n - 1) acc'

loop1' :: IO Int
loop1' = go iters 0 
  where
    go n acc | n < 0 = return acc
             | otherwise = do
                 x <- randomRIO (0, 1)
                 let acc' = acc + x
                 acc' `seq` go (n - 1) acc'

loop2 :: IO Int
loop2 = do
    g <- newStdGen
    let s = foldl' (+) 0 . take iters . map round $ randomRs (0, 1 :: Double) g
    s `seq` return s

loop2' :: IO Int
loop2' = do
    g <- newStdGen
    let s = foldl' (+) 0 . take iters $ randomRs (0, 1) g
    s `seq` return s

loop3 :: IO Int
loop3 = do
    g0 <- newStdGen
    let go n acc g | n < 0 = acc
                   | otherwise = let (x, g') = randomR (0, 1 :: Double) g
                                     acc' = acc + round x
                                 in acc' `seq` go (n - 1) acc g'
    return $! go iters 0 g0

loop3':: IO Int
loop3'= do
    g0 <- newStdGen
    let go n acc g | n < 0 = acc
                   | otherwise = let (x, g') = randomR (0, 1) g
                                     acc' = acc + x
                                 in acc' `seq` go (n - 1) acc g'
    return $! go iters 0 g0

main :: IO ()
main = defaultMain $
       [ bench "loop1" $ whnfIO loop1
       , bench "loop2" $ whnfIO loop2
       , bench "loop3" $ whnfIO loop3
       , bench "loop1'" $ whnfIO loop1'
       , bench "loop2'" $ whnfIO loop2'
       , bench "loop3'" $ whnfIO loop3'
       ]

And here are the timings: Except, well. I'm running on a virtualbox vm, and performance is kinda random. The overall trends were consistent, though.

carl@debian:~/hask$ ghc -Wall -O2 randspeed.hs
[1 of 1] Compiling Main             ( randspeed.hs, randspeed.o )
Linking randspeed ...
carl@debian:~/hask$ ./randspeed 
warming up
estimating clock resolution...
mean is 3.759461 us (160001 iterations)
found 3798 outliers among 159999 samples (2.4%)
  1210 (0.8%) low severe
  2492 (1.6%) high severe
estimating cost of a clock call...
mean is 2.152186 us (14 iterations)
found 3 outliers among 14 samples (21.4%)
  1 (7.1%) low mild
  1 (7.1%) high mild
  1 (7.1%) high severe

benchmarking loop1
mean: 15.88793 ms, lb 15.41649 ms, ub 16.37845 ms, ci 0.950
std dev: 2.472512 ms, lb 2.332036 ms, ub 2.650680 ms, ci 0.950
variance introduced by outliers: 90.466%
variance is severely inflated by outliers

benchmarking loop2
mean: 26.44217 ms, lb 26.28822 ms, ub 26.64457 ms, ci 0.950
std dev: 905.7558 us, lb 713.3236 us, ub 1.165090 ms, ci 0.950
found 8 outliers among 100 samples (8.0%)
  6 (6.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 30.636%
variance is moderately inflated by outliers

benchmarking loop3
mean: 18.43004 ms, lb 18.29330 ms, ub 18.60769 ms, ci 0.950
std dev: 794.3779 us, lb 628.6630 us, ub 1.043238 ms, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 40.516%
variance is moderately inflated by outliers

benchmarking loop1'
mean: 4.579197 ms, lb 4.494131 ms, ub 4.677335 ms, ci 0.950
std dev: 468.0648 us, lb 406.8328 us, ub 558.5602 us, ci 0.950
found 2 outliers among 100 samples (2.0%)
  2 (2.0%) high mild
variance introduced by outliers: 80.019%
variance is severely inflated by outliers

benchmarking loop2'
mean: 4.473382 ms, lb 4.386545 ms, ub 4.567254 ms, ci 0.950
std dev: 460.5377 us, lb 410.1520 us, ub 543.1835 us, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 80.033%
variance is severely inflated by outliers

benchmarking loop3'
mean: 3.577855 ms, lb 3.490043 ms, ub 3.697916 ms, ci 0.950
std dev: 522.4125 us, lb 416.7015 us, ub 755.3713 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 89.406%
variance is severely inflated by outliers

Here are my conclusions:

  • Never benchmark anything in virtualbox if you want reliable timing.
  • randomRs is slow, due to list creation overhead. Too bad lists don't fuse with foldl', that'd make this faster and simpler.
  • There's very little overhead associated with doing the recursion in IO. It shows up the version that uses Int instead of double, but that's it.
  • The Random instance for Double is super-slow. Or else round is super-slow. Or maybe the combination of the two is.
like image 40
Carl Avatar answered Sep 19 '22 15:09

Carl


I have tried a crude version of your code relying on laziness:

module Main where

import System.Environment
import Control.Applicative
import System.Random

main :: IO ()
main = do
  args <- getArgs
  let n = if null args then 10000 else read $ head args
  g <- getStdGen
  let vals = randomRs (0, 1) g :: [Int]
  let s = sum $ take n vals
  putStrLn $ "True: " ++ f s n ++ ", False" ++ f (n - s) n

f x y = show $ ((fromIntegral x / fromIntegral y) :: Double)

For now, ignore the fact that I've missed some type declarations and I have imported everything from the modules. I just wanted to be free to test.

Back at the castle, your version was saved as original.hs while the above was saved as 1.hs. Testing time:

[mihai@esgaroth so]$ ghc --make -O2 original.hs
[1 of 1] Compiling Main             ( original.hs, original.o )
Linking original ...
[mihai@esgaroth so]$ ghc --make -O2 1.hs 
[1 of 1] Compiling Main             ( 1.hs, 1.o )
Linking 1 ...
[mihai@esgaroth so]$ time ./original 
True: 0.4981, False 0.5019

real    0m0.022s
user    0m0.021s
sys     0m0.000s
[mihai@esgaroth so]$ time ./1 
True: 0.4934, False0.5066

real    0m0.005s
user    0m0.003s
sys     0m0.001s
[mihai@esgaroth so]$ time ./original 
True: 0.5063, False 0.4937

real    0m0.018s
user    0m0.017s
sys     0m0.001s
[mihai@esgaroth so]$ time ./1 
True: 0.5024, False0.4976

real    0m0.005s
user    0m0.003s
sys     0m0.002s

Everytime, the new code was 4 time faster. And that is while still having the first version of using lazy constructs and already existing code.

Next step is to test the performance heaps and to see if it is worth to embed the sum computation when generating the random list.

PS: On my machine:

[mihai@esgaroth so]$ time java MonteCarlo 10000
True: 0.5011, False: 0.4989

real    0m0.063s
user    0m0.066s
sys     0m0.010s

PPS: Running the code compiled without -O2:

[mihai@esgaroth so]$ time ./original 
True: 0.5035, False 0.4965

real    0m0.032s
user    0m0.031s
sys     0m0.001s
[mihai@esgaroth so]$ time ./1 
True: 0.4975, False0.5025

real    0m0.014s
user    0m0.010s
sys     0m0.003s

Only a 2 time reduction but still faster than java.

like image 64
Mihai Maruseac Avatar answered Sep 18 '22 15:09

Mihai Maruseac