Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Runhaskell performance anomaly

I'm trying to understand a performance anomaly being observed when running a program under runhaskell.

The program in question is:

isFactor n = (0 ==) . (mod n)
factors x = filter (isFactor x) [2..x]
main = putStrLn $ show $ sum $ factors 10000000

When I run this, it takes 1.18 seconds.

However, if I redefine isFactor as:

isFactor n f = (0 ==) (mod n f)

then the program takes 17.7 seconds.

That's a huge difference in performance and I would expect the programs to be equivalent. Does anybody know what I'm missing here?

Note: This doesn't happen when compiled under GHC.

like image 681
Steve Avatar asked Feb 17 '12 08:02

Steve


2 Answers

Although the functions should be identical, there's a difference in how they're applied. With the first definition, isFactor is fully applied at the call site isFactor x. In the second definition, it isn't, because now isFactor explicitly takes two arguments.

Even minimal optimizations are enough for GHC to see through this and create identical code for both, however if you compile with -O0 -ddump-simpl you can determine that, with no optimizations, this makes a difference (at least with ghc-7.2.1, YMMV with other versions).

With the first isFactor, GHC creates a single function that is passed as the predicate to "GHC.List.Filter", with the calls to mod 10000000 and (==) inlined. For the second definition, what happens instead is that most of the calls within isFactor are let-bound references to class functions, and are not shared between multiple invocations of isFactor. So there's a lot of dictionary overhead that's completely unnecessary.

This is almost never an issue because even the default compiler settings will optimize it away, however runhaskell apparently doesn't even do that much. Even so, I have occasionally structured code as someFun x y = \z -> because I know that someFun will be partially applied and this was the only way to preserve sharing between calls (i.e. GHC's optimizer wasn't sufficiently clever).

like image 94
John L Avatar answered Nov 08 '22 04:11

John L


As I understand it, runhaskell does little to no optimisation. It's designed to quickly load and run code. If it did more optimisation, it would take longer for your code to start running. Of course, in this case, the code runs faster with optimisations.

As I understand it, if a compiled version of the code exists, then runhaskell will use it. So if performance matters to you, just make sure you compile with optimisations turned on first. (I think you might even be able to pass switches to runhaskell to turn on optimisations - you'd have to check the documentation...)

like image 24
MathematicalOrchid Avatar answered Nov 08 '22 06:11

MathematicalOrchid