I was trying to solve ITA Software's "Word Nubmers" puzzle using a brute force approach. It looks like my Haskell version is more than 10 times slower than a C#/C++ version.
Thanks to Bryan O'Sullivan's answer, I was able to "correct" my program to acceptable performance. You can read his code which is much cleaner than mine. I am going to outline the key points here.
Int
is Int64
on Linux GHC x64. Unless you unsafeCoerce
, you should just use Int
. This saves you from having to fromIntegral
. Doing Int64
on Windows 32-bit GHC is just darn slow, avoid it. (This is in fact not GHC's fault. As mentioned in my blog post below, 64 bit integers in 32-bit programs is slow in general (at least in Windows))-fllvm
or -fvia-C
for performance.quotRem
to divMod
, quotRem
already suffices. That gave me 20% speed up.Data.Vector
to Data.Array
as an "array"The above points were enough to give me about 100% boost over my original version.
In my blog post, I have detailed a step-by-step illustrated example of how I turned the original program to match Bryan's program. There are other points mentioned there as well.
(This may sound like a "could you do the work for me" post, but I argue that such a concrete example would be very instructive since profiling Haskell performance is often seen as a myth)
(As noted in the comments, I think I have misinterpreted the problem. But who cares, we can focus on performance in a different problem)
Here's a my version of a quick recap of the problem:
A wordNumber is defined as
wordNumber 1 = "one"
wordNumber 2 = "onetwo"
wordNumber 3 = "onethree"
wordNumber 15 = "onetwothreefourfivesixseveneightnineteneleventwelvethirteenfourteenfifteen"
...
Problem: Find the 51-billion-th letter of (wordNumber Infinity); assume that letter is found at 'wordNumber x', also find 'sum [1..x]'
From an imperative perspective, a naive algorithm would be to have 2 counters, one for sum of numbers and one for sum of lengths. Keep counting the length of each wordNumber and "break" to return the result.
The imperative brute-force approach is implemented in C# here: http://ideone.com/JjCb3. It takes about 1.5 minutes to find the answer on my computer. There is also an C++ implementation that runs in 45 seconds on my computer.
Then I implemented a brute-force Haskell version: http://ideone.com/ngfFq. It cannot finish the calculation in 5 minutes on my machine. (Irony: it's has more lines than the C# version)
Here is the -p
profile of the Haskell program: http://hpaste.org/49934
(Note: I am fully aware that brute-forcing it is not the correct solution to this problem. I am mainly interested in making the Haskell version perform comparatively to the C# version. Right now it is at least 5x slower so obviously I am missing something obvious)
(Note 2: It does not seem to be space leaking. The program runs with constant memory (about 2MB) on my computer)
(Note 3: I am compiling with `ghc -O2 WordNumber.hs)
To make the question more reader friendly, I include the "gist" of the two versions.
// C#
long sumNum = 0;
long sumLen = 0;
long target = 51000000000;
long i = 1;
for (; i < 999999999; i++)
{
// WordiLength(1) = 3 "one"
// WordiLength(101) = 13 "onehundredone"
long newLength = sumLen + WordiLength(i);
if (newLength >= target)
break;
sumNum += i;
sumLen = newLength;
}
Console.WriteLine(Wordify(i)[Convert.ToInt32(target - sumLen - 1)]);
-
-- Haskell
-- This has become totally ugly during my squeeze for
-- performance
-- Tail recursive
-- n-th number (51000000000 in our problem) -> accumulated result -> list of 'zipped' left to try
-- accumulated has the format (sum of numbers, current lengths of the whole chain, the current number)
solve :: Int64 -> (Int64, Int64, Int64) -> [(Int64, Int64)] -> (Int64, Int64, Int64)
solve !n !acc@(!sumNum, !sumLen, !curr) ((!num, !len):xs)
| sumLen' >= n = (sumNum', sumLen, num)
| otherwise = solve n (sumNum', sumLen', num) xs
where
sumNum' = sumNum + num
sumLen' = sumLen + len
-- wordLength 1 = 3 "one"
-- wordLength 101 = 13 "onehundredone"
wordLength :: Int64 -> Int64
-- wordLength = ...
solution :: Int64 -> (Int64, Char)
solution !x =
let (sumNum, sumLen, n) = solve x (0,0,1) (map (\n -> (n, wordLength n)) [1..])
in (sumNum, (wordify n) !! (fromIntegral $ x - sumLen - 1))
I've written a gist that contains both a C++ version (a copy of yours from a Haskell-cafe message, with a bug fixed) and a Haskell translation.
Notice that the two are structurally almost identical. When compiled with -fllvm
, the Haskell code runs at about half the speed of the C++ code, which is pretty good.
Now let's compare my Haskell wordLength
code to yours. You're passing around an extra unnecessary parameter, which is unnecessary (you apparently figured that out when writing the C++ code that I translated). Also, the large number of bang patterns suggests panic; they're almost all useless.
Your solve
function is also very confused.
You're passing parameters in three different ways: a regular Int
, a 3-tuple, and a list! Whoa.
This function is necessarily not very regular in its behaviour, so while you gain nothing stylistically by using a list to supply your counter, you probably force GHC to allocate memory. In other words, this both obfuscates the code and makes it slower.
By using a tuple for three parameters (for no obvious reason), you're again working hard to force GHC to allocate memory for every step through the loop, when it could avoid doing so if you passed the parameters directly.
Only your n
parameter is dealt with in a sensible way, but you don't need a bang pattern on it.
The only parameter that needs a bang pattern is sumNum
, because you never inspect its value until after the loop has finished. GHC's strictness analyser will deal with the others. All of your other bang patterns are unnecessary at best, misdirections at worst.
Here are two pointers I could come up with in a quick investigation:
Note that using Int64
is really slow when you are using a 32 bit build of GHC, as is the default for Haskell Platform, currently. This also turned out to be the main villain in a previous performance problem (there I give a few more details).
For reasons I don't quite understand the divMod
function does not seem to get inlined. As a result, the numbers are returned on the heap. When using div
and mod
separately, wordLength'
executes purely on the stack as it should be.
Sadly I currently have no 64-bit GHC around to test whether this is enough to solve the problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With