I've been learning some Haskell by implementing a feature selection algorithm.
I've gotten the performance from 20s on a benchmark dataset down to 5s, where the C program handles the same dataset in 0.5s. The dataset can be found here. To run, call the compiled binary like so: ./Mrmr 10 test_nci9_s3.csv
The code is here, and I'm interested in optimizing mutualInfoInnerLoop:
mutualInfoInnerLoop :: Double -> Data.Vector.Unboxed.Vector (Int, Int) -> Double -> (Int, Int, Double) -> Double
mutualInfoInnerLoop n xys !acc (!i, !j, !px_py)
    | n == 0 || px_py == 0 || pxy == 0 = acc
    | otherwise = pxy * logBase 2 ( pxy / px_py ) + acc
    where
        pxy = ( fromIntegral . U.foldl' accumEq2 0 $ xys ) / n
        accumEq2 :: Int -> (Int, Int) -> Int
        accumEq2 !acc (!i', !j')
            | i' == i && j' == j = acc + 1
            | otherwise = acc
The profiler says:
COST CENTRE          MODULE      %time  %alloc
mutualInfoInnerLoop  Main         75.0    47.9
mutualInfo           Main         14.7    32.1
parseCsv             Main          5.9    13.1
CAF                  GHC.Float     1.5     0.0
readInt              Main          1.5     1.2
doMrmr               Main          1.5     4.0
This shows mutualInfoInnerLoop making nearly 50% of the allocations and accounting for 75% of the program's runtime. The allocations are disconcerting.
Also, the Core for that function has a signature:
mutualInfoInnerLoop_rXG
:: GHC.Types.Double
-> Data.Vector.Unboxed.Base.Vector (GHC.Types.Int, GHC.Types.Int)
-> GHC.Types.Double
-> (GHC.Types.Int, GHC.Types.Int, GHC.Types.Double)
-> GHC.Types.Double
[GblId,
Arity=4,
Caf=NoCafRefs,
Str=DmdType U(L)LU(L)U(U(L)U(L)U(L))m]
which shows most of the parameters being lazily evaluated and boxed (as opposed to strict and unboxed).
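For context, one shape that GHC's demand analysis handles well is a local tail-recursive worker whose accumulator is its own strict argument, rather than a component of a boxed tuple threaded through a fold. The following is only a sketch on plain lists (countEq is a made-up name, not the author's vector code):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sketch: a local worker with a separate, bang-annotated accumulator.
-- GHC's worker/wrapper transformation can typically unbox 'acc' here,
-- whereas a tuple-shaped accumulator tends to stay boxed.
countEq :: Int -> Int -> [(Int, Int)] -> Int
countEq i j = go 0
  where
    go !acc [] = acc
    go !acc ((i', j') : rest)
      | i' == i && j' == j = go (acc + 1) rest
      | otherwise          = go acc rest
```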
I've tried BangPatterns, I've tried MagicHash, and I can't seem to make it go faster.
Anyone have any suggestions?
A function is considered strict in one of its arguments if, when the function is applied to a bottom value for that argument, the result is bottom. For example, + for Int is strict in both of its arguments, since undefined + x is bottom and x + undefined is bottom.
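This definition can be checked directly in GHC by catching the exception that forcing bottom raises. The isBottom helper below is a made-up name for this sketch:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical helper: True if forcing the value to WHNF throws,
-- i.e. the value is bottom.
isBottom :: Int -> IO Bool
isBottom x = do
  r <- try (evaluate x) :: IO (Either SomeException Int)
  return (either (const True) (const False) r)

-- (+) is strict: (undefined + 1) is bottom, so isBottom reports True.
-- const, by contrast, is lazy (non-strict) in its second argument,
-- so (const 1 undefined) evaluates to 1 without touching undefined.
```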
Haskell is a non-strict language, and most implementations use a strategy called laziness to run your program.
Haskell computations produce a lot of memory garbage - much more than conventional imperative languages. This is because data is immutable, so the only way to store each operation's result is to create a new value. In particular, every iteration of a recursive computation creates new values.
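A minimal illustration of that immutability, using containers (m0 and m1 are made-up names): an insert never modifies the original structure, it produces a new one, and the superseded parts become garbage once nothing references them.

```haskell
import qualified Data.Map.Strict as M

-- Immutable update: M.insert returns a brand-new map value;
-- the original m0 is untouched.
m0, m1 :: M.Map String Int
m0 = M.fromList [("a", 1)]
m1 = M.insert "b" 2 m0
```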
The Haskell runtime system employs a generational garbage collector (GC) with two generations by default. Generations are numbered starting with the youngest generation at zero.
I'm far from an expert at this, but I do see one small improvement. In your source I see this:
mutualInfo n ... = foldl' (mutualInfoInnerLoop n $ U.zip xs ys) ...
You don't need to check n == 0 every single time the function is invoked, since you never change the n argument when you invoke it. The xys argument doesn't change either. (Note, though, that pxy still depends on i and j from the tuple argument, so it has to be recomputed for each element.) Let's take advantage of this to make sure a closure is created which evaluates the n == 0 test only once.
mutualInfoInnerLoop n xys
    | n == 0 = const
    | otherwise = go
    where
        go !acc (!i, !j, !px_py)
            | px_py == 0 || pxy == 0 = acc
            | otherwise = pxy * logBase 2 ( pxy / px_py ) + acc
            where
                pxy = ( fromIntegral . U.foldl' accumEq2 0 $ xys ) / n
                accumEq2 :: Int -> (Int, Int) -> Int
                accumEq2 !acc' (!i', !j')
                    | i' == i && j' == j = acc' + 1
                    | otherwise = acc'
I'm not sure if GHC is smart enough to perform this optimization on its own, nor am I sure this saves much time/space, but it's the best I've got. With those bang patterns sprinkled all over, I wonder if this is a case of too much strictness.
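The hoisting idea itself is easy to check in isolation. Here is a sketch of the same pattern on plain lists (scaledAdd and total are made-up names, not the author's code): partially applying the function evaluates the guard once, and the returned closure is then reused for every element of the fold.

```haskell
import Data.List (foldl')

-- The n == 0 test runs once, when 'scaledAdd n' is forced to a
-- function; the fold then reuses the resulting closure.
scaledAdd :: Double -> (Double -> Double -> Double)
scaledAdd n
  | n == 0    = \acc _ -> acc
  | otherwise = go
  where
    go acc x = acc + x / n

total :: Double
total = foldl' (scaledAdd 2) 0 [1, 2, 3]  -- (1 + 2 + 3) / 2
```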