Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strictness optimization and memory allocation in Haskell

I've been learning some Haskell by implementing a feature selection algorithm.

I've gotten the performance from 20s on a benchmark dataset down to 5s, where the C program handles the same dataset in 0.5s. The dataset can be found here. To run, call the compiled binary like so: ./Mrmr 10 test_nci9_s3.csv.

The code is here, and I'm interested in optimizing mutualInfoInnerLoop:

mutualInfoInnerLoop :: Double -> Data.Vector.Unboxed.Vector (Int, Int) -> Double -> (Int, Int, Double) -> Double
mutualInfoInnerLoop n xys !acc (!i, !j, !px_py)
    | n == 0 || px_py == 0 || pxy == 0 = acc
    | otherwise                        = pxy * logBase 2 ( pxy / px_py ) + acc
    where
        pxy = ( fromIntegral . U.foldl' accumEq2 0 $ xys ) / n
        accumEq2 :: Int -> (Int, Int) -> Int
        accumEq2 !acc (!i', !j')
            | i' == i && j' == j = acc + 1
            | otherwise          = acc

The profiler says:

COST CENTRE                    MODULE               %time %alloc

mutualInfoInnerLoop            Main                  75.0   47.9
mutualInfo                     Main                  14.7   32.1
parseCsv                       Main                   5.9   13.1
CAF                            GHC.Float              1.5    0.0
readInt                        Main                   1.5    1.2
doMrmr                         Main                   1.5    4.0

Which shows mutualInfoInnerLoop as making 50% of the allocations, with 75% of the runtime in the program. The allocations are disconcerting.

Also, the Core for that function has a signature:

mutualInfoInnerLoop_rXG
  :: GHC.Types.Double
     -> Data.Vector.Unboxed.Base.Vector (GHC.Types.Int, GHC.Types.Int)
     -> GHC.Types.Double
     -> (GHC.Types.Int, GHC.Types.Int, GHC.Types.Double)
     -> GHC.Types.Double
[GblId,
 Arity=4,
 Caf=NoCafRefs,
 Str=DmdType U(L)LU(L)U(U(L)U(L)U(L))m]

Showing most of the parameters as being Lazily evaluated and boxed (as opposed to strict and unboxed).

I've tried BangPatterns, I've tried MagicHash, and I can't seem to make it go faster.

Anyone have any suggestions?

like image 607
LanceH Avatar asked Dec 11 '11 17:12

LanceH


People also ask

What is strictness in Haskell?

Function argumentsA function is considered strict in one of its arguments if, when the function is applied to a bottom value for that argument, the result is bottom. As we saw way above, + for Int is strict in both of its arguments, since: undefined + x is bottom, and x + undefined is bottom.

Are Haskell functions strict?

Haskell is a non-strict language, and most implementations use a strategy called laziness to run your program.

How does Haskell manage memory?

Haskell computations produce a lot of memory garbage - much more than conventional imperative languages. It's because data are immutable so the only way to store every next operation's result is to create new values. In particular, every iteration of a recursive computation creates a new value.

Does Haskell use garbage collection?

The Haskell runtime system employs a generational garbage collector (GC) with two generations 2. Generations are numbered starting with the youngest generation at zero.


1 Answers

I'm by far no expert at this, but I do see one small improvement. In your source I see this:

mutualInfo n ... = foldl' (mutualInfoInnerLoop n $ U.zip xs ys) ...

You don't need to check n == 0 every single time the function is invoked, since you never change the n argument when you invoke it. The xys argument doesn't change, either, meaning that pxy does not change across invocations, since it depends solely on xys and n. Let's take advantage of these things to make sure a closure is created which evaluates these things only once.

mutualInfoInnerLoop n xys
  | n == 0 || pxy == 0 = const
  | otherwise          = go
  where pxy = (fromIntegral . U.foldl' accumEq2 0 $ xys) / n
        accumEq2 :: Int -> (Int, Int) -> Int
        accumEq2 !acc (!i', !j')
              | i' == i && j' == j = acc + 1
              | otherwise          = acc
        go !acc (!i, !j, !px_py)
          | px_py == 0 = acc
          | otherwise  = pxy * logBase 2 ( pxy / px_py ) + acc

I'm not sure if GHC is smart enough to perform this optimization on its own, nor am I sure this saves much time/space, but it's the best I've got. With those bang patterns sprinkled all over, I wonder if this is a case of too much strictness.

like image 133
Dan Burton Avatar answered Sep 20 '22 19:09

Dan Burton