Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I find the allocation in a Haskell function compiled with GHC?

I'm using GHC 7.4 to compile the following function:

nodups' :: [Int] -> Bool
nodups' = ok empty
  where ok _ [] = True
        ok seen (n:ns) = not (n `member` seen) && ok (n `insert` seen) ns
        member n word = testBit word n
        insert n word = setBit word n
        empty = 0 :: Int

The function looks for duplicate elements in a list of small integers. The set seen is a representation of a set of small integers as a bit vector. The profiler (run with ghc -prof -auto-all) claims that the ok function accounts for 22% of allocation overall. Looking at the output with -ddump-simpl, I can't understand why this code is allocating. I checked, and as far as I can tell it is not allocating a thunk for the call to insert.

What should I look at to identify the part of my code that is allocating?

like image 873
Norman Ramsey Avatar asked Nov 16 '12 02:11

Norman Ramsey


1 Answers

Generally

I know of simple (scientific) implementations of functional languages, and if I remember correctly there is the G-Machine that may be used with Haskell.

This means (again, if I remember correctly) that your program state is represented like a "Tree", where the nodes are (for the sake of simplicity here) the functions you use in your code. The leafes would be the arguments to it. The "G-Maschine" then looks along the "Spine" (the left-side chain of nodes) and looks in the set of available "Functions" ("Supercombinators"?) for a pattern-match that it can apply. If a mattern-match is recognized from the left side of a definition it is then replaced by the right side of the definition.

This means that even a simple line like

ok seen (n:ns) = not (n `member` seen) && ok (n `insert` seen) ns

or even

(n:ns) = ns

is doing something in computer memory, i.e. matching the pattern

       ...
     ...
    (:)
   /   \
  n     ns

and replacing it with

       ...
     ...
    ns

The final result might consume less memory then the input, but this is a dynamic step and therefore must take place somewhere. If this is repeated over and over again (in a "tight loop") then this will make you CPU busy, as well it will your memory -- just because the G-Machine is operating. (As I said, I am not sure the G-Machine-concept applies here, but I guess it is something similar).

Specific guesses

    member n word = testBit word n
    insert n word = setBit word n

Besides that I habe some suspicions. testBit and setBit look like index operations on lists. If they are it could take some work. If they are proper arrays it would be ok. If they are a sort of maps or sets... well... there might be costly hashing involved? Or implemented via a balanced tree, which uses lots of (costly?) comparision operations?

like image 140
towi Avatar answered Oct 28 '22 11:10

towi