After a few hours of debugging, I realized that a very simple toy example was not efficient due to a missing ! in the expression return $ 1 + x (thanks duplode!... but how come GHC does not optimize that?). I only realized it because I was comparing it against Python code that was quicker, but I won't always write Python code to benchmark my code...
So here is my question: is there a way to automatically detect these "lazy memory leaks" that slow down a program for no real reason? I'm still pretty bad at optimizing Haskell code, and forgetting a ! is quite likely, even when you're experienced, I guess.
I'm aware of the +RTS -s option, but I'm not sure how to interpret it: seeing 79MB of memory for a simple program seems huge to me, for example, but maybe it's not, since that's what my current program takes... and for bigger programs it's impossible to just spot "lazy leaks" that way, I guess, as I have no idea how much memory my program should take.
I'm also aware of the cabal v2-run --enable-profiling mysatsolvers -- +RTS -p command, but it seems that enabling the profiler kills some optimisations done by GHC, so it's hard to use these values for a real benchmark. And still, it's not clear to me how to find leaks from that output anyway.
Could you, for example, explain to me how I could find the "lazy leaks" in a toy program like this one?
{-# LANGUAGE DerivingVia, FlexibleInstances, ScopedTypeVariables #-}
module Main where

--- It depends on the transformers, containers, and base packages.
--- Optimisation seems to be important or the NoLog case will be way too long.
--- $ ghc -O Main.hs

import qualified Data.Map.Strict as MapStrict
import Data.Functor.Identity
import qualified Control.Monad as CM
import qualified Control.Monad.State.Strict as State
import qualified Data.Time as Time

-- Create a class that allows me to use the function "myTell"
-- that adds a number in the writer (either the LogEntry
-- or StupidLogEntry one)
class Monad m => LogFunctionCalls m where
  myTell :: String -> Int -> m ()

---------- Logging disabled ----------
--- (No logging at all gives the same time so I don't put it here)
newtype NoLog a = NoLog { unNoLog :: a }
  deriving (Functor, Applicative, Monad) via Identity

instance LogFunctionCalls NoLog where
  myTell _ _ = pure ()

---------- Logging with Map ----------
-- When logging, associate a number to each name.
newtype LogEntryMap = LogEntryMap (MapStrict.Map String Int)
  deriving (Eq, Show)

instance LogFunctionCalls (State.State LogEntryMap) where
  myTell namefunction n = State.modify' $ \(LogEntryMap m) ->
    LogEntryMap $ MapStrict.insertWith (+) namefunction n m

---------- Logging with Int ----------
-- Don't use any Map to avoid inefficiency of Map
newtype LogEntryInt = LogEntryInt Int
  deriving (Eq, Show)

instance LogFunctionCalls (State.State LogEntryInt) where
  myTell namefunction n = State.modify' $ \(LogEntryInt m) ->
    LogEntryInt $! m + n

---------- Function to compute ----------
countNumberCalls :: (LogFunctionCalls m) => Int -> m Int
countNumberCalls 0 = return 0
countNumberCalls n = do
  myTell "countNumberCalls" 1
  x <- countNumberCalls $! n - 1
  return $ 1 + x

main :: IO ()
main = do
  let www = 15000000
  putStrLn $ "Let's start!"
  --- Logging disabled
  t0 <- Time.getCurrentTime
  let n = unNoLog $ countNumberCalls www
  putStrLn $ "Logging disabled: " ++ (show n)
  t1 <- Time.getCurrentTime
  print (Time.diffUTCTime t1 t0)
  -- Logging with Map
  let (n, LogEntryMap log) = State.runState (countNumberCalls www) (LogEntryMap MapStrict.empty)
  putStrLn $ "Logging with Map: " ++ (show n)
  putStrLn $ (show $ log)
  t2 <- Time.getCurrentTime
  print (Time.diffUTCTime t2 t1)
  -- Logging with Int
  let (n, LogEntryInt log) = State.runState (countNumberCalls www) (LogEntryInt 0)
  putStrLn $ "Logging with Int: " ++ (show n)
  putStrLn $ (show $ log)
  t3 <- Time.getCurrentTime
  print (Time.diffUTCTime t3 t2)
The main method for detecting memory leaks is heap profiling. Specifically, you're looking for unexpected growth in the amount of resident (mostly heap) memory: either the maximum residency in the +RTS -s statistics output, or -- more reliably -- a characteristic "pyramid" shape over time in heap profile output generated with the +RTS -h<x> flags and the hp2ps tool.
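Concretely, the basic workflow looks something like this (just a sketch using the flags discussed below; the rest of this answer walks through it on your program):

ghc -O2 -rtsopts Toy.hs    # -rtsopts lets the binary accept +RTS options
./Toy +RTS -s              # print GC statistics, including maximum residency
./Toy +RTS -hT             # write a basic heap profile to Toy.hp
hp2ps -c -e8in Toy.hp      # render Toy.hp to Toy.ps for viewing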
If I run your toy program with +RTS -s, I see:

   3,281,896,520 bytes allocated in the heap
   3,383,195,568 bytes copied during GC
     599,346,304 bytes maximum residency (17 sample(s))
       5,706,584 bytes maximum slop
             571 MB total memory in use (0 MB lost due to fragmentation)
The first line can generally be ignored. Haskell programs typically allocate a roughly constant amount of memory per second of runtime, and this allocation rate is either nearly zero (for certain, unusual programs), or 0.5-2.0 gigabytes per second. This program ran for 4 seconds and allocated 3.8 gigabytes, and that's not unusual.
The bytes copied during GC and maximum residency are concerning, though. Assuming you have a program that you expect to run in constant space (i.e., there's no ever-growing data structure whose entire contents are needed), a correctly functioning Haskell program will generally not need to copy much data during garbage collection, and its maximum residency will be a small fraction of the total bytes allocated (e.g., 100 kilobytes rather than half a gigabyte). In particular, it won't grow substantially with the number of "iterations" of whatever it is you're testing.
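For intuition, here is a minimal, self-contained sketch (unrelated to your program; the names are purely illustrative) of the difference between a leaky and a constant-space loop. Compiled without optimization, the lazy foldl builds a chain of unevaluated (+) thunks and shows a large maximum residency under +RTS -s, while foldl' forces the running total at each step and stays small:

import Data.List (foldl')

-- Lazy accumulator: every (+) is deferred, so the heap fills up with thunks
-- and maximum residency grows with the length of the list.
leakySum :: Int
leakySum = foldl (+) 0 [1 .. 10000000]

-- Strict accumulator: the running total is forced at each step,
-- so this runs in a small, constant amount of space.
strictSum :: Int
strictSum = foldl' (+) 0 [1 .. 10000000]

main :: IO ()
main = print (leakySum, strictSum)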
You can generate a quick heap profile over time without turning on formal profiling. If you compile with the GHC flag -rtsopts, you can run:

./Toy +RTS -hT

and then display the result graphically using the hp2ps tool:

hp2ps -c -e8in Toy.hp
evince Toy.ps &
The resulting graph shows a pyramid pattern, and that sort of pattern is a red flag: a rapid linear increase in heap to the tune of hundreds of megabytes per second, followed by a rapid linear collapse. This is the pattern you see when a huge lazy data structure is being needlessly built up before the entire computation is forced all at once. You see two pyramids here because both your second and third tests are exhibiting memory leaks.
As an aside, the x-axis is in "MUT seconds" (seconds the "mutator" is running, which excludes garbage collection), so that's why this is less than the actual 4 second runtime. That's actually another red flag. A Haskell program that's spending half its time garbage collecting probably isn't running correctly.
To get more detail on what's causing this heap pyramid, you'll need to compile with profiling enabled. Profiling may cause a program to run somewhat slower, but it doesn't normally change which optimizations are in place. However, the flag -fprof-auto (and related flags), which automatically inserts cost centers, can cause big performance changes (by interfering with inlining, etc.). Unfortunately, the Cabal --enable-profiling flag turns on profiling (compiler flag -prof) together with the flag -fprof-auto-top, which automatically generates cost centers for top-level functions, and for your toy example that substantially changes the behavior of your first test case (increasing the runtime from 0.4 seconds to 5 seconds, even with no +RTS flags). That may be the problem you're seeing with profiling affecting your results. You don't need any cost centers for several additional kinds of heap profiles, so you can add the Cabal flag --profiling-detail=none to shut that off, and then your profiled program should run a little slower than, but with generally similar performance to, the unprofiled version.
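For example, with the mysatsolvers target from your question, something along these lines should give a profiled run without the automatic cost centers (a sketch; adjust the target and +RTS flags to what you actually need):

cabal v2-run --enable-profiling --profiling-detail=none mysatsolvers -- +RTS -s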
I don't use Cabal, but compiling with the following (which should be the equivalent of --enable-profiling --profiling-detail=none):

ghc -O2 -rtsopts -prof Toy.hs   # no -fprof-auto...

I can run your program with profiling by data type:

./Toy +RTS -hy
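As before, that run writes a Toy.hp file, which can be rendered the same way (evince is just the viewer used earlier; any PostScript viewer works):

hp2ps -c -e8in Toy.hp
evince Toy.ps &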
If I look at the resulting heap profile graph, it attributes most of the heap to the Int type -- this narrows my problem down to a bunch of unevaluated lazy Int calculations, which might point me in the right direction.
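To make that concrete, here is a stripped-down miniature of the same kind of leak (illustrative code, not taken from your program or from any profiler output): the state is kept strict by modify', but the result 1 + x never gets forced, so running it leaves a long chain of unevaluated Int thunks on the heap -- exactly the sort of thing a -hy profile attributes to Int.

import qualified Control.Monad.State.Strict as State

-- Each level returns 1 + x without forcing it, so after the recursion
-- bottoms out the heap holds 1 + (1 + (... + 0)) as a chain of thunks.
leakyCount :: Int -> State.State Int Int
leakyCount 0 = return 0
leakyCount n = do
  State.modify' (+ 1)   -- the state itself stays strict
  x <- leakyCount (n - 1)
  return (1 + x)        -- the result does not: this builds the thunk chain

main :: IO ()
main = print (State.runState (leakyCount 1000000) 0)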
If I'm really having trouble narrowing things down and am feeling like a technical deep-dive, I can also run a heap profile by closure (flag -hd). This tells me that the culprits are Main.sat_s7mQ and Main.sat_s7kP for the two pyramids respectively. That looks very mysterious, but those are the names of functions in the "STG", a low-level intermediate representation of my program generated by the compiler.
If I recompile with the same flags but add -fforce-recomp -ddump-stg -dsuppress-all:

ghc -O2 -rtsopts -prof -fforce-recomp -ddump-stg -dsuppress-all Toy.hs
this will dump the STG that contains the definitions of these two functions. (The generated identifiers can differ with small changes to code and/or compiler flags, so it's best to recompile with the STG dumped and then re-profile that executable, to make sure the identifiers match.)
If I search the STG for the first culprit, I find the definition:
sat_s7mQ =
    CCCS \u []
        case ww2_s7mL of {
          I# y_s7mO ->
              case +# [1# y_s7mO] of sat_s7mP {
                __DEFAULT -> I# [sat_s7mP];
              };
        };
Yes, this is all very technical, but it's STG-speak for the expression 1 + y, which would help me zero in on the culprit.
If you don't speak STG, you can try introducing some cost centers. For example, I tried profiling only your second test case with -fprof-auto (Cabal flag --profiling-detail=all-functions). The profile output in Toy.prof isn't that useful for memory leaks, because it deals with total allocation instead of active (i.e., resident and not garbage collected) allocations over time, but you can create a heap profile by cost center by running:

./Toy +RTS -hc

In this case, it attributes everything to a single cost center, namely (315)countNumberCalls. The "315" is the cost center number, which you can look up in the Toy.prof file to find the exact source code lines if it's not clear from the name. Anyway, this at least helps narrow down the problem to countNumberCalls.
For more complicated functions, you can sometimes narrow down the problem further by manually specifying cost centers, like so:
countNumberCalls :: (LogFunctionCalls m) => Int -> m Int
countNumberCalls 0 = return 0
countNumberCalls n = do
  {-# SCC "mytell_call" #-} myTell "countNumberCalls" 1
  x <- {-# SCC "recursive_call" #-} countNumberCalls $! n - 1
  {-# SCC "return_statment" #-} return $ {-# SCC "one_plus_x" #-} 1 + x
This actually attributes everything to "recursive_call", so it's not that helpful.
It's not wrong, though. You actually have two memory leaks here -- the x <- countNumberCalls $! n - 1 leaks heap because x isn't forced, and the 1 + x leaks stack. You could enable the BangPatterns extension and write:

!x <- countNumberCalls $! n - 1

and that would actually remove one of the memory leaks, speeding up the second case from 2.5 seconds to 1.0 seconds and dropping the maximum residency from 460 megs to 95 megs (and the bytes copied during GC from 1.5 gigs to 73 kilobytes!). However, a heap profile would show linearly growing stack accounting for pretty much all of that resident memory. Because stack isn't as well-tracked as heap, that would be more difficult to track down.
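If you want to remove both leaks, one option (a sketch of one possible rewrite, assuming the rest of your toy program is unchanged) is to give countNumberCalls a strict accumulator, so the addition is forced on the way down and the recursive call is the last thing each step does. With that shape, neither a thunk chain nor a growing stack should build up:

{-# LANGUAGE BangPatterns #-}

countNumberCalls :: (LogFunctionCalls m) => Int -> m Int
countNumberCalls = go 0
  where
    -- The bang on acc forces the running total at every step, and the
    -- recursive call is in tail position, so nothing accumulates.
    go !acc 0 = return acc
    go !acc n = do
      myTell "countNumberCalls" 1
      go (acc + 1) (n - 1)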
Some additional notes:
Even though the +RTS -h<x> flags are primarily for heap profiling (and are discussed as "heap profiling" options in the GHC documentation), they can technically report on other uses of resident memory besides heap, including per-thread state, which covers thread state objects and stack. By default, when running a profiled binary (compiled with -prof), the +RTS -h<x> flags do not report on per-thread state, including stack, but you can add the -xt flag to include it, as in +RTS -hc -xt. Due to a probable unintentional oversight, on a non-profiled binary the +RTS -hT flag (the only -h<x> flag available there) includes stack even without the -xt flag. Due to a compiler bug, the -hT flag doesn't work on profiled binaries for GHC 8.6.x and earlier, but it does work on GHC 8.8.x, and for that version +RTS -hT includes stack on non-profiled binaries but excludes it on profiled binaries unless you also specify -xt. That's why, in the examples above, "Stack" only shows up when running a heap profile on a non-profiled binary. You can add the -xt flag to see it in all the other heap profiles. Note that this "STACK" is actual stack use, rather than objects on the heap that are somehow affiliated with the stack.
Black holes are primarily a mechanism for supporting concurrency. When a thread starts evaluating a thunk, it "blackholes" it (i.e., marks it as a black hole), so that if another thread comes along and wants to evaluate the same thunk, it waits on the evaluation instead of trying to re-evaluate it in parallel (which would duplicate the effort of the running thread). Black holes are also used in the non-threaded runtime, partly because they can detect infinite loops (if a thread encounters its own black hole), but also for some more important reasons that I can't remember. For -hT, -hd, and -hy heap profiling, heap objects that have been blackholed like this will be marked as "BLACKHOLE". The limited sampling rate in the profiles above can make it a little unclear, but what's happening in your program is that a large series of Int thunks is being built up in a chain, and when the value is finally forced, they are turned into a long chain of BLACKHOLEs, each of which represents a computation that's been initiated and is waiting on the next computation in the chain.