Why does this cause a memory leak in the Haskell Conduit library?

I have a conduit pipeline processing a long file. I want to print a progress report for the user every 1000 records, so I've written this:

-- | Every n records, perform the IO action.
-- Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = skipN n 1
   where
      skipN c t = do
         mv <- await
         case mv of
            Nothing -> return ()
            Just v ->
               if c <= 1
                  then do
                     liftIO $ act t v
                     yield v
                     skipN n (succ t)
                  else do
                     yield v
                     skipN (pred c) (succ t)

No matter what action I call this with, it leaks memory, even if I just tell it to print a full stop.

As far as I can see the function is tail recursive and both counters are regularly forced (I tried putting "seq c" and "seq t" in, to no avail). Any clue?

If I put in an "awaitForever" that prints a report for every record then it works fine.
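For comparison, a minimal sketch of that per-record variant, assuming the same conduit API as the code above (`Conduit`, `$=`, `$$`, which newer conduit releases still provide but mark as deprecated). The names `progressEvery` and `runCounting` are illustrative and not taken from the original program.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Hypothetical helper: run the IO action on every record, then pass
-- the record downstream unchanged.
progressEvery :: MonadIO m => (i -> IO ()) -> Conduit i m i
progressEvery act = awaitForever $ \v -> do
   liftIO (act v)
   yield v

-- | Push a list through 'progressEvery' and count how often the action ran.
runCounting :: [Int] -> IO ([Int], Int)
runCounting xs = do
   calls <- newIORef (0 :: Int)
   out <- CL.sourceList xs
             $= progressEvery (\_ -> modifyIORef' calls (+ 1))
             $$ CL.consume
   n <- readIORef calls
   return (out, n)

main :: IO ()
main = runCounting [1 .. 5] >>= print   -- prints ([1,2,3,4,5],5)
```

Because `awaitForever` owns the loop, this version carries no counters of its own between records, which is the main structural difference from `skipN`.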

Update 1: This occurs only when compiled with -O2. Profiling indicates that the leaking memory is allocated in the recursive "skipN" function and being retained by "SYSTEM" (whatever that means).

Update 2: I've managed to cure it, at least in the context of my current program, by replacing the function above with the one below. Note that "proc" has type "Int -> Int -> Maybe i -> Conduit i m i": to use it you call "await" and pass it the result. For some reason swapping the order of the "await" and "yield" solved the problem, so the function now awaits the next input before yielding the previous one.

-- | Every n records, perform the IO action.
-- Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = await >>= proc n 1   -- countdown starts at n, running total at 1
   where
      proc c t = seq c $ seq t $ maybe (return ()) $ \v ->
         if c <= 1
            then {-# SCC "progress.then" #-} do
               liftIO $ act t v
               v1 <- await
               yield v
               proc n (succ t) v1
            else {-# SCC "progress.else" #-} do
               v1 <- await
               yield v
               proc (pred c) (succ t) v1

So if you have a memory leak in a Conduit, try swapping the yield and await actions.

Paul Johnson asked Jul 16 '14 16:07


2 Answers

This isn't an answer, but it is some complete code I hacked up for testing. I don't know conduit at all, so it may not be the best conduit code. I've forced everything that seems to need forcing, but it still leaks.

{-# LANGUAGE BangPatterns #-}

import Data.Conduit
import Data.Conduit.List
import Control.Monad.IO.Class

-- | Every n records, perform the IO action.
--   Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = skipN n 1
   where
      skipN !c !t = do
         mv <- await
         case mv of
            Nothing -> return ()
            Just !v ->
               if (c :: Int) <= 1
                  then do
                     liftIO $ act t v
                     yield v
                     skipN n (succ t)
                  else do
                     yield v
                     skipN (pred c) (succ t)

main :: IO ()
main = unfold (\b -> b `seq` Just (b, b+1)) 1
       $= progress 100000 (\_ b -> print b)
       $$ fold (\_ _ -> ()) ()

On the other hand,

main = unfold (\b -> b `seq` Just (b, b+1)) 1 $$ fold (\_ _ -> ()) ()

does not leak, so something in progress does indeed seem to be the problem. I can't see what.

EDIT: The leak only occurs with ghci! If I compile a binary and run it there is no leak (I should have tested this earlier ...)

Tom Ellis answered Nov 16 '22 00:11


I think Tom's answer is the right one. I'm starting this as a separate answer because it will likely introduce some new discussion (and because it's too long for a comment). In my testing, replacing the "print b" in Tom's example with "return ()" gets rid of the memory leak. This made me think that the problem is in fact with print, not conduit. To test this theory, I wrote a simple helper function in C (placed in helper.c):

#include <stdio.h>

void helper(int c)
{
    printf("%d\n", c);
}

Then I foreign imported this function in the Haskell code:

foreign import ccall "helper" helper :: Int -> IO ()

and I replaced the call to print with a call to helper. The output from the program is identical, but I see no leak, and a max residency of 32 KB versus 62 KB (I also modified the code to stop at 10 million records for a better comparison).

I see similar behavior when I cut out conduit entirely, e.g.:

import Control.Monad (forM_, when)

-- 'helper' is the foreign import shown above.
main :: IO ()
main = forM_ [1..10000000] $ \i ->
    when (i `mod` 100000 == 0) (helper i)

I'm not convinced, however, that this is really a bug in print or Handle. My testing never showed the leak reaching any substantial memory usage, so it could just be that a buffer is growing towards a limit. I'd have to do more research to understand this better, but I wanted to first see if this analysis meshes with what others are seeing.
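One way to compare the residency of the two variants from inside the program is sketched below. It assumes a GHC recent enough to ship `GHC.Stats.getRTSStats` (base 4.10 or later) and a binary built with `-rtsopts` and run with `+RTS -T`; the name `maxResidencyBytes` is hypothetical, and the loop is just the conduit-free test above with `print` in place of `helper`. Running with `+RTS -s` reports the same figure without any code changes.

```haskell
import Control.Monad (forM_, when)
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- | Peak live heap in bytes, or Nothing when the program was not
-- started with "+RTS -T" (statistics collection disabled).
maxResidencyBytes :: IO (Maybe Word64)
maxResidencyBytes = do
   enabled <- getRTSStatsEnabled
   if enabled
      then do
         performGC   -- max_live_bytes is only updated by a major GC
         Just . max_live_bytes <$> getRTSStats
      else return Nothing

main :: IO ()
main = do
   forM_ [1 .. 10000000 :: Int] $ \i ->
      when (i `mod` 100000 == 0) (print i)
   maxResidencyBytes >>= print
```

Swapping `print i` for the foreign `helper i` and comparing the two reported values would show whether the difference is stable or keeps growing with the record count.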

Michael Snoyman answered Nov 16 '22 01:11