Why does this cause a memory leak in the Haskell Conduit library?

I have a conduit pipeline processing a long file. I want to print a progress report for the user every 1000 records, so I've written this:

-- | Every n records, perform the IO action.
-- Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = skipN n 1
   where
      skipN c t = do
         mv <- await
         case mv of
            Nothing -> return ()
            Just v ->
               if c <= 1
                  then do
                     liftIO $ act t v
                     yield v
                     skipN n (succ t)
                  else do
                     yield v
                     skipN (pred c) (succ t)

No matter what action I call this with, it leaks memory, even if I just tell it to print a full stop.

As far as I can see the function is tail recursive and both counters are regularly forced (I tried putting "seq c" and "seq t" in, to no avail). Any clue?

If I put in an "awaitForever" that prints a report for every record then it works fine.
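For comparison, the per-record variant described above could look roughly like the sketch below. The name "reportAll" is hypothetical (the original version was not posted), and the code assumes the same conduit 1.x API used in the question ("Conduit", "$=", "$$").

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Conduit
import qualified Data.Conduit.List as CL

-- | Hypothetical per-record reporter: run the IO action on every
-- record and pass the record downstream unchanged.
reportAll :: MonadIO m => (i -> IO ()) -> Conduit i m i
reportAll act = awaitForever $ \v -> do
   liftIO (act v)
   yield v

-- Small driver: print each of five records and discard the output.
main :: IO ()
main = CL.sourceList [1 .. 5 :: Int] $= reportAll print $$ CL.sinkNull
```

Here "awaitForever" does the looping, so there are no hand-written counters or explicit recursion in this version.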

Update 1: This occurs only when compiled with -O2. Profiling indicates that the leaking memory is allocated in the recursive "skipN" function and being retained by "SYSTEM" (whatever that means).

Update 2: I've managed to cure it, at least in the context of my current program, by replacing the function above with the one below. Note that "proc" has type "Int -> Int -> Maybe i -> Conduit i m i": to use it you call "await" and pass it the result. For some reason swapping the order of the "await" and the "yield" solved the problem, so it now awaits the next input before yielding the previous result.

-- | Every n records, perform the monadic action. 
-- Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = await >>= proc n 1
   where
      proc c t = seq c $ seq t $ maybe (return ()) $ \v ->
         if c <= 1
            then {-# SCC "progress.then" #-} do
               liftIO $ act t v
               v1 <- await
               yield v
               proc n (succ t) v1
            else {-# SCC "progress.else" #-} do
               v1 <- await
               yield v
               proc (pred c) (succ t) v1

So if you have a memory leak in a Conduit, try swapping the yield and await actions.

asked Jul 16 '14 16:07

Paul Johnson

2 Answers

This isn't an answer, but it is some complete code I hacked up for testing. I don't know conduit at all, so it may not be the best conduit code. I've forced everything that seems like it needs to be forced, but it still leaks.

{-# LANGUAGE BangPatterns #-}

import Data.Conduit
import Data.Conduit.List
import Control.Monad.IO.Class

-- | Every n records, perform the IO action.
--   Used for progress reports to the user.
progress :: (MonadIO m) => Int -> (Int -> i -> IO ()) -> Conduit i m i
progress n act = skipN n 1
   where
      skipN !c !t = do
         mv <- await
         case mv of
            Nothing -> return ()
            Just !v ->
               if (c :: Int) <= 1
                  then do
                     liftIO $ act t v
                     yield v
                     skipN n (succ t)
                  else do
                     yield v
                     skipN (pred c) (succ t)

main :: IO ()
main = unfold (\b -> b `seq` Just (b, b+1)) 1
       $= progress 100000 (\_ b -> print b)
       $$ fold (\_ _ -> ()) ()

On the other hand,

main = unfold (\b -> b `seq` Just (b, b+1)) 1 $$ fold (\_ _ -> ()) ()

does not leak, so something in progress does indeed seem to be the problem. I can't see what.

EDIT: The leak only occurs with ghci! If I compile a binary and run it there is no leak (I should have tested this earlier ...)

answered Nov 16 '22 00:11

Tom Ellis

I think Tom's answer is the right one. I'm posting this as a separate answer because it will likely start some new discussion (and because it's too long for a comment). In my testing, replacing the "print b" in Tom's example with "return ()" gets rid of the memory leak. This made me think that the problem is in fact with print, not conduit. To test this theory, I wrote a simple helper function in C (placed in helper.c):

#include <stdio.h>

void helper(int c)
{
    printf("%d\n", c);
}

Then I foreign imported this function in the Haskell code:

foreign import ccall "helper" helper :: Int -> IO ()

and replaced the call to print with a call to helper. The output from the program is identical, but I see no leak, and a maximum residency of 32 kB versus 62 kB (I also modified the code to stop at 10 million records for a better comparison).

I see similar behavior when I cut out conduit entirely, e.g.:

import Control.Monad (forM_, when)

main :: IO ()
main = forM_ [1..10000000] $ \i ->
    when (i `mod` 100000 == 0) (helper i)

I'm not convinced, however, that this is really a bug in print or Handle. My testing never showed the leak reaching any substantial memory usage, so it could just be that a buffer is growing towards a limit. I'd have to do more research to understand this better, but I wanted to first see if this analysis meshes with what others are seeing.
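For anyone who wants to compare on their own machine, here is a minimal base-only sketch of the same conduit-free loop with the reporter passed in as a parameter, so that "print" can be swapped for another action. The name "progressLoop" and its arguments are hypothetical and not part of the code I tested above.

```haskell
import Control.Monad (forM_, when)

-- | Hypothetical helper: walk 1..total and call the reporter on
-- every value that is a multiple of 'step'.
progressLoop :: Int -> Int -> (Int -> IO ()) -> IO ()
progressLoop total step report =
    forM_ [1 .. total] $ \i ->
        when (i `mod` step == 0) (report i)

-- Same bounds as the example above, but reporting with 'print'.
main :: IO ()
main = progressLoop 10000000 100000 print
```

If the binary is built with "-rtsopts", running it with "+RTS -s" prints the maximum residency, which can then be compared between reporters.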

answered Nov 16 '22 01:11

Michael Snoyman

Tags: memory-leaks, haskell, conduit
