Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does haskell enumerator based IO call sigprocmask so often?

REVISED SUMMARY

Alright, it looks like the syscalls are certainly related to GC, and the underlying problem is just that GC is happening too often. This seems to be related to the use of splitWhen and pack, as best I can tell by profiling.

splitWhen's implementation converts each chunk from lazy to strict text, and concatenates them all, as it builds up a buffer of chunks. That's bound to allocate a lot.

pack, since it's converting from one type to another, has to allocate, and that's in my inner loop, so that makes sense too.

ORIGINAL ISSUE

I've stumbled on some surprising syscall activity in haskell enumerator based IO. Hoping someone can shed some light on it.

I've been toying with a haskell version of a quick perl script I once wrote for a few months now, on and off. The script reads in some json from each line, and then prints out a specific field, if it exists.

Here's the perl version, and how I'm running it.

cat ~/sample_input | perl -lpe '($_) = grep(/type/, split(/,/))' > /dev/null

Here's the haskell version (it is invoked similarly to the perl version).

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Internal as EI
import qualified Data.Enumerator.Text as ET
import qualified Data.Enumerator.List as EL
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Data.Functor
import Control.Monad
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLI
import System.Environment
import System.IO (stdin, stdout)
import GHC.IO.Handle (hSetBuffering, BufferMode(BlockBuffering))

fieldEnumerator field = enumStdin E.$= splitOn [',','\n'] E.$= grabField field

enumStdin = ET.enumHandle stdin

splitOn :: [Char] -> EI.Enumeratee T.Text T.Text IO b
splitOn chars = (ET.splitWhen (`elem` chars))

grabField :: String -> EI.Enumeratee T.Text T.Text IO b
grabField = EL.filter . T.isInfixOf . T.pack

intercalateNewlines = EL.mapM_ (\field -> (TI.putStrLn field >> (putStr "\n\n")))

runE enum = E.run_ $ enum E.$$ intercalateNewlines

main = do
  (field:_) <- getArgs
  runE $ fieldEnumerator field

The surprise is that the haskell version's trace looks something like this (the actual JSON is suppressed because it's data from work), whereas the perl version does what I'd expect; a bunch of reads followed by a write, repeated.

55333/0x8816f5:    366125       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    366136       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    367209       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    367218       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    368449       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    368458       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    369525       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    369534       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    370610       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    370620       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    371735       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    371744       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    371798       5      2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    371802       3      1 read(0x0, SOME_JSON, 0x1FA0)      = 8096 0
55333/0x8816f5:    372907       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    372918       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    374063       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    374072       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    375147       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    375156       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    376283       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    376292       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    376809       6      2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    376814       5      3 read(0x0, SOME_JSON, 0x1FA0)      = 8096 0
55333/0x8816f5:    377378       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    377387       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    378537       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    378546       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    379598       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    379604       3      0 sigreturn(0x7FFF5FBFF9A0, 0x1E, 0x1)        = 0 Err#-2
55333/0x8816f5:    379613       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    380667       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    380678       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    381862       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    381871       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    382032       6      2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    382036       4      2 read(0x0, SOME_JSON, 0x1FA0)        = 8096 0
55333/0x8816f5:    383064       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    383073       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    384118       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    384127       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    385206       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    385215       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    386348       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    386358       3      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    387468       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    387477      11      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    387614       6      2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    387620       5      3 read(0x0, SOME_JSON, 0x1FA0)        = 8096 0
55333/0x8816f5:    388597       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    388606       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    389707       3      0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC)      = 0x0 0
55333/0x8816f5:    389716       2      0 sigprocmask(0x3, 0x10069BFAC, 0x0)      = 0x0 0
55333/0x8816f5:    390261       7      3 select(0x2, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0)        = 1 0
55333/0x8816f5:    390273       6      3 write(0x1, SOME_OUTPUT, 0x1FA0)      = 8096 0
like image 750
heartpunk Avatar asked Apr 23 '12 01:04

heartpunk


4 Answers

Are you concerned about the allocations or the (overhead from?) calls to sigprocmask?

If it's the former and you want to use the enumerator package this small change helps out a 4k test set by about 50%: 8MB of allocations reduced to 4MB and gen0 GC's went from 15 to 6.

splitOn :: EI.Enumeratee T.Text T.Text IO b
splitOn = EL.concatMap (T.split fastSplit)

fastSplit :: Char -> Bool
fastSplit ','  = True
fastSplit '\n' = True
fastSplit _    = False

Before (stats from +RTS -sstderr -RTS):

       8,212,680 bytes allocated in the heap
         696,184 bytes copied during GC
         148,656 bytes maximum residency (1 sample(s))
          30,664 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0        15 colls,     0 par    0.00s    0.00s     0.0001s    0.0005s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0010s    0.0010s

After:

       3,838,048 bytes allocated in the heap
         689,592 bytes copied during GC
         148,368 bytes maximum residency (1 sample(s))
          27,040 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         6 colls,     0 par    0.00s    0.00s     0.0001s    0.0003s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0006s    0.0006s

Which is a pretty reasonable improvement but definitely leaves something to be desired. Rather than kicking enumerator around too much more I took a stab at rewriting it in conduit-0.4.1 just for kicks. It should be equivalent...

import Data.Conduit as C
import qualified Data.Conduit.Binary as Cb
import qualified Data.Conduit.List as Cl
import qualified Data.Conduit.Text as Ct
import qualified Data.Text as T
import qualified Data.Text.IO as TI
import Control.Monad.Trans (MonadIO, liftIO)
import System.Environment
import System.IO (stdin)

grabField :: Monad m => String -> Conduit T.Text m T.Text
grabField = Cl.filter . T.isInfixOf . T.pack

printField :: MonadIO m => T.Text -> m ()
printField field = liftIO $ do
  TI.putStrLn field
  putStr "\n\n"

fastSplit :: Char -> Bool
fastSplit ','  = True
fastSplit '\n' = True
fastSplit _    = False

main :: IO ()
main = do
  field:_ <- getArgs
  runResourceT $ Cb.sourceHandle stdin
              $$ Ct.decode Ct.utf8
              =$ Cl.concatMap (T.split fastSplit)
              =$ grabField field
              =$ Cl.mapM_ printField

... but for some reason allocates and retains less memory:

         835,688 bytes allocated in the heap
           8,576 bytes copied during GC
          87,200 bytes maximum residency (1 sample(s))
          19,968 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0008s    0.0008s
like image 96
Nathan Howell Avatar answered Nov 09 '22 07:11

Nathan Howell


Promoting this to the top level from comments:

FWIW, I'm going through the runtime (we're also discussing this in IRC) and there are only two uses of sigprocmask: GC and the tty driver. The latter being unlikely, I've recommended profiling to verify that it's doing a lot of GC and to try to find out why.

And it turns out (from IRC) that it's doing 90MB of allocation for 0.5MB of data and the garbage collector is indeed being triggered quite a lot. So now it's down to why enumerator is doing so much extra allocation.

like image 25
geekosaur Avatar answered Nov 09 '22 05:11

geekosaur


If the amount of data read between those sigsetmasks is large, the first guess off the top of my head is that the runtime is doing the sigsetmask before the gc runs, so that gc isn't interrupted with the heap in an inconsistent state.

like image 34
none Avatar answered Nov 09 '22 06:11

none


More than a comment and less than an answer: if you grep through GHC source you'll see posix/TTY.c (TERMIOS code) and sm/GC.c (via {,un}blockUserSignals) have the most likely candidates. You could compile GHC with debugging symbols or just throw in some dummy (unique) system calls to ensure you can differentiate the two system call profiles to find out. Another cheap test would be to remove any terminal interactions and if the masking behavior disappears then that would be mild evidence supporting the GC (none's answer).

EDIT: I should acknowledge that some library code can call sigprocmask too, I ignored that as a less likely source, but it could actually be the issue!

like image 3
Thomas M. DuBuisson Avatar answered Nov 09 '22 07:11

Thomas M. DuBuisson