
Why do hGetBuf, hPutBuf, etc. allocate memory?

In the process of doing some simple benchmarking, I came across something that surprised me. Take this snippet from Network.Socket.Splice:

hSplice :: Int -> Handle -> Handle -> IO ()
hSplice len s t = do
  a <- mallocBytes len :: IO (Ptr Word8)
  finally
    (forever $! do
       bytes <- hGetBufSome s a len
       if bytes > 0
         then hPutBuf t a bytes
         else throwRecv0)
    (free a)

One would expect that hGetBufSome and hPutBuf here would not need to allocate memory, as they write into and read from a pre-allocated buffer. The docs seem to back this intuition up... But alas:

                                        individual     inherited
COST CENTRE                            %time %alloc   %time %alloc      bytes

 hSplice                                 0.5    0.0    38.1   61.1       3792
  hPutBuf                                0.4    1.0    19.8   29.9   12800000
   hPutBuf'                              0.4    0.4    19.4   28.9    4800000
    wantWritableHandle                   0.1    0.1    19.0   28.5    1600000
     wantWritableHandle'                 0.0    0.0    18.9   28.4          0
      withHandle_'                       0.0    0.1    18.9   28.4    1600000
       withHandle'                       1.0    3.8    18.8   28.3   48800000
        do_operation                     1.1    3.4    17.8   24.5   44000000
         withHandle_'.\                  0.3    1.1    16.7   21.0   14400000
          checkWritableHandle            0.1    0.2    16.4   19.9    3200000
           hPutBuf'.\                    1.1    3.3    16.3   19.7   42400000
            flushWriteBuffer             0.7    1.4    12.1    6.2   17600000
             flushByteWriteBuffer       11.3    4.8    11.3    4.8   61600000
            bufWrite                     1.7    6.9     3.0    9.9   88000000
             copyToRawBuffer             0.1    0.2     1.2    2.8    3200000
              withRawBuffer              0.3    0.8     1.2    2.6   10400000
               copyToRawBuffer.\         0.9    1.7     0.9    1.7   22400000
             debugIO                     0.1    0.2     0.1    0.2    3200000
            debugIO                      0.1    0.2     0.1    0.2    3200016
  hGetBufSome                            0.0    0.0    17.7   31.2         80
   wantReadableHandle_                   0.0    0.0    17.7   31.2         32
    wantReadableHandle'                  0.0    0.0    17.7   31.2          0
     withHandle_'                        0.0    0.0    17.7   31.2         32
      withHandle'                        1.6    2.4    17.7   31.2   30400976
       do_operation                      0.4    2.4    16.1   28.8   30400880
        withHandle_'.\                   0.5    1.1    15.8   26.4   14400288
         checkReadableHandle             0.1    0.4    15.3   25.3    4800096
          hGetBufSome.\                  8.7   14.8    15.2   24.9  190153648
           bufReadNBNonEmpty             2.6    4.4     6.1    8.0   56800000
            bufReadNBNonEmpty.buf'       0.0    0.4     0.0    0.4    5600000
            bufReadNBNonEmpty.so_far'    0.2    0.1     0.2    0.1    1600000
            bufReadNBNonEmpty.remaining  0.2    0.1     0.2    0.1    1600000
            copyFromRawBuffer            0.1    0.2     2.9    2.8    3200000
             withRawBuffer               1.0    0.8     2.8    2.6   10400000
              copyFromRawBuffer.\        1.8    1.7     1.8    1.7   22400000
            bufReadNBNonEmpty.avail      0.2    0.1     0.2    0.1    1600000
           flushCharReadBuffer           0.3    2.1     0.3    2.1   26400528

[Figures: heap profile by module; heap profile by type]

I have to assume this is on purpose... but I have no idea what that purpose might be. Even worse: I'm just barely clever enough to get this profile, but not quite clever enough to figure out exactly what's being allocated.

Any help along those lines would be appreciated.


UPDATE: I've done some more profiling with two drastically simplified testcases. The first testcase directly uses the read/write ops from System.Posix.Internals:

echo :: Ptr Word8 -> IO ()
echo buf = forever $ do
  threadWaitRead $ Fd 0
  len <- c_read 0 buf 1
  c_write 1 buf (fromIntegral len)
  yield

As you'd hope, this allocates no memory on the heap each time through the loop. The second testcase uses the read/write ops from GHC.IO.FD:

echo :: Ptr Word8 -> IO ()
echo buf = forever $ do
  len <- readRawBufferPtr "read" stdin buf 0 1
  writeRawBufferPtr "write" stdout buf 0 (fromIntegral len)

UPDATE #2: I was advised to file this as a bug in GHC Trac... I'm still not sure it actually is a bug (as opposed to intentional behavior, a known limitation, or whatever) but here it is: https://ghc.haskell.org/trac/ghc/ticket/9696

asked Oct 13 '14 06:10 by mergeconflict


1 Answer

I'll try to guess based on the code.

The runtime tries to optimize small reads and writes, so it maintains an internal buffer. If your buffer is 1 byte long, using it directly would be inefficient, so the internal buffer is used to read a bigger chunk of data. It is probably about 32 KB long. There is something similar for writing, plus your own buffer.

The code has an optimization: if you provide a buffer bigger than the internal one, and the latter is empty, it will use your buffer directly. But the internal buffer is already allocated, so this will not reduce memory usage. I don't know how to disable the internal buffer, but you can open a feature request if it is important to you.

(I realize that my guess can be totally wrong.)
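
For reference, System.IO does let you query and change a Handle's buffering mode with hGetBuffering and hSetBuffering. The sketch below is only an illustration of those two calls next to hPutBuf on a caller-owned buffer. It assumes nothing about GHC's internals, and whether NoBuffering actually removes the internal buffer (or the allocation seen in the profile) is not established here.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (pokeArray)
import Foreign.Ptr (Ptr)
import System.IO
  ( BufferMode (..), hGetBuffering, hPutBuf, hPutStrLn
  , hSetBuffering, stderr, stdout )

main :: IO ()
main = do
  -- Ask for an unbuffered Handle. Assumption: this is the closest public
  -- knob; it may or may not change what the profile shows.
  hSetBuffering stdout NoBuffering
  mode <- hGetBuffering stdout
  hPutStrLn stderr ("stdout buffering mode: " ++ show mode)
  -- Write three bytes ("hi\n") from a buffer we own, as hSplice does.
  allocaBytes 3 $ \buf -> do
    pokeArray (buf :: Ptr Word8) [104, 105, 10]
    hPutBuf stdout buf 3
```

Profiling the same loop under each BufferMode would show whether the mode has any effect on the numbers in the question.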

ADD:

"This one does seem to allocate, but I still don't know why."

What is your concern: maximum memory usage or the number of allocated bytes?

c_read is a C function; it doesn't allocate on Haskell's heap (but may allocate on the C heap).

readRawBufferPtr is a Haskell function, and it is usual for Haskell functions to allocate a lot of memory that quickly becomes garbage, simply because of immutability. It is common for a Haskell program to allocate, e.g., 100 GB while its memory usage stays under 1 MB.
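
To make the distinction between bytes allocated and memory in use concrete, here is a minimal sketch that measures the former with getAllocationCounter from System.Mem. The helper name allocatedBy is hypothetical, and the Integer sum is just a stand-in workload, not the code from the question.

```haskell
import Control.Exception (evaluate)
import Data.Int (Int64)
import Data.List (foldl')
import System.Mem (getAllocationCounter)

-- Bytes allocated on the Haskell heap by the calling thread while the
-- action runs. GHC's per-thread allocation counter counts down, so the
-- amount allocated is (before - after).
-- 'allocatedBy' is a hypothetical helper written for this example.
allocatedBy :: IO a -> IO (a, Int64)
allocatedBy action = do
  before <- getAllocationCounter
  result <- action
  after <- getAllocationCounter
  return (result, before - after)

main :: IO ()
main = do
  -- Each intermediate Integer is a short-lived heap object, so the total
  -- allocated grows with the list length even though little stays live.
  (total, bytes) <-
    allocatedBy (evaluate (foldl' (+) 0 [1 .. 1000000 :: Integer]))
  putStrLn ("sum = " ++ show total)
  putStrLn ("bytes allocated = " ++ show bytes)
```

Running the program with +RTS -s lets you compare "bytes allocated in the heap" against "maximum residency". The first should be far larger than the second for a loop like this, which is the same pattern the profile in the question is reporting.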

answered Oct 17 '22 06:10 by Yuras