In the process of doing some simple benchmarking, I came across something that surprised me. Take this snippet from Network.Socket.Splice:
hSplice :: Int -> Handle -> Handle -> IO ()
hSplice len s t = do
a <- mallocBytes len :: IO (Ptr Word8)
finally
(forever $! do
bytes <- hGetBufSome s a len
if bytes > 0
then hPutBuf t a bytes
else throwRecv0)
(free a)
One would expect that hGetBufSome
and hPutBuf
here would not need to allocate memory, as they write into and read from a pre-allocated buffer. The docs seem to back this intuition up... But alas:
individual inherited
COST CENTRE %time %alloc %time %alloc bytes
hSplice 0.5 0.0 38.1 61.1 3792
hPutBuf 0.4 1.0 19.8 29.9 12800000
hPutBuf' 0.4 0.4 19.4 28.9 4800000
wantWritableHandle 0.1 0.1 19.0 28.5 1600000
wantWritableHandle' 0.0 0.0 18.9 28.4 0
withHandle_' 0.0 0.1 18.9 28.4 1600000
withHandle' 1.0 3.8 18.8 28.3 48800000
do_operation 1.1 3.4 17.8 24.5 44000000
withHandle_'.\ 0.3 1.1 16.7 21.0 14400000
checkWritableHandle 0.1 0.2 16.4 19.9 3200000
hPutBuf'.\ 1.1 3.3 16.3 19.7 42400000
flushWriteBuffer 0.7 1.4 12.1 6.2 17600000
flushByteWriteBuffer 11.3 4.8 11.3 4.8 61600000
bufWrite 1.7 6.9 3.0 9.9 88000000
copyToRawBuffer 0.1 0.2 1.2 2.8 3200000
withRawBuffer 0.3 0.8 1.2 2.6 10400000
copyToRawBuffer.\ 0.9 1.7 0.9 1.7 22400000
debugIO 0.1 0.2 0.1 0.2 3200000
debugIO 0.1 0.2 0.1 0.2 3200016
hGetBufSome 0.0 0.0 17.7 31.2 80
wantReadableHandle_ 0.0 0.0 17.7 31.2 32
wantReadableHandle' 0.0 0.0 17.7 31.2 0
withHandle_' 0.0 0.0 17.7 31.2 32
withHandle' 1.6 2.4 17.7 31.2 30400976
do_operation 0.4 2.4 16.1 28.8 30400880
withHandle_'.\ 0.5 1.1 15.8 26.4 14400288
checkReadableHandle 0.1 0.4 15.3 25.3 4800096
hGetBufSome.\ 8.7 14.8 15.2 24.9 190153648
bufReadNBNonEmpty 2.6 4.4 6.1 8.0 56800000
bufReadNBNonEmpty.buf' 0.0 0.4 0.0 0.4 5600000
bufReadNBNonEmpty.so_far' 0.2 0.1 0.2 0.1 1600000
bufReadNBNonEmpty.remaining 0.2 0.1 0.2 0.1 1600000
copyFromRawBuffer 0.1 0.2 2.9 2.8 3200000
withRawBuffer 1.0 0.8 2.8 2.6 10400000
copyFromRawBuffer.\ 1.8 1.7 1.8 1.7 22400000
bufReadNBNonEmpty.avail 0.2 0.1 0.2 0.1 1600000
flushCharReadBuffer 0.3 2.1 0.3 2.1 26400528
I have to assume this is on purpose... but I have no idea what that purpose might be. Even worse: I'm just barely clever enough to get this profile, but not quite clever enough to figure out exactly what's being allocated.
Any help along those lines would be appreciated.
UPDATE: I've done some more profiling with two drastically simplified testcases. The first testcase directly uses the read/write ops from System.Posix.Internals:
echo :: Ptr Word8 -> IO ()
echo buf = forever $ do
threadWaitRead $ Fd 0
len <- c_read 0 buf 1
c_write 1 buf (fromIntegral len)
yield
As you'd hope, this allocates no memory on the heap each time through the loop. The second testcase uses the read/write ops from GHC.IO.FD:
echo :: Ptr Word8 -> IO ()
echo buf = forever $ do
len <- readRawBufferPtr "read" stdin buf 0 1
writeRawBufferPtr "write" stdout buf 0 (fromIntegral len)
UPDATE #2: I was advised to file this as a bug in GHC Trac... I'm still not sure it actually is a bug (as opposed to intentional behavior, a known limitation, or whatever) but here it is: https://ghc.haskell.org/trac/ghc/ticket/9696
I'll try to guess based on the code
Runtime tries to optimize small reads and writes, so it maintains internal buffer. If your buffer is 1 byte long, it will be inefficient to use it dirrectly. So internal buffer is used to read bigger chunk of data. It is probably ~32Kb long. Plus something similar for writing. Plus your own buffer.
The code has an optimization -- if you provide buffer bigger then the internal one, and the later is empty, it will use your buffer dirrectly. But the internal buffer is already allocated, so it will not less memory usage. I don't know how to dissable internal buffer, but you can open feature request if it is important for you.
(I realize that my guess can be totally wrong.)
ADD:
This one does seem to allocate, but I still don't know why.
What is your concern, max memory usage or number of allocated bytes?
c_read
is a C function, it doesn't allocate on haskell's heap (but may allocate on C heap.)
readRawBufferPtr
is Haskell function, and it is usual for haskell functions to allocate a lot of memory, that quickly becomes a garbage. Simply because of immutability. It is common for haskell program to allocate e.g 100Gb while memory usage is under 1Mb.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With