I have a 37MB
bin file I am trying to convert to a ppm sequence. It works fine, and I'm trying to use this as an exercise to learn some profiling and more about lazy bytestrings in Haskell. My program seems to bomb at the concatMap
, which is used to replicate each byte three times so I have R, G, and B. The code is fairly straight forward - every 2048 bytes I write a new header:
{-# LANGUAGE OverloadedStrings #-}
import System.IO
import System.Environment
import Control.Monad
import qualified Data.ByteString.Lazy.Char8 as B
main :: IO ()
main = do [from, to] <- getArgs
withFile from ReadMode $ \inH ->
withFile to WriteMode $ \outH ->
loop (B.hGet inH 2048) (process outH) B.null
loop :: (Monad m) => m a -> (a -> m ()) -> (a -> Bool) -> m ()
loop inp outp done = inp >>= \x -> unless (done x) (outp x >> loop inp outp done)
process :: Handle -> B.ByteString -> IO ()
process h bs | B.null bs = return ()
| otherwise = B.hPut h header >> B.hPut h bs'
where header = "P6\n32 64\n255\n" :: B.ByteString
bs' = B.concatMap (B.replicate 3) bs
This pulls it off in a little over 5s
. It's not terrible, and my only comparison is my very naive C implementation that does it a little under 4s
- so that or ideally under has been my goal.
Here is the RTS from the above code:
33,435,345,688 bytes allocated in the heap
14,963,640 bytes copied during GC
54,640 bytes maximum residency (77 sample(s))
21,136 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 64604 colls, 0 par 0.20s 0.25s 0.0000s 0.0001s
Gen 1 77 colls, 0 par 0.00s 0.01s 0.0001s 0.0006s
INIT time 0.00s ( 0.00s elapsed)
MUT time 5.09s ( 5.27s elapsed)
GC time 0.21s ( 0.26s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 5.29s ( 5.52s elapsed)
%GC time 3.9% (4.6% elapsed)
Alloc rate 6,574,783,667 bytes per MUT second
Productivity 96.1% of total user, 92.1% of total elapsed
Pretty gnarly results. When I remove the concatMap and just copy everything over with the headers every 2048 bytes, it's practically instant:
70,983,992 bytes allocated in the heap
48,912 bytes copied during GC
54,640 bytes maximum residency (2 sample(s))
19,744 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 204 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0001s 0.0001s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.01s ( 0.07s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.02s ( 0.07s elapsed)
%GC time 9.6% (2.9% elapsed)
Alloc rate 5,026,838,892 bytes per MUT second
Productivity 89.8% of total user, 22.3% of total elapsed
So I guess my question is two fold:
Thank you.
Edit
Here's the final code and RTS if anyone is interested! I was also able to find additional bottlenecks by making use of ghc's profiler with -prof -auto-all -caf-all
after reading up on the Profiling and optimization chapter of Real World Haskell.
{-# LANGUAGE OverloadedStrings #-}
import System.IO
import System.Environment
import Control.Monad
import Data.Monoid
import qualified Data.ByteString.Builder as BU
import qualified Data.ByteString.Lazy.Char8 as BL
main :: IO ()
main = do [from, to] <- getArgs
withFile from ReadMode $ \inH ->
withFile to WriteMode $ \outH ->
loop (BL.hGet inH 2048) (process outH) BL.null
loop :: (Monad m) => m a -> (a -> m ()) -> (a -> Bool) -> m ()
loop inp outp done = inp >>= \x -> unless (done x) (outp x >> loop inp outp done)
upConcatMap :: Monoid c => (Char -> c) -> BL.ByteString -> c
upConcatMap f bs = mconcat . map f $ BL.unpack bs
process :: Handle -> BL.ByteString -> IO ()
process h bs | BL.null bs = return ()
| otherwise = BU.hPutBuilder h frame
where header = "P6\n32 64\n255\n"
bs' = BU.toLazyByteString $ upConcatMap trip bs
frame = BU.lazyByteString $ mappend header bs'
trip c = let b = BU.char8 c in mconcat [b, b, b]
6,383,263,640 bytes allocated in the heap
18,596,984 bytes copied during GC
54,640 bytes maximum residency (2 sample(s))
31,056 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 11165 colls, 0 par 0.06s 0.06s 0.0000s 0.0001s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0001s 0.0002s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.69s ( 0.83s elapsed)
GC time 0.06s ( 0.06s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.75s ( 0.89s elapsed)
%GC time 7.4% (7.2% elapsed)
Alloc rate 9,194,103,284 bytes per MUT second
Productivity 92.6% of total user, 78.0% of total elapsed
What about Builder?
This version is ~5x faster for me:
process :: Handle -> B.ByteString -> IO ()
process h bs
| B.null bs = return ()
| otherwise = B.hPut h header >> B.hPutBuilder h bs'
where header = "P6\n32 64\n255\n" :: B.ByteString
bs' = mconcat $ map triple $ B.unpack bs
triple c = let b = B.char8 c in mconcat [b, b, b]
It allocates much less garbage.
ADD: for the reference, runtime statistics:
4,642,746,104 bytes allocated in the heap
390,110,640 bytes copied during GC
63,592 bytes maximum residency (2 sample(s))
21,648 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 8992 colls, 0 par 0.54s 0.63s 0.0001s 0.0017s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0002s 0.0002s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.98s ( 1.13s elapsed)
GC time 0.54s ( 0.63s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 1.52s ( 1.76s elapsed)
%GC time 35.4% (36.0% elapsed)
Alloc rate 4,718,237,910 bytes per MUT second
Productivity 64.6% of total user, 55.9% of total elapsed
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With