Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I reduce garbage collection when parsing a binary file in Haskell?

I am writing a program to parse .TGA files using Haskell - however, the performance is absolutely dreadful. A 2048 x 2048 pixel image takes a few seconds to parse.

I ran my code using +RTS -p -RTS and received the following interesting pieces from the report:

total time = 1.08 secs
total alloc = 3,037,568,120 bytes

COST CENTRE           MODULE    %time    %alloc
readPixelMap            Main     33.0      11.0
readPixelMap.getByte    Main     32.7      75.1
readPixelMap.getColor   Main     27.0      13.3

It appears that my program is allocating a huge amount of memory in the readPixelMap function. That function looks like this:

readPixelMap width height = do
    pixels <- replicateM height (replicateM width getColor)
    return $ PixMap pixels
    where getByte = liftM toInteger getWord8
          getColor = do (red:green:blue:alpha:xs) <- replicateM 4 getByte
                        return (red, green, blue, alpha)

and a PixMap is defined as

data PixMap = PixMap [[RGBAColor]] deriving (Show)
type RGBAColor = (Integer, Integer, Integer, Integer)

readPixelMap is called using the Get monad from Data.Binary:

main = do
    binary <- BS.readFile "test.tga"
    let (pixelMap, binary', nil) = runGetState (readPixelMap 2048 2048) binary 0

I've been working under the impression that a 2048 x 2048 .TGA image should load into a few hundred megabytes (at most) of memory. Is there an obvious way to improve my garbage collection times / allocation amount?

like image 609
sdasdadas Avatar asked Dec 25 '13 02:12

sdasdadas


1 Answers

Your RGBColor type uses 5 machine words, 4 of which are pointers to each Integer. Each Integer consists of one machine word for GC/tagging, and either one additional word for the small representation, or the large representation consisting of a pointer to a byte array for the GMP data and a limb count.

Additionally, you're using a list of lists to store the values. Each non-empty list node is a word for GC/tagging, a word pointer to the value, and a word pointer to the tail.

To use less memory, use appropriate data types. Use unpacked values where possible. Use Word8 instead of Integer for 8-bit values. Use arrays, instead of lists. You know, the basics of acting like you care about memory use.

Take a look at http://johantibell.com/files/slides.pdf for some of the basics.

like image 93
Carl Avatar answered Nov 14 '22 22:11

Carl