I wrote a parser for a custom file format using attoparsec. The profiling report indicated that around 67% of the memory allocation is done in a function named tab, which also consumes the most time. The tab function is pretty simple:

tab :: Parser Char
tab = char '\t'
The entire profiling report is as follows:
ASnapshotParser +RTS -p -h -RTS
total time = 37.88 secs (37882 ticks @ 1000 us, 1 processor)
total alloc = 54,255,105,384 bytes (excludes profiling overheads)
COST CENTRE     MODULE                 %time  %alloc
tab             Main                    83.1    67.7
main            Main                     6.4     4.2
readTextDevice  Data.Text.IO.Internal    5.5    24.0
snapshotParser  Main                     4.7     4.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 75 0 0.0 0.0 100.0 100.0
CAF Main 149 0 0.0 0.0 100.0 100.0
tab Main 156 1 0.0 0.0 0.0 0.0
snapshotParser Main 153 1 0.0 0.0 0.0 0.0
main Main 150 1 6.4 4.2 100.0 100.0
doStuff Main 152 1000398 0.3 0.0 88.1 71.8
snapshotParser Main 154 0 4.7 4.0 87.7 71.7
tab Main 157 0 83.1 67.7 83.1 67.7
readTextDevice Data.Text.IO.Internal 151 40145 5.5 24.0 5.5 24.0
CAF Data.Text.Array 142 0 0.0 0.0 0.0 0.0
CAF Data.Text.Internal 140 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 122 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 103 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 101 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 100 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 89 0 0.0 0.0 0.0 0.0
main Main 155 0 0.0 0.0 0.0 0.0
How do I optimize this?
The entire code for the parser is here. The file which I'm parsing is around 77MB.
tab is a scapegoat. If you define boo :: Parser (); boo = return () and insert a boo before every bind in the snapshotParser definition, the cost allocations will become something like:
main Main 255 0 11.8 13.8 100.0 100.0
doStuff Main 258 2097153 1.1 0.5 86.2 86.2
snapshotParser Main 260 0 0.4 0.1 85.1 85.7
boo Main 262 0 71.0 73.2 84.8 85.5
tab Main 265 0 13.8 12.3 13.8 12.3
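For concreteness, here is a minimal, self-contained sketch of that instrumentation. MiniSnapshot and miniParser are made-up names with only two fields, intended only to show where the boo statements go; they are not the parser from the question:

module BooSketch where

import Control.Applicative ((<|>))
import Data.Attoparsec.Text

-- A do-nothing parser that exists only to soak up cost-centre blame.
boo :: Parser ()
boo = return ()

tab :: Parser Char
tab = char '\t'

-- A cut-down stand-in for the Snapshot record.
data MiniSnapshot = MiniSnapshot Int Double deriving Show

-- The instrumentation pattern: a boo before every bind, so the profiler
-- has somewhere other than tab to charge the allocations to.
miniParser :: Parser MiniSnapshot
miniParser = do
  boo
  sveh <- decimal
  boo
  tab
  boo
  sdistance <- double
  boo
  endOfLine <|> endOfInput
  return (MiniSnapshot sveh sdistance)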
So it seems the profiler is shifting the blame for the allocations of the parse results, likely due to extensive inlining of attoparsec code, as John L suggested in the comments.
As for the performance issues, the key point is that, as you are parsing a 77MB text file to build a list with a million elements, you want the file processing to be lazy, not strict. Once that is sorted out, decoupling I/O and parsing in doStuff and building the list of snapshots without an accumulator help as well. Here is a modified version of your program taking that into account:
{-# LANGUAGE BangPatterns #-}
module Main where
import Data.Maybe
import Data.Attoparsec.Text.Lazy
import Control.Applicative
import qualified Data.Text.Lazy.IO as TL
import Data.Text (Text)
import qualified Data.Text.Lazy as TL
buildStuff :: TL.Text -> [Snapshot]
buildStuff text = case maybeResult (parse endOfInput text) of
  Just _  -> []
  Nothing -> case parse snapshotParser text of
    Done !i !r -> r : buildStuff i
    Fail _ _ _ -> []
main :: IO ()
main = do
  text <- TL.readFile "./snap.dat"
  let ss = buildStuff text
  print $ listToMaybe ss
    >> Just (fromIntegral (length $ show ss) / fromIntegral (length ss))
newtype VehicleId = VehicleId Int deriving Show
newtype Time = Time Int deriving Show
newtype LinkID = LinkID Int deriving Show
newtype NodeID = NodeID Int deriving Show
newtype LaneID = LaneID Int deriving Show
tab :: Parser Char
tab = char '\t'
-- UNPACK pragmas. GHC 7.8 unboxes small strict fields automatically;
-- however, it seems we still need the pragmas while profiling.
data Snapshot = Snapshot {
  vehicle    :: {-# UNPACK #-} !VehicleId,
  time       :: {-# UNPACK #-} !Time,
  link       :: {-# UNPACK #-} !LinkID,
  node       :: {-# UNPACK #-} !NodeID,
  lane       :: {-# UNPACK #-} !LaneID,
  distance   :: {-# UNPACK #-} !Double,
  velocity   :: {-# UNPACK #-} !Double,
  vehtype    :: {-# UNPACK #-} !Int,
  acceler    :: {-# UNPACK #-} !Double,
  driver     :: {-# UNPACK #-} !Int,
  passengers :: {-# UNPACK #-} !Int,
  easting    :: {-# UNPACK #-} !Double,
  northing   :: {-# UNPACK #-} !Double,
  elevation  :: {-# UNPACK #-} !Double,
  azimuth    :: {-# UNPACK #-} !Double,
  user       :: {-# UNPACK #-} !Int
  } deriving (Show)
-- No need for bang patterns here.
snapshotParser :: Parser Snapshot
snapshotParser = do
  sveh <- decimal
  tab
  stime <- decimal
  tab
  slink <- decimal
  tab
  snode <- decimal
  tab
  slane <- decimal
  tab
  sdistance <- double
  tab
  svelocity <- double
  tab
  svehtype <- decimal
  tab
  sacceler <- double
  tab
  sdriver <- decimal
  tab
  spassengers <- decimal
  tab
  seasting <- double
  tab
  snorthing <- double
  tab
  selevation <- double
  tab
  sazimuth <- double
  tab
  suser <- decimal
  endOfLine <|> endOfInput
  return $ Snapshot
    (VehicleId sveh) (Time stime) (LinkID slink) (NodeID snode)
    (LaneID slane) sdistance svelocity svehtype sacceler sdriver
    spassengers seasting snorthing selevation sazimuth suser
This version should have acceptable performance even if you force the whole list of snapshots into memory, as I did in main here. To gauge what is "acceptable", keep in mind that, given the sixteen (small, unboxed) fields in each Snapshot plus the overhead of the Snapshot and list constructors, we are talking about 152 bytes per list cell, which boils down to ~152MB for your test data. In any case, this version is about as lazy as possible, as you can see by removing the division in main, or replacing it by last ss.
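As a concrete version of that laziness check, here is a sketch of main with the division replaced by last ss; it reuses buildStuff, Snapshot, and the snap.dat file name from the program above:

main :: IO ()
main = do
  text <- TL.readFile "./snap.dat"
  let ss = buildStuff text
  -- Only the first and last snapshots are demanded, so the spine of ss
  -- can be consumed incrementally instead of being held in memory.
  print $ listToMaybe ss >> Just (last ss)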
N.B.: My tests were done with attoparsec-0.12.
After updating attoparsec to the latest version (0.12.0.0), the execution time dropped from 38 seconds to 16 seconds, a speedup of more than 50%. The memory it consumed also dropped drastically. As @JohnL noted, the results vary wildly with profiling enabled: when I profiled the program with the latest version of attoparsec, it took around 64 seconds to run.