Suppose I need to parse a binary file, which starts with three 4-byte magic numbers. Two of them are fixed strings. The other, however, is the length of the file.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Attoparsec
import Data.Attoparsec.Enumerator
import Data.Enumerator hiding (foldl, foldl', map, head)
import Data.Enumerator.Binary hiding (map)
import qualified Data.ByteString as S
import System.Environment (getArgs)

main = do
    f:_ <- getArgs
    eitherStat <- run (enumFile f $$ iterMagics)
    case eitherStat of
        Left _err -> putStrLn $ "Not a beam file: " ++ f
        Right _   -> return ()

iterMagics :: Monad m => Iteratee S.ByteString m ()
iterMagics = iterParser parseMagics

parseMagics :: Parser ()
parseMagics = do
    _ <- string "FOR1"
    len <- big_endians 4 -- need to compare with actual file length
    _ <- string "BEAM"
    return ()

-- Read n bytes and combine them into one big-endian integer.
big_endians :: Int -> Parser Int
big_endians n = do
    ws <- count n anyWord8
    return $ foldl1 (\a b -> a * 256 + b) $ map fromIntegral ws

If the stated length doesn't match the actual length, ideally iterMagics should return an error. But how? Is passing the actual length in as an argument the only way? Is that the iteratee-ish way to do it? It doesn't feel very incremental to me :)
This can easily be done with enumeratees. First you read the three 4-byte magic numbers, then run an inner iteratee over the remainder. If you're using the iteratee package, it would look more or less like this:

parseMagics :: Parser Int
parseMagics = do
    _ <- string "FOR1"
    len <- big_endians 4 -- stated length, compared below with the bytes actually read
    _ <- string "BEAM"
    return len

-- iterData and SomeResult are placeholders for your own payload iteratee and
-- its result type; I is assumed to be a qualified import of Data.Iteratee.
iterMagics :: Monad m => Iteratee S.ByteString m (Either String SomeResult)
iterMagics = do
    len <- iterParser parseMagics
    (result, bytesConsumed) <- joinI $ takeUpTo len (enumWith iterData I.length)
    if len == bytesConsumed
        then return $ Right result
        else return $ Left "Data too short"

In this case it won't throw an error if the file is too long, but it will stop reading. You can modify it to check for that condition fairly easily. I don't think Enumerator has an analogue of enumWith, so you'd probably need to count the bytes manually, but the same principle would apply.
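
To make the manual counting concrete, here is a rough sketch that uses neither library: it models the stream as a plain list of strict ByteString chunks and counts what follows the header. The names bigEndian, takeBytes and checkLength are made up for this example, and it assumes the stated length counts only the bytes after the 12-byte header, which is what the takeUpTo version checks. Unlike that version, it also reports data that is too long.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as C

-- | Decode a big-endian unsigned integer from raw bytes.
bigEndian :: S.ByteString -> Int
bigEndian = S.foldl' (\acc w -> acc * 256 + fromIntegral w) 0

-- | Pull the first n bytes off a chunk list; returns them plus the leftover chunks.
-- The header may be split across chunk boundaries, so chunks are joined as needed.
takeBytes :: Int -> [S.ByteString] -> (S.ByteString, [S.ByteString])
takeBytes n = go n []
  where
    go 0 acc rest = (S.concat (reverse acc), rest)
    go _ acc []   = (S.concat (reverse acc), [])
    go k acc (c:cs)
        | S.length c <= k = go (k - S.length c) (c : acc) cs
        | otherwise       = let (a, b) = S.splitAt k c
                            in (S.concat (reverse (a : acc)), b : cs)

-- | Hypothetical check: parse the header, count the remaining bytes chunk by
-- chunk, and compare the total with the length stated in the header.
-- Assumption: the stated length excludes the 12 header bytes.
checkLength :: [S.ByteString] -> Either String Int
checkLength chunks
    | S.length header < 12             = Left "Header too short"
    | S.take 4 header /= C.pack "FOR1" = Left "Bad first magic"
    | S.drop 8 header /= C.pack "BEAM" = Left "Bad second magic"
    | consumed < len                   = Left "Data too short"
    | consumed > len                   = Left "Data too long"
    | otherwise                        = Right len
  where
    (header, rest) = takeBytes 12 chunks
    len            = bigEndian (S.take 4 (S.drop 4 header))
    consumed       = sum (map S.length rest)
```

Loaded into GHCi, a stream whose header says 3 and which carries exactly 3 payload bytes gives Right 3; with only 2 payload bytes it gives Left "Data too short". In a real iteratee you would keep the running count in the iteratee's state instead of summing a list, but the comparison at the end is the same.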
Possibly a more pragmatic approach is to check the file size before running the enumerator, and then just compare that to the value in the header. You'll need to pass either the file size or the file path as an argument to the iteratee (but not the parser).
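
import Control.Monad (liftM)
import Control.Monad.IO.Class (liftIO)
import System.Posix

iterMagics2 filepath = do
    fsize <- liftIO . liftM fileSize $ getFileStatus filepath
    len <- iterParser parseMagics
    -- Assumed final step: compare the stated length with the real size.
    return $ if fromIntegral fsize == len then Right () else Left "Length mismatch"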
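
If you just want to see the size comparison on its own, here is a minimal standalone sketch using only base and bytestring. It swaps getFileStatus for hFileSize from System.IO, and the names bigEndian, checkHeader and checkFile are invented for the example. It assumes the header stores the total file size, as the question describes; if the format's length field excludes some leading bytes, the comparison would need adjusting.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as C
import System.IO (IOMode (ReadMode), hFileSize, withBinaryFile)

-- | Decode a big-endian unsigned integer from raw bytes.
bigEndian :: S.ByteString -> Integer
bigEndian = S.foldl' (\acc w -> acc * 256 + fromIntegral w) 0

-- | Pure part: validate a 12-byte header against an already known file size.
-- Assumption: the stated length covers the whole file, header included.
checkHeader :: Integer -> S.ByteString -> Either String Integer
checkHeader fsize header
    | S.length header < 12                        = Left "Header too short"
    | S.take 4 header /= C.pack "FOR1"            = Left "Bad first magic"
    | S.take 4 (S.drop 8 header) /= C.pack "BEAM" = Left "Bad second magic"
    | len /= fsize                                = Left "Stated length does not match file size"
    | otherwise                                   = Right len
  where
    len = bigEndian (S.take 4 (S.drop 4 header))

-- | IO part: look up the size first, then read only the 12 header bytes.
checkFile :: FilePath -> IO (Either String Integer)
checkFile path = withBinaryFile path ReadMode $ \h -> do
    fsize  <- hFileSize h
    header <- S.hGet h 12
    return (checkHeader fsize header)
```

Keeping the comparison in a pure function means the parser still knows nothing about the file system; only the caller does, which matches the advice to pass the size to the iteratee rather than to the parser.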