
Iteratee I/O: need to know file size beforehand

Tags:

haskell

Suppose I need to parse a binary file, which starts with three 4-byte magic numbers. Two of them are fixed strings. The other, however, is the length of the file.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Attoparsec
import Data.Attoparsec.Enumerator
import Data.Enumerator hiding (foldl, foldl', map, head)
import Data.Enumerator.Binary hiding (map)
import qualified Data.ByteString as S
import System.Environment (getArgs)

main = do
    f:_ <- getArgs
    eitherStat <- run (enumFile f $$ iterMagics)
    case eitherStat of
        Left _err -> putStrLn $ "Not a beam file: " ++ f
        Right _ -> return ()

iterMagics :: Monad m => Iteratee S.ByteString m ()
iterMagics = iterParser parseMagics

parseMagics :: Parser ()
parseMagics = do
    _ <- string "FOR1"
    len <- big_endians 4 -- need to compare with actual file length
    _ <- string "BEAM"
    return ()

big_endians :: Int -> Parser Int
big_endians n = do
    ws <- count n anyWord8
    return $ foldl1 (\a b -> a * 256 + b) $ map fromIntegral ws

If the stated length doesn't match the actual length, iterMagics should ideally return an error. But how? Is passing the actual length in as an argument the only way? Is that the iteratee-ish way to do it? It doesn't feel very incremental to me :)

edwardw asked Jun 29 '11 14:06


1 Answer

This can easily be done with enumeratees. First you read the three 4-byte magic numbers, then run an inner iteratee over the remainder. If you're using the iteratee package, it would look more or less like this:

parseMagics :: Parser Int
parseMagics = do
    _ <- string "FOR1"
    len <- big_endians 4 -- need to compare with actual file length
    _ <- string "BEAM"
    return len

iterMagics :: Monad m => Iteratee S.ByteString m (Either String SomeResult)
iterMagics = do
  len <- iterParser parseMagics
  (result, bytesConsumed) <- joinI $ takeUpTo len (enumWith iterData I.length)
  if len == bytesConsumed
    then return $ Right result
    else return $ Left "Data too short"

In this case it won't throw an error if the file is too long; it will simply stop reading after len bytes. You can modify it to check for that condition fairly easily. I don't think enumerator has an analogue of enumWith, so you'd probably need to count the bytes manually, but the same principle applies.
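To make "count the bytes manually" a bit more concrete, here is a rough, library-neutral sketch. It walks a plain list of strict ByteString chunks instead of a real iteratee stream, so it only models the bookkeeping. The names LengthCheck and checkLength are made up for illustration and are not part of iteratee or enumerator; the only assumption is the bytestring package.

```haskell
import qualified Data.ByteString as S

-- | Result of comparing the header's stated length with the bytes seen.
-- (Hypothetical type, for illustration only.)
data LengthCheck
  = Exact         -- ^ body is exactly as long as the header says
  | TooShort Int  -- ^ input ended this many bytes early
  | TooLong       -- ^ more bytes arrived than the header allows
  deriving (Eq, Show)

-- | Walk the body chunk by chunk, keeping a running byte count.
-- Stops at the first chunk that pushes the count past the stated
-- length, so the rest of the input is never inspected.
checkLength :: Int -> [S.ByteString] -> LengthCheck
checkLength stated = go 0
  where
    -- End of input: the count can only be <= stated at this point.
    go seen []
      | seen == stated = Exact
      | otherwise      = TooShort (stated - seen)
    go seen (c : cs)
      | seen' > stated = TooLong
      | otherwise      = go seen' cs
      where
        seen' = seen + S.length c
```

In an actual iteratee the `go` step would be the continuation that receives each chunk, and `stated` would come from the header parser; unlike the takeUpTo version, this one also reports the too-long case.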

A possibly more pragmatic approach is to check the file size before running the enumerator and then compare it to the value in the header. You'll need to pass either the file size or the file path as an argument to the iteratee (but not to the parser).

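import Control.Monad (liftM)
import Control.Monad.IO.Class (liftIO)
import System.Posix

iterMagics2 filepath = do
  fsize <- liftIO . liftM fileSize $ getFileStatus filepath
  len <- iterParser parseMagics
  -- True when the header's length matches the size on disk
  return (fromIntegral fsize == len)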
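If you don't need streaming for the check itself, the same comparison can be sketched with nothing but base and bytestring. This is only an illustration under some assumptions: checkHeader, checkFile, and the `slack` parameter are hypothetical names, and `slack` exists because I'm not certain whether the header's length counts the whole file or only the bytes after the length field (pass 0 for the former, 8 for the latter).

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as C
import System.IO (IOMode (ReadMode), hFileSize, withBinaryFile)

-- | Decode a big-endian unsigned integer from a run of bytes.
bigEndian :: S.ByteString -> Integer
bigEndian = S.foldl' (\acc w -> acc * 256 + fromIntegral w) 0

-- | Pure check on the first 12 bytes and the size on disk.
-- 'slack' (hypothetical) is the number of bytes that the header's
-- length field does not count.
checkHeader :: Integer -> Integer -> S.ByteString -> Either String ()
checkHeader slack fsize hdr
  | S.length hdr < 12                       = Left "Header too short"
  | S.take 4 hdr /= C.pack "FOR1"           = Left "Missing FOR1 magic"
  | S.take 4 (S.drop 8 hdr) /= C.pack "BEAM" = Left "Missing BEAM magic"
  | stated + slack /= fsize                 = Left "Stated length does not match file size"
  | otherwise                               = Right ()
  where
    stated = bigEndian (S.take 4 (S.drop 4 hdr))

-- | Read the file size and the 12-byte header, then run the pure check.
checkFile :: Integer -> FilePath -> IO (Either String ())
checkFile slack path =
  withBinaryFile path ReadMode $ \h -> do
    fsize <- hFileSize h
    hdr   <- S.hGet h 12
    return (checkHeader slack fsize hdr)
```

Keeping checkHeader pure means the file-size lookup stays outside the parser, which matches the point above: the size is an argument to the surrounding code, not to the parser.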
John L answered Nov 06 '22 07:11