To learn a bit about iteratees, I wanted to reimplement a simple parser I made, using Data.Iteratee and Data.Attoparsec.Iteratee. I'm pretty much stumped, though. Below is a simple example that can parse one line from a file. My parser reads one line at a time, so I need a way of feeding lines to the iteratee until it's done. I've read everything I could find by googling, but a lot of the material on iteratees/enumerators is pretty advanced. This is the part of the code that matters:
-- There are more imports above.
import Data.Attoparsec.Iteratee
import Data.Iteratee (joinI, run)
import Data.Iteratee.IO (defaultBufSize, enumFile)
line :: Parser ByteString -- implementation left out (it doesn't check for new line)
iter = parserToIteratee line
main = do
  p <- liftM head getArgs                -- file path from the command line
  i <- enumFile defaultBufSize p $ iter  -- feed the file to the parser iteratee
  i' <- run i                            -- extract the result
  print i'
This example will parse and print one line from a file with multiple lines. The original script mapped the parser over a list of ByteStrings, so I would like to do the same thing here. I found enumLines in Iteratee, but I can't for the life of me figure out how to use it. Maybe I misunderstand its purpose?
Since your parser works on a line at a time, you don't even need to use attoparsec-iteratee. I would write this as:
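import Data.ByteString (ByteString)
import Data.Iteratee as I
import Data.Iteratee.Char
import Data.Attoparsec as A
-- ParseOutput and parser are your own definitions (left out here).
parser :: Parser ParseOutput
-- The result of parsing one line: an error message or a parsed value.
type POut = Either String ParseOutput
-- Split the input into lines, parse each line, and collect the results.
processLines :: Iteratee ByteString IO [POut]
processLines = joinI $ (enumLinesBS ><> I.mapStream (A.parseOnly parser)) stream2list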
The key to understanding this is the "enumeratee", which is just the iteratee term for a stream converter. It takes a stream processor (iteratee) of one stream type and converts it to work with another stream type. Both enumLinesBS and mapStream are enumeratees.
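To make the idea of an enumeratee concrete, here is a toy model in plain Haskell (base only). It is a deliberate simplification, not the iteratee package's actual definitions: elements are fed one at a time rather than in chunks, everything is pure rather than monadic, and there is no error handling. Iter, Done, Cont, enumList and run are hypothetical toy names; stream2list and mapStream only mimic the library functions of the same name. The point it shows is that applying an enumeratee to an inner consumer gives an outer consumer whose result is the inner consumer.

```haskell
-- Toy model of iteratees (NOT the iteratee package's definitions):
-- pure, one element at a time, no chunking or error handling.

-- An iteratee is either finished, or waiting for the next element
-- (Just x) or for end of stream (Nothing).
data Iter el a = Done a | Cont (Maybe el -> Iter el a)

-- An enumeratee turns a consumer of b-streams into a consumer of
-- a-streams whose result is the (possibly unfinished) inner consumer.
type Enumeratee a b r = Iter b r -> Iter a (Iter b r)

-- Feed the list elements one by one, stopping early if the iteratee is Done.
enumList :: [el] -> Iter el a -> Iter el a
enumList (x:xs) (Cont k) = enumList xs (k (Just x))
enumList _      i        = i

-- Signal end of stream and extract the result.
run :: Iter el a -> a
run (Done a) = a
run (Cont k) = case k Nothing of
  Done a -> a
  Cont _ -> error "run: iteratee did not finish at end of stream"

-- Collect every element into a list (mimics stream2list).
stream2list :: Iter el [el]
stream2list = go []
  where
    go acc = Cont $ \m -> case m of
      Just x  -> go (x : acc)
      Nothing -> Done (reverse acc)

-- Apply f to each outer element and pass it to the inner iteratee
-- (mimics mapStream).
mapStream :: (a -> b) -> Enumeratee a b r
mapStream _ inner@(Done _) = Done inner
mapStream f inner@(Cont k) = Cont $ \m -> case m of
  Just x  -> mapStream f (k (Just (f x)))
  Nothing -> Done inner  -- outer stream ended: hand back the inner iteratee

-- A consumer of Strings whose result is a consumer of Ints.
lengths :: Iter String (Iter Int [Int])
lengths = mapStream length stream2list

main :: IO ()
main = do
  let outer = enumList ["ab", "cde"] lengths
      inner = run outer  -- the outer result is the inner iteratee
  print (run inner)      -- running that one gives the list of lengths
```

Because the outer result is itself an iteratee, getting the final list takes two calls to run.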
To map your parser over multiple lines, mapStream is sufficient:
i1 :: Iteratee [ByteString] IO (Iteratee [POut] IO [POut])
i1 = mapStream (A.parseOnly parser) stream2list
The nested iteratees just mean that this converts a stream of [ByteString] to a stream of [POut], and when the final iteratee (stream2list) is run it returns that stream as [POut]. So now you just need the iteratee equivalent of lines to create that stream of [ByteString], which is what enumLinesBS does:
i2 :: Iteratee ByteString IO (Iteratee [ByteString] IO (Iteratee [POut] IO [POut]))
i2 = enumLinesBS $ mapStream (A.parseOnly parser) stream2list
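The nesting can be reproduced in a small toy model in plain Haskell (base only). This is a simplification, not the iteratee package's code: the input is a stream of Char rather than ByteString chunks, enumLines is a hypothetical toy counterpart of enumLinesBS, and parseLine is a hypothetical stand-in for A.parseOnly parser that accepts a line holding a single integer. Each enumeratee layer adds one level of Iter nesting, so the final list comes back only after one run per layer.

```haskell
-- Toy model (NOT the iteratee package): pure, one element at a time.
data Iter el a = Done a | Cont (Maybe el -> Iter el a)

type Enumeratee a b r = Iter b r -> Iter a (Iter b r)

-- Feed list elements one by one, stopping early if the iteratee is Done.
enumList :: [el] -> Iter el a -> Iter el a
enumList (x:xs) (Cont k) = enumList xs (k (Just x))
enumList _      i        = i

-- Signal end of stream and extract the result.
run :: Iter el a -> a
run (Done a) = a
run (Cont k) = case k Nothing of
  Done a -> a
  Cont _ -> error "run: iteratee did not finish at end of stream"

stream2list :: Iter el [el]
stream2list = go []
  where
    go acc = Cont $ \m -> case m of
      Just x  -> go (x : acc)
      Nothing -> Done (reverse acc)

mapStream :: (a -> b) -> Enumeratee a b r
mapStream _ inner@(Done _) = Done inner
mapStream f inner@(Cont k) = Cont $ \m -> case m of
  Just x  -> mapStream f (k (Just (f x)))
  Nothing -> Done inner

-- Hypothetical toy counterpart of enumLinesBS: turns a Char stream into
-- a stream of lines (without the '\n').
enumLines :: Enumeratee Char String r
enumLines = go ""
  where
    go _   inner@(Done _) = Done inner
    go acc inner@(Cont k) = Cont $ \m -> case m of
      Just '\n' -> go "" (k (Just (reverse acc)))
      Just c    -> go (c : acc) inner
      Nothing
        | null acc  -> Done inner
        | otherwise -> Done (k (Just (reverse acc)))  -- last line had no '\n'

-- Hypothetical stand-in for A.parseOnly parser: one integer per line.
parseLine :: String -> Either String Int
parseLine s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("bad line: " ++ show s)

type POut = Either String Int

-- Like i1: two levels of nesting.
i1 :: Iter String (Iter POut [POut])
i1 = mapStream parseLine stream2list

-- Like i2: three levels of nesting, one per stream type.
i2 :: Iter Char (Iter String (Iter POut [POut]))
i2 = enumLines (mapStream parseLine stream2list)

main :: IO ()
main = do
  print (run (run (enumList ["1", "2"] i1)))
  print (run (run (run (enumList "1\n2\nx\n" i2))))
```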
But this function is pretty unwieldy to use because of all the nesting. What we really want is a way to pipe output directly between stream converters, and at the end simplify everything to a single iteratee. To do this we use joinI and (><>):
e1 :: Iteratee [POut] IO a -> Iteratee ByteString IO (Iteratee [POut] IO a)
e1 = enumLinesBS ><> mapStream (A.parseOnly parser)
i' :: Iteratee ByteString IO [POut]
i' = joinI $ e1 stream2list
which is equivalent to how I wrote it above, with e1 inlined.
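How joinI and (><>) remove the nesting can be sketched with a toy model in plain Haskell (base only). This is a simplification rather than the library's source: the toy joinI just ends the inner stream once the outer iteratee finishes and returns the inner result, and the toy (><>) has a monomorphic type instead of taking a rank-2 enumeratee. Iter, Done, Cont, enumList, run, enumLines and parseLine are hypothetical toy definitions; enumLines stands in for enumLinesBS and parseLine for A.parseOnly parser.

```haskell
-- Toy model (NOT the iteratee package): pure, one element at a time.
data Iter el a = Done a | Cont (Maybe el -> Iter el a)

type Enumeratee a b r = Iter b r -> Iter a (Iter b r)

enumList :: [el] -> Iter el a -> Iter el a
enumList (x:xs) (Cont k) = enumList xs (k (Just x))
enumList _      i        = i

-- Signal end of stream and extract the result.
run :: Iter el a -> a
run (Done a) = a
run (Cont k) = case k Nothing of
  Done a -> a
  Cont _ -> error "run: iteratee did not finish at end of stream"

stream2list :: Iter el [el]
stream2list = go []
  where
    go acc = Cont $ \m -> case m of
      Just x  -> go (x : acc)
      Nothing -> Done (reverse acc)

mapStream :: (a -> b) -> Enumeratee a b r
mapStream _ inner@(Done _) = Done inner
mapStream f inner@(Cont k) = Cont $ \m -> case m of
  Just x  -> mapStream f (k (Just (f x)))
  Nothing -> Done inner

-- Toy counterpart of enumLinesBS (Char stream -> stream of lines).
enumLines :: Enumeratee Char String r
enumLines = go ""
  where
    go _   inner@(Done _) = Done inner
    go acc inner@(Cont k) = Cont $ \m -> case m of
      Just '\n' -> go "" (k (Just (reverse acc)))
      Just c    -> go (c : acc) inner
      Nothing
        | null acc  -> Done inner
        | otherwise -> Done (k (Just (reverse acc)))

-- Flatten one level of nesting: pass outer elements through, and when the
-- outer iteratee finishes, end the inner stream and return its result.
joinI :: Iter a (Iter b r) -> Iter a r
joinI (Done inner) = Done (run inner)
joinI (Cont k)     = Cont (joinI . k)

-- Compose two enumeratees into one; joinI removes the middle layer.
(><>) :: Enumeratee a b (Iter c r) -> Enumeratee b c r -> Enumeratee a c r
(f ><> g) inner = joinI (f (g inner))

-- Hypothetical stand-in for A.parseOnly parser: one integer per line.
parseLine :: String -> Either String Int
parseLine s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("bad line: " ++ show s)

type POut = Either String Int

e1 :: Enumeratee Char POut r
e1 = enumLines ><> mapStream parseLine

-- A single, un-nested iteratee: one run gives the list of results.
processLines :: Iter Char [POut]
processLines = joinI (e1 stream2list)

main :: IO ()
main = print (run (enumList "1\n2\nfoo\n" processLines))
```

Compared with the nested version, the caller no longer has to unwrap one layer per stream type.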
There's still one important piece missing, though: this function simply returns the parse results in a list. Typically you would want to do something else with them, such as combine the results with a fold.
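For instance, a fold can take the place of stream2list as the final consumer. The sketch below uses a toy, pure model of iteratees in plain Haskell (base only), not the iteratee package; foldlIter is a hypothetical toy counterpart of a strict left fold such as the foldl' provided by Data.Iteratee.ListLike, and parseLine is a hypothetical one-integer-per-line parser standing in for A.parseOnly parser. It sums the successfully parsed numbers and skips lines that failed to parse.

```haskell
-- Toy model (NOT the iteratee package): pure, one element at a time.
data Iter el a = Done a | Cont (Maybe el -> Iter el a)

enumList :: [el] -> Iter el a -> Iter el a
enumList (x:xs) (Cont k) = enumList xs (k (Just x))
enumList _      i        = i

run :: Iter el a -> a
run (Done a) = a
run (Cont k) = case k Nothing of
  Done a -> a
  Cont _ -> error "run: iteratee did not finish at end of stream"

mapStream :: (a -> b) -> Iter b r -> Iter a (Iter b r)
mapStream _ inner@(Done _) = Done inner
mapStream f inner@(Cont k) = Cont $ \m -> case m of
  Just x  -> mapStream f (k (Just (f x)))
  Nothing -> Done inner

joinI :: Iter a (Iter b r) -> Iter a r
joinI (Done inner) = Done (run inner)
joinI (Cont k)     = Cont (joinI . k)

-- Toy strict left fold: each new accumulator is forced before continuing.
foldlIter :: (b -> el -> b) -> b -> Iter el b
foldlIter step z = go z
  where
    go acc = Cont $ \m -> case m of
      Just x  -> let acc' = step acc x in acc' `seq` go acc'
      Nothing -> Done acc

-- Hypothetical stand-in for A.parseOnly parser: one integer per line.
parseLine :: String -> Either String Int
parseLine s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("bad line: " ++ show s)

-- Add successful parses to the running total; skip failed lines.
addGood :: Int -> Either String Int -> Int
addGood total (Right n) = total + n
addGood total (Left _)  = total

sumGood :: Iter String Int
sumGood = joinI (mapStream parseLine (foldlIter addGood 0))

main :: IO ()
main = print (run (enumList ["1", "2", "x", "10"] sumGood))
```

Only the final consumer changes; the enumeratee that maps the parser over the stream stays the same.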
Edit: Data.Iteratee.ListLike.mapM_ is often useful to create consumers. At that point each element of the stream is a parse result, so if you want to print them you can use
-- Print each successful parse; silently skip lines that failed to parse.
consumeParse :: Iteratee [POut] IO ()
consumeParse = I.mapM_ (either (const (return ())) print)
processLines2 :: Iteratee ByteString IO ()
processLines2 = joinI $ (enumLinesBS ><> I.mapStream (A.parseOnly parser)) consumeParse
This will print just the successful parses. You could easily report errors to STDERR, or handle them in other ways, as well.
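As a sketch of the error-reporting variant, assuming POut is an Either whose Right type has a Show instance, the handler given to I.mapM_ could send failures to stderr instead of discarding them: either (hPutStrLn stderr) print. To keep the example runnable without the iteratee and attoparsec packages, the block below applies that same handler to a plain list of lines; parseLine is a hypothetical one-integer-per-line parser standing in for A.parseOnly parser.

```haskell
import System.IO (hPutStrLn, stderr)

-- Hypothetical stand-in for A.parseOnly parser: one integer per line.
parseLine :: String -> Either String Int
parseLine s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("bad line: " ++ show s)

-- Report failed parses on stderr and print successful ones on stdout.
-- With the iteratee package, this function could be the argument to
-- I.mapM_ in place of the handler that discards errors.
reportResult :: Either String Int -> IO ()
reportResult = either (hPutStrLn stderr) print

main :: IO ()
main = mapM_ (reportResult . parseLine) (lines "1\nnope\n3\n")
```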