Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Attoparsec Iteratee

I wanted, just to learn a bit about Iteratees, reimplement a simple parser I made, using Data.Iteratee and Data.Attoparsec.Iteratee. I'm pretty much stumped though. Below I have a simple example that is able to parse one line from a file. My parser reads one line at a time, so I need a way of feeding lines to the iteratee until it's done. I've read all I've found googling this, but a lot of the material on iteratee/enumerators is pretty advanced. This is the part of the code that matters:

-- There are more imports above.
import Data.Attoparsec.Iteratee
import Data.Iteratee (joinI, run)
import Data.Iteratee.IO (defaultBufSize, enumFile)

line :: Parser ByteString -- left the implementation out (it doesn't check for 
                             new line)

iter = parserToIteratee line

main = do
    p <- liftM head getArgs
    i <- enumFile defaultBufSize p $ iter
    i' <- run i
    print i'

This example will parse and print one line from a file with multiple lines. The original script mapped the parser over a list of ByteStrings. So I would like to do the same thing here. I found enumLinesin Iteratee, but I can't for the life of me figure out how to use it. Maybe I misunderstand its purpose?

like image 852
Johanna Larsson Avatar asked Jun 15 '11 16:06

Johanna Larsson


1 Answers

Since your parser works on a line at a time, you don't even need to use attoparsec-iteratee. I would write this as:

import Data.Iteratee as I
import Data.Iteratee.Char
import Data.Attoparsec as A

parser :: Parser ParseOutput
type POut = Either String ParseOutput

processLines :: Iteratee ByteString IO [POut]
processLines = joinI $ (enumLinesBS ><> I.mapStream (A.parseOnly parser)) stream2list

The key to understanding this is the "enumeratee", which is just the iteratee term for a stream converter. It takes a stream processor (iteratee) of one stream type and converts it to work with another stream. Both enumLinesBS and mapStream are enumeratees.

To map your parser over multiple lines, mapStream is sufficient:

i1 :: Iteratee [ByteString] IO (Iteratee [POut] IO [POut]
i1 = mapStream (A.parseOnly parser) stream2list

The nested iteratees just mean that this converts a stream of [ByteString] to a stream of [POut], and when the final iteratee (stream2list) is run it returns that stream as [POut]. So now you just need the iteratee equivalent of lines to create that stream of [ByteString], which is what enumLinesBS does:

i2 :: Iteratee ByteString IO (Iteratee [ByteString] IO (Iteratee [POut] m [POut])))
i2 = enumLinesBS $ mapStream f stream2list

But this function is pretty unwieldy to use because of all the nesting. What we really want is a way to pipe output directly between stream converters, and at the end simplify everything to a single iteratee. To do this we use joinI, (><>), and (><>):

e1 :: Iteratee [POut] IO a -> Iteratee ByteString IO (Iteratee [POut] IO a)
e1 = enumLinesBS ><> mapStream (A.parseOnly parser)

i' :: Iteratee ByteString IO [POut]
i' = joinI $ e1 stream2list

which is equivalent to how I wrote it above, with e1 inlined.

There's still important element remaining though. This function simply returns the parse results in a list. Typically you would want to do something else, such as combine the results with a fold.

edit: Data.Iteratee.ListLike.mapM_ is often useful to create consumers. At that point each element of the stream is a parse result, so if you want to print them you can use

consumeParse :: Iteratee [POut] IO ()
consumeParse = I.mapM_ (either (\e -> return ()) print)

processLines2 :: Iteratee ByteString IO ()
processLines2 = joinI $ (enumLinesBS ><> I.mapStream (A.parseOnly parser)) consumeParse

This will print just the successful parses. You could easily report errors to STDERR, or handle them in other ways, as well.

like image 183
John L Avatar answered Sep 20 '22 15:09

John L