
Streaming parsing of JSON in Haskell with Pipes.Aeson

The Pipes.Aeson library exposes the following function:

decode :: (Monad m, FromJSON a) => Parser ByteString m (Either DecodingError a)

If I use evalStateT with this parser and a file handle as an argument, a single JSON object is read from the file and parsed.

The problem is that the file contains several objects (all of the same type) and I'd like to fold or reduce them as they are read.

Pipes.Parse provides:

foldAll :: Monad m => (x -> a -> x) -> x -> (x -> b) -> Parser a m b

but, as you can see, this returns a new parser, and I can't think of a way of supplying the first parser as an argument.

It looks like a Parser is actually a StateT monad transformer whose state is a Producer. I wondered whether there's a way of extracting the Producer from the StateT, so that evalStateT could be applied to the foldAll Parser together with the Producer from the decode Parser.
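To make the Parser/Producer relationship concrete, here is a minimal sketch of running a foldAll parser with evalStateT. It assumes the values have already been decoded: the numbers producer is a hypothetical stand-in, and the sketch says nothing about how to obtain such a producer from raw bytes.

```haskell
import Control.Monad.Trans.State.Strict (evalStateT)
import Pipes (Producer, each)
import Pipes.Parse (Parser, foldAll)

-- A Parser that sums every element it can draw from the underlying producer.
sumAll :: Monad m => Parser Int m Int
sumAll = foldAll (+) 0 id

-- Hypothetical stand-in for a stream of already-decoded values.
numbers :: Monad m => Producer Int m ()
numbers = each [1, 2, 3, 4]

main :: IO ()
main = do
    -- evalStateT takes the parser and the producer it should consume.
    total <- evalStateT sumAll numbers
    print total
```

So a foldAll parser is run against a producer of elements, not against another parser.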

This is probably completely the wrong approach though.

My question, in short:
When parsing a file using Pipes.Aeson, what's the best way to fold all the objects in the file?

immutablestate asked May 17 '14 18:05


1 Answer

Instead of using decode, you can use the decoded parsing lens from Pipes.Aeson.Unchecked. It turns a producer of ByteString into a producer of parsed JSON values.

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.Aeson as A
import qualified Pipes.Aeson.Unchecked as AU
import qualified Data.ByteString as B

import Control.Lens (view)

byteProducer :: Monad m => Producer B.ByteString m ()
byteProducer = yield "1 2 3 4"

intProducer :: Monad m => Producer Int m (Either (A.DecodingError, Producer B.ByteString m ()) ())
intProducer = view AU.decoded byteProducer

The return value of intProducer is a bit scary, but it only means that intProducer finishes either with a parsing error and the unparsed bytes after the error, or with the return value of the original producer (which is () in our case).
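If the return value matters, it can be inspected instead of discarded. The following is a sketch, assuming a version of pipes that provides P.fold' (which returns the fold result together with the producer's return value); sumWithStatus and the malformed input are hypothetical names and data chosen for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Lens (view)
import Pipes
import qualified Data.ByteString as B
import qualified Pipes.Aeson as A
import qualified Pipes.Aeson.Unchecked as AU
import qualified Pipes.Prelude as P

-- Sum the decoded Ints and also keep the producer's final return value,
-- which tells us whether decoding stopped because of an error.
sumWithStatus
    :: Monad m
    => Producer B.ByteString m ()
    -> m (Int, Either (A.DecodingError, Producer B.ByteString m ()) ())
sumWithStatus bytes = P.fold' (+) 0 id (view AU.decoded bytes)

main :: IO ()
main = do
    -- Hypothetical input with a malformed token in the middle.
    (total, status) <- sumWithStatus (yield "1 2 oops 4")
    case status of
        Left (err, _leftovers) ->
            putStrLn ("Stopped early with partial sum " ++ show total ++ ": " ++ show err)
        Right () ->
            print total
```

The Left branch also carries the remaining bytes as a producer, so a caller could choose to resume or log them rather than drop them.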

We can ignore the return value:

intProducer' :: Monad m => Producer Int m ()
intProducer' = intProducer >> return ()

And plug the producer into a fold from Pipes.Prelude, like sum:

main :: IO ()
main = do
    total <- P.sum intProducer'
    print total

In ghci:

λ :main
10
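Since the question is about a file, the same idea can be sketched with a handle as the byte source. This assumes the pipes-bytestring package for PB.fromHandle; sumFile and the file name numbers.json are hypothetical, and the file is assumed to contain whitespace-separated JSON numbers.

```haskell
module Main where

import Control.Lens (view)
import Control.Monad (void)
import qualified Pipes.Aeson.Unchecked as AU
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P
import System.IO (IOMode (ReadMode), withFile)

-- Stream the file through the decoded lens and sum the values as they arrive.
-- void discards the final return value, so a decoding error silently ends the sum.
sumFile :: FilePath -> IO Int
sumFile path =
    withFile path ReadMode $ \h ->
        P.sum (void (view AU.decoded (PB.fromHandle h)))

main :: IO ()
main = sumFile "numbers.json" >>= print
```

The fold runs inside withFile, so the handle stays open for as long as the producer is being consumed.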

Note also that the functions purely and impurely let you apply folds defined in the foldl package to producers.
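As a small sketch of what that looks like, purely can hand a Fold from Control.Foldl to P.fold. The sumAndCount fold and the each-based producer are hypothetical stand-ins for whatever fold and decoded producer you actually have.

```haskell
module Main where

import qualified Control.Foldl as L
import Pipes (Producer, each)
import qualified Pipes.Prelude as P

-- Two folds combined applicatively; both run in a single pass over the stream.
sumAndCount :: L.Fold Int (Int, Int)
sumAndCount = (,) <$> L.sum <*> L.length

-- Hypothetical stand-in for a producer of decoded values.
numbers :: Monad m => Producer Int m ()
numbers = each [1, 2, 3, 4]

main :: IO ()
main = do
    -- purely unpacks the Fold into the step, initial, and extract arguments of P.fold.
    (total, count) <- L.purely P.fold sumAndCount numbers
    print (total, count)
```

Combining folds this way avoids traversing the producer twice, which matters when the source is a file or a network stream.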

danidiaz answered Oct 01 '22 17:10