Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

using haskell pipes-bytestring to iterate a file by line

I am using the pipes library and need to convert a ByteString stream to a stream of lines (i.e. String), using ASCII encoding. I am aware that there are other libraries (Pipes.Text and Pipes.Prelude) that perhaps let me yield lines from a text file more easily, but because of some other code I need to be able to get lines as String from a Producer of ByteString.

More formally, I need to convert a Producer ByteString IO () to a Producer String IO (), which yields lines.

I am sure this must be a one-liner for an experienced Pipes-Programmer, but I so far did not manage to successfully hack through all the FreeT and Lens-trickery in Pipes-ByteString.

Any help is much appreciated!

Stephan

like image 663
Stephan Avatar asked Sep 22 '14 19:09

Stephan


2 Answers

If you need that type signature, then I would suggest this:

import Control.Foldl (mconcat, purely)
import Data.ByteString (ByteString)
import Data.Text (unpack)
import Lens.Family (view)
import Pipes (Producer, (>->))
import Pipes.Group (folds)
import qualified Pipes.Prelude as Pipes
import Pipes.Text (lines)
import Pipes.Text.Encoding (utf8)
import Prelude hiding (lines)

getLines
    :: Producer ByteString IO r -> Producer String IO (Producer ByteString IO r)
getLines p = purely folds mconcat (view (utf8 . lines) p) >-> Pipes.map unpack

This works because the type of purely folds mconcat is:

purely folds mconcat
    :: (Monad m, Monoid t) => FreeT (Producer t m) r -> Producer t m r

... where t in this case would be Text:

purely folds mconcat
    :: Monad m => FreeT (Producer Text m) r -> Producer Text m r

Any time you want to reduce each Producer sub-group of a FreeT-delimited stream you probably want to use purely folds. Then it's just a matter of picking the right Fold to reduce the sub-group with. In this case, you just want to concatenate all the Text chunks within a group, so you pass in mconcat. I generally don't recommend doing this since it will break on extremely long lines, but you specified that you needed this behavior.

The reason this is verbose is because the pipes ecosystem promotes Text over String and also tries to encourage handling arbitrarily long lines. If you were not constrained by your other code then the more idiomatic approach would just be:

view (utf8 . lines)
like image 190
Gabriella Gonzalez Avatar answered Oct 11 '22 09:10

Gabriella Gonzalez


After a little bit of hacking and some hints from this blog, I came up with a solution, but it is surprisingly clumsy, and I fear a bit inefficient as well, as it uses ByteString.append:

import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as PP
import qualified Pipes.Group as PG
import qualified Data.ByteString.Char8 as B
import Lens.Family (view )
import Control.Monad (liftM)

getLines :: Producer PB.ByteString IO r -> Producer String IO r
getLines = PG.concats . PG.maps toStringProducer . view PB.lines

toStringProducer :: Producer PB.ByteString IO r -> Producer String IO r
toStringProducer producer = go producer B.empty
  where
    go producer bs = do
        x <- lift $ next producer
        case x of
            Left r -> do
                yield $ B.unpack bs
                return r
            Right (bs', producer') -> go producer' (B.append bs' bs)
like image 29
Stephan Avatar answered Oct 11 '22 10:10

Stephan