Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mimicking Python generator, with IO, in Haskell

Tags:

haskell

This is a question about how to do something in Haskell, that is easy to do in Python3.

I have a Python3 program that uses generators, as follows:

def gen(filename):
    for line in open(filename):
        line = line.rstrip()
        print(f"From {filename} about to yield the line {line}")
        yield line
        print(f"From {filename} recently yielded the line {line}")
        if "." in line:
            yield from gen(line)

for line in gen("a.txt"):
    print(f"Main program has {line}")

If I give it an input file a.txt containing the following:

First line of a
b.txt
Third line of a

and another input file b.txt containing the following:

First line of b
Second line of b

then the output, as I would expect, is:

From a.txt about to yield the line First line of a
Main program has First line of a
From a.txt recently yielded the line First line of a
From a.txt about to yield the line b.txt
Main program has b.txt
From a.txt recently yielded the line b.txt
From b.txt about to yield the line First line of b
Main program has First line of b
From b.txt recently yielded the line First line of b
From b.txt about to yield the line Second line of b
Main program has Second line of b
From b.txt recently yielded the line Second line of b
From a.txt about to yield the line Third line of a
Main program has Third line of a
From a.txt recently yielded the line Third line of a

I would very much like to be able to do the same thing in Haskell. I need to preserve the basic structure of the Python version, keeping gen() as a function (or semi-coroutine, if you want to call it that) that I can call from various different places.

In my attempts to write it in Haskell, I am getting in a dreadful mess with types like:

IO [IO String]

and I suspect that I'm going about it the wrong way.

Any suggestions?

like image 909
Retired Writing Code for Fun Avatar asked Jan 06 '19 21:01

Retired Writing Code for Fun


1 Answers

The data type you're looking for is FreeT:

data FreeF f a b = Pure a | Free (f b)
newtype FreeT f m a = FreeT { runFreeT :: m (FreeF f a (FreeT f m a)) }

FreeT f m a represents "alternating layers of m followed by f, except at any point, instead of a f-layer, there can be a terminating a-value". The specific form of this type that equates to your generators is

type Generator a = FreeT ((,) a) IO ()

A Generator a is alternating layers of IO-computation and a-production, except one of the IO-computations can terminate generation by producing a () instead.

instance (Functor f, Monad m) => Monad (FreeT f m)
instance MonadTrans (FreeT f)
lift :: (MonadTrans t, Monad m) => m a -> t m a
lift :: Monad m => m a -> FreeT f m a
liftF :: (Functor f, MonadFree f m) => f a -> m a
liftF :: (Functor f, Monad m) => f a -> FreeT f m a

So you can write gen in this manner:

-- singleton generator
yield :: a -> Generator a
yield x = liftF (x, ())
-- empty generator (unused/part of when)
end :: Generator a
end = return ()
gen :: FilePath -> Generator String
gen path = do handle <- lift $ openFile path
              fileLines <- lift $ lines <$> hGetContents handle
              forM_ fileLines $ \line -> do
                  lift $ putStrLn $ "From " ++ path ++ " about to yield the line " ++ line
                  yield line
                  lift $ putStrLn $ "From " ++ path ++ " recently yielded the line " ++ line
                  when ('.' `elem` line) $ gen line
              lift $ hClose handle

You can then tear this structure back down:

main = iterT (\(line, continue) -> putStrLn ("Main program has " ++ line) >> continue)
             (gen "a.txt")

FreeT has a cousin, named Stream. This is pretty much the same thing, except it uses some tricks to squeeze some more performance out. (The cost is that equality is now fuzzier than before, but we don't usually care about equality of control structures anyway.) Specifically, instead of alternating m and f layers, Stream is just a sequence of m and f layers in whatever order they're created in. As long as Monad m, this is fine, as adjacent m layers can always be joined together and new pure ones formed between adjacent f-layers, until the alternating structure of FreeT is recovered.

If you use Stream (which you probably should), then you may want to replace (,) with Of, from the same package:

data Of a b = !a :> b
type Generator a = Stream (Of a) IO ()

The strictness isn't particularly useful with String, but it does read better. It's more useful for other types, like Int.

like image 108
HTNW Avatar answered Nov 03 '22 04:11

HTNW