Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to enumerate a recursive datatype in Haskell?

This blog post has an interesting explanation of how to use the Omega monad to enumerate an arbitrary grammar diagonally. He offers an example of how to do so, resulting in an infinite sequence of strings. I'd like to do the same, except that, instead of generating a list of strings, it generates a list of an actual datatype. For example,

 data T = A | B T | C T T

Would generate

A, B A, C A A, C (B A) A... 

Or something similar. Unfortunately my Haskell skills are still maturing and after some hours playing it I couldn't manage to do what I want. How can that be done?

As requested, one of my attempts (I have tried too many things...):

import Control.Monad.Omega

data T = A | B T | C T T deriving (Show)

a = [A] 
        ++ (do { x <- each a; return (B x) })
        ++ (do { x <- each a; y <- each a; return (C x y) })

main = print $ take 10 $ a
like image 843
MaiaVictor Avatar asked May 07 '14 10:05

MaiaVictor


3 Answers

My first ugly approach was:

allTerms :: Omega T
allTerms = do
  which <- each [ 1,2,3 ]
  if which == 1 then
    return A
  else if which == 2 then do
    x <- allTerms
    return $ B x
  else do
    x <- allTerms
    y <- allTerms
    return $ C x y

But then, after some cleaning up I reached this one liner

import Control.Applicative
import Control.Monad.Omega
import Control.Monad

allTerms :: Omega T
allTerms = join $ each [return A, B <$> allTerms, C <$> allTerms <*> allTerms]

Note that order matters: return A has to be the first choice in the list above, or allTerms will not terminate. Basically, the Omega monad ensures a "fair scheduling" among choices, saving you from e.g. infiniteList ++ something, but does not prevent infinite recursion.


An even more elegant solution was suggested by Crazy FIZRUK, exploiting the Alternative instance of Omega.

import Control.Applicative
import Data.Foldable (asum)
import Control.Monad.Omega

allTerms :: Omega T
allTerms = asum [ pure A
                , B <$> allTerms
                , C <$> allTerms <*> allTerms
                ]
like image 74
chi Avatar answered Nov 12 '22 15:11

chi


I finally found the time to write a generic version. It uses the Universe typeclass, which represents recursively enumerabley types. Here it is:

{-# LANGUAGE DeriveGeneric, TypeOperators, ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances, OverlappingInstances #-}

import Data.Universe
import Control.Monad.Omega
import GHC.Generics
import Control.Monad (mplus, liftM2)

class GUniverse f where
    guniverse :: [f a]

instance GUniverse U1 where
    guniverse = [U1]

instance (Universe c) => GUniverse (K1 i c) where
    guniverse = fmap K1 (universe :: [c])

instance (GUniverse f) => GUniverse (M1 i c f) where
    guniverse = fmap M1 (guniverse :: [f p])

instance (GUniverse f, GUniverse g) => GUniverse (f :*: g) where
    guniverse = runOmega $ liftM2 (:*:) ls rs
        where ls = each (guniverse :: [f p])
              rs = each (guniverse :: [g p])

instance (GUniverse f, GUniverse g) => GUniverse (f :+: g) where
    guniverse = runOmega $ (fmap L1 $ ls) `mplus` (fmap R1 $ rs)
        where ls = each (guniverse :: [f p])
              rs = each (guniverse :: [g p])

instance (Generic a, GUniverse (Rep a)) => Universe a where
    universe = fmap to $ (guniverse :: [Rep a x])


data T = A | B T | C T T deriving (Show, Generic)
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show, Generic)

I couldn't find a way to remove UndecidableInstances, but that should be of no greater concern. OverlappingInstances is only required to override predefined Universe instances, like Either's. Now some nice outputs:

*Main> take 10 $ (universe :: [T])
[A,B A,B (B A),C A A,B (B (B A)),C A (B A),B (C A A),C (B A) A,B (B (B (B A))),C A (B (B A))]
*Main> take 20 $ (universe :: [Either Int Char])
[Left (-9223372036854775808),Right '\NUL',Left (-9223372036854775807),Right '\SOH',Left (-9223372036854775806),Right '\STX',Left (-9223372036854775805),Right '\ETX',Left (-9223372036854775804),Right '\EOT',Left (-9223372036854775803),Right '\ENQ',Left (-9223372036854775802),Right '\ACK',Left (-9223372036854775801),Right '\a',Left (-9223372036854775800),Right '\b',Left (-9223372036854775799),Right '\t']
*Main> take 10 $ (universe :: [Tree Bool])
[Leaf False,Leaf True,Branch (Leaf False) (Leaf False),Branch (Leaf False) (Leaf True),Branch (Leaf True) (Leaf False),Branch (Leaf False) (Branch (Leaf False) (Leaf False)),Branch (Leaf True) (Leaf True),Branch (Branch (Leaf False) (Leaf False)) (Leaf False),Branch (Leaf False) (Branch (Leaf False) (Leaf True)),Branch (Leaf True) (Branch (Leaf False) (Leaf False))]

I'm not exactly sure what happens in the branching order of mplus, but I think it should all work out if Omega is correctly implemented, which I strongly believe.


But wait! The above implementation is not yet bug-free; it diverges on "left recursive" types, like this:

data T3 = T3 T3 | T3' deriving (Show, Generic)

while this works:

data T6 = T6' | T6 T6 deriving (Show, Generic)

I'll see if I can fix that. EDIT: At some time, the solution of this problem might be found in this question.

like image 36
phipsgabler Avatar answered Nov 12 '22 14:11

phipsgabler


You really should show us what you have tried so far. But granted, this is not an easy problem for a bgeinner.

Let's try to write a naive version down:

enum = A : (map B enum ++ [ C x y | x <- enum, y <- enum ])

Ok, this actually gives us:

[A, B A, B (B A), B (B (B A)), .... ]

and never reaches the C values.

We obviously need to construct the list in steps. Say we already have a complete list of items up to a certain nesting level, we can compute the items with one nesting level more in one step:

step xs = map B xs ++ [ C x y | x <- xs, y <- xs ]

For example, we get:

> step [A]
[B A,C A A]
> step (step [A])
[B (B A),B (C A A),C (B A) (B A),C (B A) (C A A),C (C A A) (B A),C (C A A) (C A ...

What we want is thus:

[A] ++ step [A] ++ step (step [A]) ++ .....

which is the concatenation of the result of

iterate step [A]

which is, of course

someT = concat (iterate step [A])

Warning: You will notice that this still does not give all values. For example:

C A (B (B A))

will be missing.

Can you find out why? Can you improve it?

like image 26
Ingo Avatar answered Nov 12 '22 15:11

Ingo