Warning: long and complicated question incoming.
Some believe total functional programming is a valuable idea, and so is finding techniques for doing it. Minding that, how can I write a parser for a mutually recursive ADT without recursion and side effects? Here, I'm defining as "recursive" any term that isn't strongly normalizing.
What I have tried:
Mind this following mutually recursive ADT:
data Tree = Node Int [Tree]
tree = Node 10 [Node 20 [], Node 30 [], Node 40 []]
The value, tree
, could be serialized as:
tree_serial = [0,10,0,0,20,1,0,0,30,1,0,0,40,1,1] :: [Int]
Using ints for simplicity, here, 0
denotes the beginning of a Node
or a Cons
cell (depending on the state of the parser), 1 denotes Nil
and the remaining denotes data. We could easily write a parser for it using side effects:
var string = [0,10,0,0,20,1,0,0,30,1,0,0,40,1,1];
function parse(string){
function getInt(){
return string.shift();
};
function parseTree(){
var chr = getInt();
if (chr === 0)
return ["Node",getInt(),parseList()];
};
function parseList(){
var chr = getInt();
if (chr === 0)
return ["Cons",parseTree(),parseList()];
if (chr === 1)
return "Nil";
};
return parseTree();
};
console.log(JSON.stringify(parse(string)));
Here, getInt
is side-effective: it gets the next int from the string. We could easily and elegantly translate this to Haskell using Parsec or similar - but for better understanding, I skipped those and defined a stripped down parser type:
data Parser res = GetInt (Int -> Parser res) | Return res
runParser (GetInt fn) (c:cs) = runParser (fn c) cs
runParser (Return res) c = res
This works similar to a monadic parser, except more explicitly:
main = do
let parsePair = (GetInt (\a -> (GetInt (\b -> Return (a,b)))))
print $ runParser parsePair [1,2,3,4,5]
Using this, we could define our parser without side effects:
data Tree = Node Int [Tree] deriving Show
parseTree = treeParser Return where
treeParser = (\ cont ->
GetInt (\ _ ->
GetInt (\ tag ->
listParser (\ listParsingResult ->
(cont (Node tag listParsingResult))))))
listParser = (\ cont ->
GetInt (\ a ->
if a == 0
then treeParser (\x -> listParser (\y -> cont (x : y)))
else cont []))
main = do
let treeData = [0,10,0,0,20,1,0,0,30,1,0,0,40,1,1]
print $ runParser parseTree treeData
This outputs Node 10 [Node 20 [],Node 30 [],Node 40 []]
, as expected. Notice this still uses recursion, and I had to use cont
to pass the control between the two recursive functions. Now, there are 2 strategies to get rid of recursion I'm aware of:
1. Use folds.
2. Use church numbers for bounded recursion.
Using folds is obviously not viable here, since there is no structure to fold on (we are building it!). If we were parsing a list instead of a Tree, using church numbers would be perfect since they work exactly like the Y-combinator for bounded recursion - and, knowing the length of the list, we could just write toChurch listLength listParser init
. The problem with this case, though, is that there is mutual recursion going on, and it is not obvious which church number to use. We have many layers of lists and trees of unpredictable lengths. Indeed, if we use a big enough church number, it works without recursion, but at the cost of added work. This is one of the last examples of an actually useful program I couldn't replicate "correctly" without recursion. Can it be done?
For completeness, here is a JavaScript program which parses that tree without recursion, but using made-up church numbers:
function runParser(f){return function(str){
var a = f(str[0]);
return a(str.slice(1));
}};
function Const(a){return function(b){return a}};
function toChurch(n){return (function(f){return (function(a){
for (var i=0; i<n; ++i)
a = f(a);
return a;
}) }) };
function parser(get){
return toChurch(50)(function(rec){
return function (res){
return get(function(a){
return [get(function(b){
return toChurch(50)(function(recl){
return function(res){
return get(function(a){
return [
rec(function(a){
return recl(function(b){
return res(["Cons",a,b])
})
}),
res("Nil")][a];
});
};
})(0)(function(x){return res(["Node",b,x])});
})][a];
});
};
})(0)(Const);
};
var string = [0,200,0,0,300,0,0,400,1,0,0,500,1,0,0,500,1,1,0,0,600,0,0,700,1,0,0,800,1,0,0,900,1,1,1];
console.log(JSON.stringify(parser(runParser)(string)));
Notice that 50
constant inside the parser
function: it is completely arbitrary as a bound. I'm not sure there are "right" choices for those which would "fit perfectly" a specific parseable value.
tl;dr: Church-encode your input list, and use that to drive your recursion.
A proper Church-encoding of lists requires RankNTypes
, and looks a bit like this:
{-# LANGUAGE RankNTypes #-}
data List a = List { runList :: forall r. (a -> r -> r) -> r -> r }
instance Show a => Show (List a) where
showsPrec n (List xs) = showsPrec n (xs (:) [])
nilVal :: List a
nilVal = List $ \cons nil -> nil
consVal :: a -> List a -> List a
consVal a (List as) = List $ \cons nil -> cons a (as cons nil)
-- handy for pattern-matching
uncons :: List a -> Maybe (a, List a)
uncons (List xs) = xs cons nil where
cons x Nothing = Just (x, nilVal)
cons x (Just (x', xs)) = Just (x, consVal x' xs)
nil = Nothing
Now we just have to write our parser. I'm really bad with parsing theory, so I threw something awful together. Probably somebody who knows a thing or two about that domain could give you some more principled advice here. I'm going to parse the grammar:
tree -> 0 N list
list -> 0 tree list | 1
My parser state will keep track of what "hole" we're currently parsing. For non-terminals, we'll actually need a stack of the holes. Thus, the terminal holes have one of these forms:
* N list
0 * list
* tree list
*
We'll collapse the last two. Notice that none of these holes have interesting information before them, so we don't need to store anything in a THole
. The non-terminal holes have one of these forms:
0 N *
0 * list
0 tree *
In this case, the hole in the tree formation rule has a number before it that we will need later, and the second kind of hole in the list formation rule has a tree before it that we'll need to keep, so NTHole
will need those in the constructors. Thus:
data Tree = Node Int [Tree]
deriving (Eq, Ord, Read, Show)
data THole
= TreeT0
| TreeT1
| ListT
deriving (Eq, Ord, Read, Show)
data NTHole
= TreeNT Int
| ListNT0
| ListNT1 Tree
deriving (Eq, Ord, Read, Show)
Our current parser state will be the terminal hole we're currently in, and the stack of non-terminal holes we need to fill as rules get reduced.
type RawState = (THole, List NTHole)
initRawState = (TreeT0, nilVal)
...well, except that we have two more states of interest: done with the list, and error.
type State = Maybe (Either RawState Tree)
initState = Just (Left initRawState)
Now we can write a step function that takes a good state and handles it. Again, you would want a parser generator tool to create one of these for you, but this language is small enough that I did it by hand.
stepRaw :: Int -> RawState -> State
stepRaw 0 (TreeT0, xs) = Just (Left (TreeT1, xs))
stepRaw n (TreeT1, xs) = Just (Left (ListT , consVal (TreeNT n) xs))
stepRaw 0 (ListT , xs) = Just (Left (TreeT0, consVal ListNT0 xs))
stepRaw 1 (ListT , xs) = fst (runList xs cons nil) [] where
cons v (f, xs) = flip (,) (consVal v xs) $ case v of
ListNT1 t -> \acc -> f (t:acc)
TreeNT n -> \acc -> let t = Node n acc in case uncons xs of
Nothing -> Just (Right t)
Just (ListNT0, xs) -> Just (Left (ListT, consVal (ListNT1 t) xs))
_ -> Nothing
_ -> \acc -> Nothing
nil = (\acc -> Nothing, nilVal)
stepRaw _ _ = Nothing
step :: Int -> State -> State
step n v = v >>= either (stepRaw n) (const Nothing)
It turns out that this parser actually runs backwards, which is unfortunate but not a fundamental limitation. It was just easier for me to think in this direction. As required, there is no recursion here. We can try it out on your sample List Int
in ghci.
*Main> let x = foldr consVal nilVal [1,1,40,0,0,1,30,0,0,1,20,0,0,10,0]
*Main> runList x step initState
Just (Right (Node 10 [Node 20 [],Node 30 [],Node 40 []]))
I use foldr
to build x
, and foldr
is recursive, so you might scream about that. But we could easily define x
without foldr
; the built-in list syntax is just more convenient to read and write than long chains of consVal
and nilVal
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With