Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Eliminate Left Recursion on this PEG.js grammar

(Note: I've read other questions like this, but I haven't been able to figure this out).

I wrote this grammar:

start = call

ident = [a-z]+
spaces = [ ]+

call = f:ident spaces g:(call / ident) {
    return f + "(" + g + ")";
}

With this input

a b c d

it returns

"a(b(c(d)))"

And I want

"a(b)(c)(d)"

I think this left recursive rule can give me something like that, but PEG.js doesn't support left recursion.

call = f:(call / ident) spaces g:ident {
    return f + "(" + g + ")";
}

How can I eliminate the left recursion in this case?

PS: You can test this on the online PEG.js demo

like image 861
user1527166 Avatar asked Nov 19 '12 15:11

user1527166


People also ask

How do you eliminate the left recursion in the grammar?

A Grammar G (V, T, P, S) is left recursive if it has a production in the form. A → A α |β. Left Recursion can be eliminated by introducing new non-terminal A such that.

Why is left recursion eliminated in grammar?

Removing left recursion Left recursion often poses problems for parsers, either because it leads them into infinite recursion (as in the case of most top-down parsers) or because they expect rules in a normal form that forbids it (as in the case of many bottom-up parsers, including the CYK algorithm).

Which one of the grammar is free from left recursion?

Which one of the following grammars is free from left recursion? Explanation: Grammar A has direct left recursion because of the production rule: A->Aa. Grammar B doesn't have any left recursion (neither direct nor indirect).

Can a grammar be left and right recursive?

A production can be recursive without being either left- or right-recursive, and it can even be both left- and right-recursive. The above examples are both direct recursion; the non-terminal directly derives a sequence containing the non-terminal. Recursion can also be indirect; it is still recursion.


2 Answers

Good question. Start by separating your first ident from everything else, since it gets special treatment (no parentheses). Next, defer to a rule to handle the spaces ident recursion that will collect the values that go inside parentheses. The loop wraps the ident text and appends any new text that is collected recursively.

Here is a short-hand version of the rules (note that tail is a separate rule):

head: ident tail?;        //the "head" ident is separated
tail: spaces ident tail?; //each "tail" ident is looped over

Here is the PEG script:

start = head

ident = [a-z]+
spaces = [ ]+

head = head:ident tail:tail? {
    return head + tail;
}

tail = spaces next:ident tail:tail? {
    return "(" + next + ")" + tail
}

Edit: Here is an alternative that does the work in one rule and is more similar to yours.

start = head

ident = [a-z]+
spaces = [ ]+

head = head:ident tail:(spaces next:ident{return "(" + next + ")" })* {
    return head + tail.join("")
}

The output for a b c d is "a(b)(c)(d)" for both scripts.

like image 168
user1201210 Avatar answered Oct 06 '22 21:10

user1201210


If I understand correctly, your problem isn't left recursion, it's the structure of the parse tree.

You've eliminated the left recursion properly, but unfortunately, the only way to get rid of left recursion is by eliminating left recursion in the raw parse tree. Most of the theory for this stuff is just about matching the right set of strings. You are still matching the same set of strings, so the theory is happy, but you wanted a left recursive parse tree. More on that problem on wikipedia.

AFAIK, You can't get the raw output of the PEG parser to be left-recursive. You can, however, do whatever you want with the output. So parse it as an array and then postprocess to give it that nice left-structure.

Doing it with a simplified (no spaces, no multicharacter identifiers) grammar:

 start = call

 id = [a-z]

 call
     = arr:id+ {
         var acc = arr[0]
         for (i = 1; i < arr.length; i++) {
            acc = [acc, arr[i]]
         }
         return acc;
     }

That parses abcd to [ [ [ 'a', 'b' ], 'c' ], 'd' ]. I just used + instead of recursion, and then ran through the resulting array building the structure we wanted. Wikipedia has some notes on doing left recursion with a PEG.

That's assuming you wanted the data structure. If you just want the parens, replace the action with this:

var acc = arr[0]
for (i = 1; i < arr.length; i++) {
    acc = acc + '(' + arr[i] + ')'
}
return acc;

Which gives a(b)(c)(d).

To put spaces and multicharacter id's back, you can do this:

 start = call

 id = [a-z]+

 _ = [ ]+

 call
      = a:id as:arg* {
          arr = [a].concat(as)
          var acc = arr[0]
          for (i = 1; i < arr.length; i++) {
              acc = acc + '(' + arr[i] + ')'
          }
          return acc;
      }

 arg = _ a:id {return a}
like image 39
WolfTivy Avatar answered Oct 06 '22 22:10

WolfTivy