Parsing an expression in Prolog and returning an abstract syntax

Question

I have to write parse(Tkns, T), which takes a mathematical expression as a list of tokens and finds T, a term representing its abstract syntax tree, respecting operator precedence and associativity.

For example,

?- parse( [ num(3), plus, num(2), star, num(1) ], T ).

T = add(integer(3), multiply(integer(2), integer(1))) ;
No

I've attempted to implement + and * as follows

parse([num(X)], integer(X)).
parse(Tkns, T) :-
  (  append(E1, [plus|E2], Tkns),
     parse(E1, T1),
     parse(E2, T2),
     T = add(T1,T2)
  ;  append(E1, [star|E2], Tkns),
     parse(E1, T1),
     parse(E2, T2),
     T = multiply(T1,T2)
  ).

This finds the correct answer, but it also returns answers that violate associativity or operator precedence.

For example,

parse( [ num(3), plus, num(2), star, num(1) ], T ).

also returns

multiply(add(integer(3), integer(2)), integer(1))

and

parse([num(1), plus, num(2), plus, num(3)], T)

returns both (1+2)+3 and 1+(2+3), when it should return only the former.

Is there a way I can get this to work?

Edit: More info: I only need to implement +, -, *, / and negation (-1, -2, etc.), and all numbers are integers. We were given a hint that the code will be structured similarly to this grammar:

<expression> ::= <expression> + <term>
              |  <expression> - <term>
              |  <term>

      <term> ::= <term> * <factor>
              |  <term> / <factor>
              |  <factor>

    <factor> ::= num
              |  ( <expression> )

The only difference is that negation has to be supported as well.
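The hinted grammar is left-recursive, so transcribing it literally into a DCG would loop. A minimal sketch of the usual workaround, assuming SWI-Prolog, replaces each left-recursive rule with a loop that carries the tree built so far in an accumulator, which keeps the result left-associative. The token names minus, slash, lparen and rparen and the tree functors subtract, divide and negate are hypothetical; the question only fixes num(N), plus, star, add, multiply and integer.

```prolog
% Sketch: the hinted grammar with left recursion removed.
% Hypothetical tokens: minus, slash, lparen, rparen.
% Hypothetical tree functors: subtract/2, divide/2, negate/1.

parse(Tkns, T) :-
    phrase(expression(T), Tkns).

% <expression> ::= <term> { (+ | -) <term> }
expression(T) -->
    term(T0),
    expression_rest(T0, T).

% T0 is the tree built so far; each new operand is attached to its
% right, so 1+2+3 becomes add(add(1,2),3).
expression_rest(T0, T) -->
    [plus], term(T1),
    expression_rest(add(T0, T1), T).
expression_rest(T0, T) -->
    [minus], term(T1),
    expression_rest(subtract(T0, T1), T).
expression_rest(T, T) --> [].

% <term> ::= <factor> { (* | /) <factor> }
term(T) -->
    factor(T0),
    term_rest(T0, T).

term_rest(T0, T) -->
    [star], factor(T1),
    term_rest(multiply(T0, T1), T).
term_rest(T0, T) -->
    [slash], factor(T1),
    term_rest(divide(T0, T1), T).
term_rest(T, T) --> [].

% <factor> ::= num | - <factor> | ( <expression> )
factor(integer(X)) --> [num(X)].
factor(negate(T))  --> [minus], factor(T).
factor(T)          --> [lparen], expression(T), [rparen].
```

With this, ?- parse([num(3), plus, num(2), star, num(1)], T). gives T = add(integer(3), multiply(integer(2), integer(1))) and no further answers. Because negation is handled in factor, it binds tighter than * here, so -2*3 parses as multiply(negate(integer(2)), integer(3)); that precedence is a design choice, not something the question specifies.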

Edit 2: I found a grammar parser written in Prolog (http://www.cs.sunysb.edu/~warren/xsbbook/node10.html). Is there a way I could modify it to print a left-hand derivation of a grammar ("print" in the sense that the Prolog interpreter will output "T = [the correct answer]")?

CapelliC · Accepted Answer

Removing left recursion will drive you towards DCG-based grammars.

But there is an interesting alternative way: implement bottom up parsing.

How hard is this in Prolog? Well, as Pereira and Shieber show in their wonderful book 'Prolog and Natural-Language Analysis', it can be really easy. From chapter 6.5:

Prolog supplies by default a top-down, left-to-right, backtrack parsing algorithm for DCGs.

It is well known that top-down parsing algorithms of this kind will loop on left-recursive rules (cf. the example of Program 2.3).

Although techniques are available to remove left recursion from context-free grammars, these techniques are not readily generalizable to DCGs, and furthermore they can increase grammar size by large factors.

As an alternative, we may consider implementing a bottom-up parsing method directly in Prolog. Of the various possibilities, we will consider here the left-corner method in one of its adaptations to DCGs.

For programming convenience, the input grammar for the left-corner DCG interpreter is represented in a slight variation of the DCG notation. The right-hand sides of rules are given as lists rather than conjunctions of literals. Thus rules are unit clauses of the form, e.g.,

s ---> [np, vp].

or

optrel ---> [].

Terminals are introduced by dictionary unit clauses of the form word(w,PT).

Consider finishing that section of the book before proceeding (the book is freely available; look it up by its title on the info page).

Now let's try writing a bottom-up processor:

:- op(150, xfx, ---> ).

parse(Phrase) -->
    leaf(SubPhrase),
    lc(SubPhrase, Phrase).

leaf(Cat) --> [Word], {word(Word,Cat)}.
leaf(Phrase) --> {Phrase ---> []}.

lc(Phrase, Phrase) --> [].

lc(SubPhrase, SuperPhrase) -->
    {Phrase ---> [SubPhrase|Rest]},
    parse_rest(Rest),
    lc(Phrase, SuperPhrase).

parse_rest([]) --> [].
parse_rest([Phrase|Phrases]) -->
    parse(Phrase),
    parse_rest(Phrases).

% That's all! Fairly easy, isn't it?

% The grammar starts here: replace it with your own, and don't worry about left recursion.
e(sum(L,R)) ---> [e(L),sum,e(R)].
e(num(N)) ---> [num(N)].

word(N, num(N)) :- integer(N).
word(+, sum).

For instance, this yields:

phrase(parse(P), [1,+,3,+,1]).
P = e(sum(sum(num(1), num(3)), num(1)))

Note that the grammar used here is left-recursive: e ::= e + e | num
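To connect this back to the question, here is a sketch (an adaptation, not part of the book's text) that keeps the left-corner interpreter above unchanged and swaps in the hint's left-recursive expression/term/factor grammar over the question's tokens. Treating each token as its own preterminal category in word/2 is an assumption; minus, / and negation are left out for brevity.

```prolog
:- op(150, xfx, ---> ).

% Left-corner interpreter, unchanged from above.
parse(Phrase) -->
    leaf(SubPhrase),
    lc(SubPhrase, Phrase).

leaf(Cat) --> [Word], {word(Word, Cat)}.
leaf(Phrase) --> {Phrase ---> []}.

lc(Phrase, Phrase) --> [].
lc(SubPhrase, SuperPhrase) -->
    {Phrase ---> [SubPhrase|Rest]},
    parse_rest(Rest),
    lc(Phrase, SuperPhrase).

parse_rest([]) --> [].
parse_rest([Phrase|Phrases]) -->
    parse(Phrase),
    parse_rest(Phrases).

% The hint's grammar, left recursion included.
expr(add(L, R))      ---> [expr(L), plus, term(R)].
expr(T)              ---> [term(T)].
term(multiply(L, R)) ---> [term(L), star, factor(R)].
term(T)              ---> [factor(T)].
factor(integer(N))   ---> [num(N)].

% Dictionary: each token is its own preterminal category (an assumption).
word(num(N), num(N)).
word(plus, plus).
word(star, star).
```

Here ?- phrase(parse(expr(T)), [num(3), plus, num(2), star, num(1)]). gives T = add(integer(3), multiply(integer(2), integer(1))), and because this grammar is unambiguous there is no second answer; 1+2+3 likewise yields only the left-associative tree.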

false · Answer

Before fixing your program, look at how you identified the problem! You assumed that a particular sentence will have exactly one syntax tree, but you got two of them. So essentially, Prolog helped you to find the bug!

This is a very useful debugging strategy in Prolog: Look at all the answers.

Next is the specific way you encoded the grammar. In fact, you did something quite smart: you essentially encoded a left-recursive grammar, yet your program terminates for a list of fixed length! That's because each recursion requires at least one element in the middle to serve as the operator, so every recursive call works on a strictly shorter list. That is fine. However, this strategy is inherently very inefficient: each application of the rule has to consider all possible partitions of the list.
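Both points can be made concrete by collecting all answers with findall/3. The sketch below reuses the question's parse/2 unchanged; count_parses/2 is a hypothetical helper added only for this illustration. For a chain of n operands joined by plus, every split point is tried, and the number of trees found is the Catalan number C(n-1): 2 for three operands, 5 for four.

```prolog
% The question's parse/2, unchanged.
parse([num(X)], integer(X)).
parse(Tkns, T) :-
  (  append(E1, [plus|E2], Tkns),
     parse(E1, T1),
     parse(E2, T2),
     T = add(T1,T2)
  ;  append(E1, [star|E2], Tkns),
     parse(E1, T1),
     parse(E2, T2),
     T = multiply(T1,T2)
  ).

% count_parses(+Tkns, -N): hypothetical helper; N is the number of
% syntax trees that parse/2 finds for Tkns.
count_parses(Tkns, N) :-
    findall(T, parse(Tkns, T), Ts),
    length(Ts, N).
```

For example, ?- count_parses([num(1), plus, num(2), plus, num(3)], N). gives N = 2, and ?- count_parses([num(3), plus, num(2), star, num(1)], N). also gives N = 2, one tree per operator chosen as the root.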

Another disadvantage is that you can no longer generate a sentence from a syntax tree. That is, the following query does not terminate:

?- parse(S, add(add(integer(1),integer(2)),integer(3))).

There are two reasons. The first is that the goals T = add(...,...) come too late: simply put them at the beginning, in front of the append/3 goals. But much more interesting is that, even then, append/3 does not terminate. Here is the relevant failure slice:

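% Failure slice: goals after false are never reached; they are kept
% only to show which part of the program the slice cuts away.
parse([num(X)], integer(X)) :- false.
parse(Tkns, T) :-
  (  T = add(T1,T2),
     append(E1, [plus|E2], Tkns), false,
     parse(E1, T1),
     parse(E2, T2)
  ;  false, T = multiply(T1,T2),
     append(E1, [star|E2], Tkns),
     parse(E1, T1),
     parse(E2, T2)
  ).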

@DanielLyons already gave you the "traditional" solution, which requires all kinds of justification from formal language theory. But I will stick to the grammar you encoded in your program, which, translated into DCGs, reads:

expr(integer(X)) --> [num(X)].
expr(add(L,R)) --> expr(L), [plus], expr(R).
expr(multiply(L,R)) --> expr(L), [star], expr(R).

With this grammar, the query ?- phrase(expr(T),[num(1),plus,num(2),plus,num(3)]). does not terminate. Here is the relevant slice:

expr(integer(X)) --> {false}, [num(X)].
expr(add(L,R)) --> expr(L), {false}, [plus], expr(R).
expr(multiply(L,R)) --> {false}, expr(L), [star], expr(R).

So it is this tiny part that has to be changed. Note that the rule "knows" that it wants one terminal symbol; alas, the terminal appears too late. If only it would occur in front of the recursion! But it does not.

There is a general way how to fix this: Add another pair of arguments to encode the length.

parse(T, L) :-
   phrase(expr(T, L,[]), L).

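% The extra pair of arguments is a difference list over the token list
% itself, used only for its length: every rule removes one element before
% recursing, so the recursion depth is bounded by the number of tokens.
expr(integer(X), [_|S],S) --> [num(X)].
expr(add(L,R), [_|S0],S) --> expr(L, S0,S1), [plus], expr(R, S1,S).
expr(multiply(L,R), [_|S0],S) --> expr(L, S0,S1), [star], expr(R, S1,S).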

This is a very general method that is of particular interest if you have ambiguous grammars, or if you do not know whether or not your grammar is ambiguous. Simply let Prolog do the thinking for you!
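A quick check of what the extra arguments buy, assuming SWI-Prolog: with the clauses above, a query over a fixed token list now terminates (still returning both trees, since the grammar itself is unchanged and therefore ambiguous), and the tree-to-tokens direction from the beginning of this answer works as well. all_trees/2 is a hypothetical helper added only for this illustration.

```prolog
% The length-bounded grammar from above, unchanged.
parse(T, L) :-
   phrase(expr(T, L,[]), L).

expr(integer(X), [_|S],S) --> [num(X)].
expr(add(L,R), [_|S0],S) --> expr(L, S0,S1), [plus], expr(R, S1,S).
expr(multiply(L,R), [_|S0],S) --> expr(L, S0,S1), [star], expr(R, S1,S).

% all_trees(+Tokens, -Trees): hypothetical helper collecting every
% syntax tree; it terminates because each expr shortens the length list.
all_trees(Tokens, Trees) :-
    findall(T, parse(T, Tokens), Trees).
```

?- all_trees([num(1), plus, num(2), plus, num(3)], Ts). returns both add(add(integer(1), integer(2)), integer(3)) and add(integer(1), add(integer(2), integer(3))), while ?- parse(add(add(integer(1),integer(2)),integer(3)), L). gives L = [num(1), plus, num(2), plus, num(3)].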

Tags: parsing, prolog, dcg, failure-slice

user3089588
