Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using parser combinator to parse simple math expression

I'm using parsimmon to parse a simple math expression and I'm failing to parse a simple math expression that follows order of operation (i.e. */ has higher precedence than +-).

Even if you are not familiar with this library please help me solve precedence problem without left recursion and infinite recursion.

Thank you.

I used TypeScript:

"use strict";

// Run me with Node to see my output!

import * as P from "parsimmon";
import {Parser} from "parsimmon";
// @ts-ignore
import util from "util";


///////////////////////////////////////////////////////////////////////

// Use the JSON standard's definition of whitespace rather than Parsimmon's.
let whitespace = P.regexp(/\s*/m);

// JSON is pretty relaxed about whitespace, so let's make it easy to ignore
// after most text.
function token(parser: Parser<string>) {
  return parser.skip(whitespace);
}

// Several parsers are just strings with optional whitespace.
function word(str: string) {
  return P.string(str).thru(token);
}

let MathParser = P.createLanguage({
  expr: r => P.alt(r.sExpr2, r.sExpr1, r.number),
  sExpr1: r => P.seqMap(r.iExpr, P.optWhitespace, r.plusOrMinus, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
  sExpr2: r => P.seqMap(r.iExpr, P.optWhitespace, r.multiplyOrDivide, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
  iExpr: r => P.alt(r.iExpr, r.number),    // Issue here! this causes infinite recursion
  // iExpr: r => r.number  // this will fix infinite recursion but yields invalid parse

  number: () =>
    token(P.regexp(/[0-9]+/))
      .map(Number)
      .desc("number"),

  plus: () => word("+"),
  minus: () => word("-"),
  plusOrMinus: r => P.alt(r.plus, r.minus),

  multiply: () => word("*"),
  divide: () => word("/"),
  multiplyOrDivide: r => P.alt(r.multiply, r.divide),

  operator: r => P.alt(r.plusOrMinus, r.multiplyOrDivide)
});

///////////////////////////////////////////////////////////////////////

let text = "3 / 4 - 5 * 6 + 5";

let ast = MathParser.expr.tryParse(text);
console.log(util.inspect(ast, {showHidden: false, depth: null}));

This is my repo

like image 575
Node.JS Avatar asked Dec 28 '19 07:12

Node.JS


2 Answers

Currently the grammar implemented by your parser looks like this (ignoring white space):

expr: sExpr2 | sExpr1 | number
sExpr1: iExpr plusOrMinus expr
sExpr2: iExpr multiplyOrDivide expr
// Version with infinite recursion:
iExpr: iExpr | number
// Version without infinite recursion:
iExpr: number

It's pretty easy to see that iExpr: iExpr is a left-recursive production and the cause of your infinite recursion. But even if Parsimmon could handle left-recursion, there just wouldn't be any point in that production. If it didn't mess up the parser, it just wouldn't do anything at all. Just like an equation x = x does not convey any information, a production x: x doesn't make a grammar match anything it didn't match before. It's basically a no-op, but one that breaks parsers that can't handle left recursion. So removing it is definitely the right solution.

With that fixed, you now get wrong parse trees. Specifically you'll get parse trees as if all your operators were of the same precedence and right-associative. Why? Because the left-side of all of your operators is iExpr and that can only match single numbers. So you'll always have leaves as the left child of an operator node and the tree always grows to the right.

An unambiguous grammar to correctly parse left-associative operators can be written like this:

expr: expr (plusOrMinus multExpr)?
multExpr: multExpr (multiplyOrDivide primaryExpr)?
primaryExpr: number | '(' expr ')'

(The | '(' expr ')' part is only needed if you want to allow parentheses, of course)

This would lead to the correct parse trees because there's no way for a multiplication or division to have an unparenthesized addition or subtraction as a child and if there are multiple applications of operators of the same precedence ,such as 1 - 2 - 3, the outer subtraction would contain the inner subtraction as its left child, correctly treating the operator as left-associative.

Now the problem is that this grammar is left recursive, so it's not going to work with Parsimmon. One's first thought might be to change the left recursion to right recursion like this:

expr: multExpr (plusOrMinus expr)?
multExpr: primaryExpr (multiplyOrDivide multExpr)?
primaryExpr: number | '(' expr ')'

But the problem with that is that now 1 - 2 - 3 wrongly associates to the right instead of the left. Instead, the common solution is to remove the recursion altogether (except the one from primaryExpr back to expr of course) and replace it with repetition:

expr: multExpr (plusOrMinus multExpr)*
multExpr: primaryExpr (multiplyOrDivide primaryExpr)*
primaryExpr: number | '(' expr ')'

In Parsimmon you'd implement this using sepBy1. So now instead of having a left operand, an operator and a right operand, you have a left operand and then arbitrarily many operator-operand pairs in an array. You can create a left-growing tree from that by simply iterating over the array in a for-loop.

like image 72
sepp2k Avatar answered Sep 27 '22 17:09

sepp2k


If your want to learn how to deal with left recursion, you can start from https://en.wikipedia.org/wiki/Parsing_expression_grammar or more precisely https://en.wikipedia.org/wiki/Parsing_expression_grammar#Indirect_left_recursion

And then read more about PEG online. But basically a standard way is to use cycles:

Expr    ← Sum
Sum     ← Product (('+' / '-') Product)*
Product ← Value (('*' / '/') Value)*
Value   ← [0-9]+ / '(' Expr ')'

Your can find examples of this grammar everywhere.

If your want to stick to a more pleasant left-recursive grammars, you can read about packrat parser. And find another parser for yourself. Because, I'm no sure, but it looks like parsimmon is not one of them.

If you just what a working code, than you can go to https://repl.it/repls/ObviousNavyblueFiletype

I've implemented the above grammar using parsimmon API

Expr        : r => r.AdditiveExpr,
AdditiveExpr: r => P.seqMap(
  r.MultExpr, P.seq(r.plus.or(r.minus), r.MultExpr).many(),
  left_association
),
MultExpr    : r => P.seqMap(
  r.UnaryExpr, P.seq(r.multiply.or(r.divide), r.UnaryExpr).many(),
  left_association
),
UnaryExpr   : r => P.seq(r.minus, r.UnaryExpr).or(r.PrimaryExpr),
PrimaryExpr : r => P.seqMap(
  r.LPAREN, r.Expr, r.RPAREN, // without parens it won't work
  (lp,ex,rp) => ex // to remove "(" and ")" from the resulting AST
).or(P.digits),

plus        : () => P.string('+').thru(p => p.skip(P.optWhitespace)),
minus       : () => P.string('-').thru(p => p.skip(P.optWhitespace)),
multiply    : () => P.string('*').thru(p => p.skip(P.optWhitespace)),
divide      : () => P.string('/').thru(p => p.skip(P.optWhitespace)),

We also need to use parentheses, or else why would you need recursion? Value ← [0-9]+ will suffice. Without parentheses, there is no need to reference Expr inside grammar. And if we do, it would not consume any input, it would have no sense whatsoever, and it would hang in infinite recursion.

So let's also add:

LPAREN      : () => P.string('(').thru(p => p.skip(P.optWhitespace)),
RPAREN      : () => P.string(')').thru(p => p.skip(P.optWhitespace)),

I also added unary expressions, just for completeness.

And now for the most interesting part - production functions. Without them the result for, let' say:

(3+4+6+(-7*5))

will look like this:

[[["(",[["3",[]],[["+",["4",[]]],["+",["6",[]]],["+",[["(",[[["-","7"],[["*","5"]]],[]],")"],[]]]]],")"],[]],[]]

With them, it'll be:

[[[["3","+","4"],"+","6"],"+",[["-","7"],"*","5"]]]

Much nicer.

So for left associative operators we will need this:

// input (expr1, [[op1, expr2],[op2, expr3],[op3, expr4]])
// -->
// output [[[expr1, op1, expr2], op2, expr3], op3, expr4]
const left_association  = (ex, rest) => {
  // console.log("input", ex, JSON.stringify(rest))
  if( rest.length === 0 ) return ex
  if( rest.length === 1 ) return [ex, rest[0][0], rest[0][1]]
  let node = [ex]
  rest.forEach(([op, ex]) => {
    node[1] = op;
    node[2] = ex;
    node = [node]
  })
  // console.log("output", JSON.stringify(node))
  return node
}
like image 43
x00 Avatar answered Sep 27 '22 18:09

x00