Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pretty Printing AST with Minimal Parentheses

I'm implementing a pretty-printer for a JavaScript AST and I wanted to ask if someone is aware of a "proper" algorithm to automatically parenthesize expressions with minimal parentheses based on operator precedence and associativity. I haven't found any useful material on the google.

What seems obvious is that an operator whose parent has a higher precedence should be parenthesized, e.g.:

(x + y) * z // x + y has lower precedence

However, there are also some operators which are not associative, in which case parentheses are still are needed, e.g.:

x - (y - z) // both operators have the same precedence

I'm wondering what would be the best rule for this latter case. Whether it's sufficient to say that for division and subtraction, the rhs sub-expression should be parenthesized if it has less than or equal precedence.

like image 893
Maxime C. Avatar asked Dec 04 '12 17:12

Maxime C.


2 Answers

I stumbled on your question in search of the answer myself. While I haven't found a canonical algorithm, I have found that, like you say, operator precedence alone is not enough to minimally parenthesize expressions. I took a shot at writing a JavaScript pretty printer in Haskell, though I found it tedious to write a robust parser so I changed the concrete syntax: https://gist.github.com/kputnam/5625856

In addition to precedence, you must take operator associativity into account. Binary operations like / and - are parsed as left associative. However, assignment =, exponentiation ^, and equality == are right associative. This means the expression Div (Div a b) c can be written a / b / c without parentheses, but Exp (Exp a b) c must be parenthesized as (a ^ b) ^ c.

Your intuition is correct: for left-associative operators, if the left operand's expression binds less tightly than its parent, it should be parenthesized. If the right operand's expression binds as tightly or less tightly than its parent, it should be parenthesized. So Div (Div a b) (Div c d) wouldn't require parentheses around the left subexpression, but the right subexpression would: a / b / (c / d).

Next, unary operators, specifically operators which can either be binary or unary, like negation and subtraction -, coercion and addition +, etc might need to be handled on a case-by-case basis. For example Sub a (Neg b) should be printed as a - (-b), even though unary negation binds more tightly than subtraction. I guess it depends on your parser, a - -b may not be ambiguous, just ugly.

I'm not sure how unary operators which can be both prefix and postfix should work. In expressions like ++ (a ++) and (++ a) ++, one of the operators must bind more tightly than the other, or ++ a ++ would be ambiguous. But I suspect even if parentheses aren't needed in one of those, for the sake of readability, you may want to add parentheses anyway.

like image 55
kputnam Avatar answered Sep 28 '22 12:09

kputnam


It depends on the rules for the specific grammar. I think you have it right for operators with different precedence, and right for subtraction and division.

Exponentiation, however, is often treated differently, in that its right hand operand is evaluated first. So you need

 (a ** b) ** c

when c is the right child of the root.

Which way the parenthesization goes is determined by what the grammar rules define. If your grammar is of the form of

exp = sub1exp ;
exp = sub1exp op exp ;
sub1exp = sub1exp ;  
sub1exp = sub1exp op1 sub2exp ;
sub2exp = sub3exp ;
sub2exp = sub3exp op2 sub2exp ;
sub3exp = ....
subNexp = '(' exp ')' ;

with op1 and op2 being non-associative, then you want to parenthesize the right subtree of op1 if the subtree root is also op1, and you want to parenthesize the left subtree of op2 if the left subtree has root op2.

like image 34
Ira Baxter Avatar answered Sep 28 '22 11:09

Ira Baxter