Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse boolean arithmetic including parentheses with regex?

Is there a single regular expression that can parse a string (in Python and/or Javascript, does not need to be the same expression) that represents simple boolean arithmetic? For example I want to parse this string:

a and (b and c) and d or e and (f or g)

Assuming that:
* parentheses do not nest
* the terms a, b, ..., z are not sub-expressions

The resulting captures should be grouped by parentheses first, which I then parse again with the same or a simpler regex.

I've had success writing a naive regex for parsing boolean arithmetic without parentheses.

Any ideas?

like image 241
nikola Avatar asked Jan 22 '10 15:01

nikola


1 Answers

Normally you would use for example a recursive descent parser for this task, but you can grab all the parts (tokens) with a regex:

x = 'a and (b and c) and d or e and (f or g)'
import re

matches = re.findall(r'\(.*?\)|\w+', x)
print ','.join(matches)

The operators usually have different precedence. Parentheses would be evaluated first, then and expressions, and finally or expressions, with left-to-right order in case of equal precedence. You say you want to return the parentheses matches first, but actually what you would normally do is to use the parts build an expression tree and evalulate that recursively.

like image 147
Mark Byers Avatar answered Oct 18 '22 22:10

Mark Byers