
pypeg2 - can this expression be parsed using peg grammar?

I need to parse expressions based on the following rules:

  1. An expression can contain a filter object represented as name:value
  2. An expression can contain a string expression
  3. An expression can contain the Boolean operators OR and AND
  4. Everything inside can be quoted

So typical expressions look like:

filter1:45 hello world filter:5454

filter1:45 'hello world' filter:5454

hello world

'hello world' OR filter:43


Here's what I've tried so far:

from pypeg2 import *

class BooleanLiteral(Keyword):
    grammar = Enum(K("OR"), K("AND"))

class LineFilter(Namespace):
    grammar = flag('inverted', "-"), name(), ":", attr('value', word)

class LineExpression(List):
    grammar = csl(LineFilter, separator=blank)

With this grammar, I can parse strings like

filter2:32 filter1:3243

As I understand it, I can pass the csl function a list of objects, and the input then has to match them in that order. But what if I want to parse input like

filter34:43 hello filter32:3232

or

filter34:43 OR filter32:3232

How can I say that an expression may contain several types of objects (filters, string expressions, Booleans) in any order? Is that possible with PEG?

Jan Vorcak asked Nov 26 '15 10:11



1 Answer

From your spec in the question and comments, I think your code is close, but you don't want csl. I've put the code I think you want below (it may not be the most elegant implementation, but I think it's reasonable). The one problem you have to avoid is that BooleanLiteral is a subset of StringLiteral: OR and AND are also valid words. That means you can't give LineExpression a grammar like the following, because StringLiteral would consume OR and AND as ordinary words before BooleanLiteral ever got a chance to match:

grammar = maybe_some([LineFilter,StringLiteral]), optional(BooleanLiteral)

The result of the code further down is a list of objects with the types your spec asks for, I think. The crucial point is that you can give alternatives as a Python list: [LineFilter, StringLiteral] means a LineFilter or a StringLiteral. The PEG parser tries the alternatives in the order they occur: it tries to match the first, and only if that fails does it try the second, and so on.
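To make the ordering point concrete before the pypeg2 code, here is a minimal plain-Python sketch of ordered choice. It does not use pypeg2 at all: the regexes, the labels and the `ordered_choice` helper are hypothetical stand-ins invented for illustration, and only the "first matching alternative wins" behaviour is meant to carry over.

```python
import re

# Hypothetical stand-ins for two grammar rules. These are plain regexes,
# not pypeg2 classes; the labels just reuse the names from the answer.
BOOLEAN = ("BooleanLiteral", re.compile(r"(?:OR|AND)\b"))
WORD = ("StringLiteral", re.compile(r"\w+"))


def ordered_choice(alternatives, text):
    """Return the label of the first alternative that matches at the start of text."""
    for label, pattern in alternatives:
        if pattern.match(text):
            return label
    return None


# StringLiteral listed first: it also matches "OR", so BooleanLiteral is never tried.
print(ordered_choice([WORD, BOOLEAN], "OR"))     # StringLiteral
# BooleanLiteral listed first: it gets the first chance at the keyword.
print(ordered_choice([BOOLEAN, WORD], "OR"))     # BooleanLiteral
# A word that is not a keyword falls through to the second alternative.
print(ordered_choice([BOOLEAN, WORD], "hello"))  # StringLiteral
```

The same reasoning is why, in the pypeg2 grammar that follows, the alternatives ending in BooleanLiteral are listed ahead of the bare LineFilter and StringLiteral alternatives.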

import re  # imported explicitly rather than relying on the star import below
from pypeg2 import *

class BooleanLiteral(Keyword):
    # Alter the keyword regex so that quoted keywords such as "OR" also match.
    # Note: this assignment sets regex on K itself, not only on BooleanLiteral.
    K.regex = re.compile(r'"*\w+"*')
    grammar = Enum(K('OR'), K('AND'), K(r'"OR"'), K(r'"AND"'))

class LineFilter(Namespace):
    grammar = flag('inverted', "-"), name(), ":", attr('value', word)

class StringLiteral(str):
    quoted_string = re.compile(r'"[^"]*"')
    grammar = [word, quoted_string]

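class LineExpression(List):
    # Try the operand + operator pairs first, so that an OR/AND following an
    # operand is parsed as a BooleanLiteral rather than as a StringLiteral.
    grammar = maybe_some([(LineFilter, BooleanLiteral),
                          (StringLiteral, BooleanLiteral),
                          LineFilter,
                          StringLiteral])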

test_string = ('filter34:43 "My oh my!!" Hello OR '
               'filter32:3232 "AND" "Goodbye cruel world"')

k = parse(test_string,LineExpression)

print('Input:')
print(test_string)
print('Parsed output:')
print('==============')
for component in k:
    print(component,type(component))

Output

Input:
filter34:43 "My oh my!!" Hello OR filter32:3232 "AND" "Goodbye cruel world"
Parsed output:
==============
LineFilter([], name=Symbol('filter34')) <class '__main__.LineFilter'>
"My oh my!!" <class '__main__.StringLiteral'>
Hello <class '__main__.StringLiteral'>
OR <class '__main__.BooleanLiteral'>
LineFilter([], name=Symbol('filter32')) <class '__main__.LineFilter'>
"AND" <class '__main__.BooleanLiteral'>
"Goodbye cruel world" <class '__main__.StringLiteral'>
J Richard Snape answered Oct 02 '22 15:10
