I need to parse expressions based on the following rule:
name:value
So a typical expression looks like
filter1:45 hello world filter:5454
filter1:45 'hello world' filter:5454
hello world
'hello world' OR filter:43
Here's what I've tried so far:
from pypeg2 import *

class BooleanLiteral(Keyword):
    grammar = Enum(K("OR"), K("AND"))

class LineFilter(Namespace):
    grammar = flag('inverted', "-"), name(), ":", attr('value', word)

class LineExpression(List):
    grammar = csl(LineFilter, separator=blank)
With this grammar, I can parse strings like
filter2:32 filter1:3243
From what I understand, I can provide the csl function with a list of objects, and the grammar needs to be in that order. However, what if I want to parse an object like
filter34:43 hello filter32:3232
or
filter34:43 OR filter32:3232
How can I say that there are multiple types of objects (filters, expressions, booleans) in an expression? Is that possible with peg?
From your spec in the question and comments, I think your code is close, but you don't want the csl. I've put the code I think you want below (it may not be the most elegant implementation, but I think it's reasonable). You have to avoid a potential problem: BooleanLiteral is a subset of StringLiteral, so a bare OR or AND also matches as a plain word. This means that you can't make the LineExpression have
grammar = maybe_some([LineFilter,StringLiteral]), optional(BooleanLiteral)
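To see the overlap concretely, here is a small standard-library check rather than pypeg2 itself. It assumes that pypeg2's word is the pattern \w+ (worth confirming against your installed version); the names WORD, KEYWORDS and matches_word are made up for illustration.

```python
import re

# Assumption: this mirrors pypeg2's `word` pattern; check your installed version.
WORD = re.compile(r"\w+")
KEYWORDS = {"OR", "AND"}  # hypothetical stand-in for BooleanLiteral


def matches_word(token):
    """Return True if the whole token is matched by the word pattern."""
    return WORD.fullmatch(token) is not None


# Every keyword is also a valid word, so a word-based rule that is tried
# first would consume the keyword before a keyword rule gets a chance.
overlap = sorted(k for k in KEYWORDS if matches_word(k))
print(overlap)
```

Both keywords show up in the overlap, which is why the keyword alternative has to be tried before the plain-word alternative.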
The result of the code below is a list of objects with the correct types according to your spec, I think. The crucial bit to emphasise is that you can put in alternatives as a Python list (i.e. [LineFilter, StringLiteral] means a LineFilter or a StringLiteral). The PEG parser will try them in the order they occur, i.e. it will try to match the first, and only if that fails will it try the second, and so on.
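The "first alternative that matches wins" behaviour can be sketched without pypeg2, using only the standard library. This is not how pypeg2 is implemented; RULES, tokenize and the three regexes are hypothetical names and patterns chosen to resemble the filter, boolean and string cases in the question.

```python
import re

# Hypothetical token rules, tried top to bottom like a PEG ordered choice.
RULES = [
    ("filter", re.compile(r"-?\w+:\w+")),
    ("boolean", re.compile(r"(?:OR|AND)\b")),
    ("string", re.compile(r'"[^"]*"|\w+')),
]


def tokenize(text, rules=RULES):
    """Split text into (kind, lexeme) pairs; the first matching rule wins."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for kind, pattern in rules:
            m = pattern.match(text, pos)
            if m:
                out.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # No alternative matched at this position.
            raise ValueError("no rule matches at position %d" % pos)
    return out


tokens = tokenize('filter34:43 Hello OR "AND then"')

# Same rules, but with the string rule tried before the boolean rule.
reordered = [RULES[0], RULES[2], RULES[1]]
shadowed = tokenize("a OR b", reordered)
```

With the original order, OR comes out tagged as a boolean; with the reordered rules, the string rule gets there first and OR is tagged as an ordinary string. The same reasoning is why the more specific alternatives are listed first in the grammar.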
import re

from pypeg2 import *

class BooleanLiteral(Keyword):
    # Need to alter keyword regex to allow for quoted literal keywords.
    # K is pypeg2's shorthand for Keyword, so this affects every keyword.
    K.regex = re.compile(r'"*\w+"*')
    grammar = Enum(K('OR'), K('AND'), K(r'"OR"'), K(r'"AND"'))

class LineFilter(Namespace):
    grammar = flag('inverted', "-"), name(), ":", attr('value', word)

class StringLiteral(str):
    quoted_string = re.compile(r'"[^"]*"')
    grammar = [word, quoted_string]

class LineExpression(List):
    # Alternatives are tried in order, so the pairs ending in a
    # BooleanLiteral come before the bare LineFilter / StringLiteral.
    grammar = maybe_some([(LineFilter, BooleanLiteral),
                          (StringLiteral, BooleanLiteral),
                          LineFilter,
                          StringLiteral])

test_string = ('filter34:43 "My oh my!!" Hello OR '
               'filter32:3232 "AND" "Goodbye cruel world"')

k = parse(test_string, LineExpression)

print('Input:')
print(test_string)
print('Parsed output:')
print('==============')
for component in k:
    print(component, type(component))
Input:
filter34:43 "My oh my!!" Hello OR filter32:3232 "AND" "Goodbye cruel world"
Parsed output:
==============
LineFilter([], name=Symbol('filter34')) <class '__main__.LineFilter'>
"My oh my!!" <class '__main__.StringLiteral'>
Hello <class '__main__.StringLiteral'>
OR <class '__main__.BooleanLiteral'>
LineFilter([], name=Symbol('filter32')) <class '__main__.LineFilter'>
"AND" <class '__main__.BooleanLiteral'>
"Goodbye cruel world" <class '__main__.StringLiteral'>