Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pyparsing Regex as Word

I'm building a syntax parser to perform simple actions on objects identified using dotted notation, something like this:

DISABLE ALL;
ENABLE A.1 B.1.1 C

but in DISABLE ALL the keyword ALL is instead matched as 3 Regex(r'[a-zA-Z]') => 'A', 'L', 'L' I use to match arguments.

How can I make a Word using regex? AFAIK I can't get A.1.1 using Word

please see example below

import pyparsing as pp

def toggle_item_action(s, loc, tokens):
    'enable / disable a sequence of items'
    action = True if tokens[0].lower() == "enable" else False
    for token in tokens[1:]:
        print "it[%s].active = %s" % (token, action)

def toggle_all_items_action(s, loc, tokens):
    'enable / disable ALL items'
    action = True if tokens[0].lower() == "enable" else False
    print "it.enable_all(%s)" % action

expr_separator = pp.Suppress(';')

#match A
area = pp.Regex(r'[a-zA-Z]')
#match A.1
category = pp.Regex(r'[a-zA-Z]\.\d{1,2}')
#match A.1.1
criteria = pp.Regex(r'[a-zA-Z]\.\d{1,2}\.\d{1,2}')
#match any of the above
item = area ^ category ^ criteria
#keyword to perform action on ALL items
all_ = pp.CaselessLiteral("all")

#actions
enable = pp.CaselessKeyword('enable')
disable = pp.CaselessKeyword('disable')
toggle = enable | disable

#toggle item expression
toggle_item = (toggle + item + pp.ZeroOrMore(item)
    ).setParseAction(toggle_item_action)

#toggle ALL items expression
toggle_all_items = (toggle + all_).setParseAction(toggle_all_items_action)

#swapping order to `toggle_all_items ^ toggle_item` works
#but seems to weak to me and error prone for future maintenance
expr = toggle_item ^ toggle_all_items
#expr = toggle_all_items ^ toggle_item

more = expr + pp.ZeroOrMore(expr_separator + expr)

more.parseString("""
    ENABLE A.1 B.1.1;
    DISABLE ALL
    """, parseAll=True)
like image 448
neurino Avatar asked Oct 10 '22 19:10

neurino


1 Answers

Is this the problem?

#match any of the above
item = area ^ category ^ criteria
#keyword to perform action on ALL items
all_ = pp.CaselessLiteral("all")

Should be:

#keyword to perform action on ALL items
all_ = pp.CaselessLiteral("all")
#match any of the above
item = area ^ category ^ criteria ^ all_

EDIT - if you're interested...

Your regexes are so similar, I thought I'd see what it would look like to combine them into one. Here is a snippet to parse out your three dotted notations using a single Regex, and then using a parse action to figure out which type you got:

import pyparsing as pp

dotted_notation = pp.Regex(r'[a-zA-Z](\.\d{1,2}(\.\d{1,2})?)?') 
def name_notation_type(tokens):
    name = {
        0 : "area",
        1 : "category",
        2 : "criteria"}[tokens[0].count('.')]
    # assign results name to results - 
    tokens[name] = tokens[0] 
dotted_notation.setParseAction(name_notation_type)

# test each individually
tests = "A A.1 A.2.2".split()
for t in tests:
    print t
    val = dotted_notation.parseString(t)
    print val.dump()
    print val[0], 'is a', val.getName()
    print

# test all at once
tests = "A A.1 A.2.2"
val = pp.OneOrMore(dotted_notation).parseString(tests)
print val.dump()

Prints:

A
['A']
- area: A
A is a area

A.1
['A.1']
- category: A.1
A.1 is a category

A.2.2
['A.2.2']
- criteria: A.2.2
A.2.2 is a criteria

['A', 'A.1', 'A.2.2']
- area: A
- category: A.1
- criteria: A.2.2

EDIT2 - I see the original problem...

What is messing you up is pyparsing's implicit whitespace skipping. Pyparsing will skip over whitespace between defined tokens, but the converse is not true - pyparsing does not require whitespace between separate parser expressions. So in your all_-less version, "ALL" looks like 3 areas, "A", "L", and "L". This is true not just of Regex, but just about any pyparsing class. See if the pyparsing WordEnd class might be useful in enforcing this.

EDIT3 - Then maybe something like this...

toggle_item = (toggle + pp.OneOrMore(item)).setParseAction(toggle_item_action)
toggle_all = (toggle + all_).setParseAction(toggle_all_action)

toggle_directive = toggle_all | toggle_item

The way your commands are formatted, you have to make the parser first see if ALL is being toggled before looking for individual areas, etc. If you need to support something that might read "ENABLE A.1 ALL", then use a negative lookahead for item: item = ~all_ + (area ^ etc...). (Note also that I replaced item + pp.ZeroOrMore(item) with just pp.OneOrMore(item).)

like image 170
PaulMcG Avatar answered Nov 01 '22 18:11

PaulMcG