
PyParsing non-greedy match

I am trying to parse a partially standardized street address into its components using pyparsing. I want a non-greedy match for a street name that may be N tokens long.

For example:

444 PARK GARDEN LN

Should be parsed into:

number: 444
street: PARK GARDEN
suffix: LN

How would I do this with PyParsing? Here's my initial code:

from pyparsing import *

def main():
    street_number = Word(nums).setResultsName('street_number')
    street_suffix = oneOf("ST RD DR LN AVE WAY").setResultsName('street_suffix')
    street_name = OneOrMore(Word(alphas)).setResultsName('street_name')

    address = street_number + street_name + street_suffix
    result = address.parseString("444 PARK GARDEN LN")
    print(result.dump())

if __name__ == '__main__':
    main()

But when I try parsing it, the street suffix gets gobbled up by street_name, because OneOrMore matches greedily by default.

asked Apr 10 '13 23:04 by zzz


1 Answer

Use the negation operator, ~, as a lookahead: before each word is added to street_name, check that the upcoming token is not actually a street_suffix.

from pyparsing import *

street_number = Word(nums)('street_number')
street_suffix = oneOf("ST RD DR LN AVE WAY")('street_suffix')
# ~street_suffix is a negative lookahead: a word joins street_name only if it is not a suffix
street_name = OneOrMore(~street_suffix + Word(alphas))('street_name')

address = street_number + street_name + street_suffix
result = address.parseString("444 PARK GARDEN LN")
print(result.dump())

In addition, you don't have to call setResultsName; you can simply call the expression with the name, as in the code above. IMHO it leads to a much cleaner grammar definition.
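
One caveat worth checking in your own data: as far as I know, oneOf matches literal text without requiring a word boundary, so a street word that merely starts with a suffix (STONE starts with ST) would likely be rejected by the ~street_suffix lookahead. Below is a minimal sketch of the same grammar with the suffixes built from Keyword, which only matches whole words. The parse_address helper and the "12 OLD STONE RD" input are my own illustrations, not part of the question.

```python
from pyparsing import Keyword, MatchFirst, OneOrMore, Word, alphas, nums

street_number = Word(nums)("street_number")

# Keyword only matches a whole word, so "ST" will not match the start of "STONE".
street_suffix = MatchFirst(
    [Keyword(s) for s in "ST RD DR LN AVE WAY".split()]
)("street_suffix")

# Same negative lookahead as above: stop collecting words at the first suffix.
street_name = OneOrMore(~street_suffix + Word(alphas))("street_name")

address = street_number + street_name + street_suffix


def parse_address(text):
    """Hypothetical helper: return the named parts as a plain dict."""
    result = address.parseString(text)
    return {
        "number": result["street_number"],
        "street": " ".join(result["street_name"]),
        "suffix": result["street_suffix"],
    }


print(parse_address("444 PARK GARDEN LN"))
print(parse_address("12 OLD STONE RD"))
```

Whether you need this depends on your street names; if none of them begin with a suffix string, the shorter oneOf version is enough.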

answered Oct 06 '22 08:10 by Hooked