
PyParsing: how to use SkipTo and the Or (^) operator

I have lines with different date prefixes and other prefixes. I need a grammar that skips these prefixes and extracts the required data. But when I combine SkipTo with the Or (^) operator, I do not get the desired results.

from pyparsing import *

def print_cal(v):
    print(v)

f = open("test", "r")
NAND_TIME = Group(SkipTo(Literal("NAND TIMES"), include=True) + Word(nums) + Literal(":").suppress() + Word(nums)).setParseAction(lambda t: print_cal('NAND TIME'))

TEST_TIME = Group(SkipTo(Literal("TEST TIMES"), include=True) + Word(nums) + Literal(":").suppress() + Word(nums)).setParseAction(lambda t: print_cal('TEST TIME'))

testing = NAND_TIME ^ TEST_TIME
watch = OneOrMore(testing)
watch.parseString(f.read())

File Contents:


01 may 2015 15:15:100 NAND TIMES 1: 88008888

01 april 2015 15:15:100 NAND TIMES 2: 77777777

1154544 15:15:100 TEST TIMES 1: 78544545

8787878 aug 2015 15:15:100 TEST TIMES 2: 78787878

Output:

TEST TIME

TEST TIME

Desired output:

NAND TIME

NAND TIME

TEST TIME

TEST TIME

Can anyone help me understand this?

Praneeth Puligundla asked Aug 08 '14 19:08


1 Answer

Using SkipTo as the first element of a parser is a bit bold, and may indicate that searchString or scanString would be a better choice than parseString. With searchString and scanString you define only the part of the input you are interested in, and the rest is skipped over automatically. You do have to take care that your definition of "what you want" is unambiguous and does not accidentally pick up unwanted bits. Here is your parser implemented using searchString:

NAND_TIME = (Literal("NAND TIMES") + Word(nums) + Literal(":").suppress() + Word(nums)).setParseAction(lambda t: print_cal('NAND TIME'))
TEST_TIME = (Literal("TEST TIMES") + Word(nums) + Literal(":").suppress() + Word(nums)).setParseAction(lambda t: print_cal('TEST TIME'))
testing = NAND_TIME | TEST_TIME

testdata = f.read()
for match in testing.searchString(testdata):
    print(match.asList())

'|' is perfectly fine to use in this case, as there is no possible confusion between starting with NAND or starting with TEST.
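To see why the operator matters once SkipTo leads each alternative, here is a small sketch that reruns the question's grammar on the sample data with both '^' and '|'. The helper names labelled and labels_for are made up for illustration, the sample text is inlined instead of read from the file, and the parse action records labels in a list rather than printing them. As I understand it, '^' (Or) tries every alternative and keeps the longest match, so the TEST TIMES branch, whose SkipTo can run past both NAND lines, wins at the start of the input.

```python
from pyparsing import Group, Literal, OneOrMore, SkipTo, Word, nums

# Sample input copied from the question's "File Contents".
testdata = """\
01 may 2015 15:15:100 NAND TIMES 1: 88008888

01 april 2015 15:15:100 NAND TIMES 2: 77777777

1154544 15:15:100 TEST TIMES 1: 78544545

8787878 aug 2015 15:15:100 TEST TIMES 2: 78787878
"""


def labelled(keyword, label, seen):
    """Question-style expression with a leading SkipTo (hypothetical helper)."""
    expr = Group(
        SkipTo(Literal(keyword), include=True)
        + Word(nums)
        + Literal(":").suppress()
        + Word(nums)
    )

    def record(tokens):
        # Remember which alternative actually fired.
        seen.append(label)

    return expr.setParseAction(record)


def labels_for(combine):
    """Parse the sample with the alternatives joined by `combine`."""
    seen = []
    nand = labelled("NAND TIMES", "NAND TIME", seen)
    test = labelled("TEST TIMES", "TEST TIME", seen)
    OneOrMore(combine(nand, test)).parseString(testdata)
    return seen


longest = labels_for(lambda a, b: a ^ b)  # Or: longest alternative wins
first = labels_for(lambda a, b: a | b)    # MatchFirst: first alternative that matches wins

print(longest)
print(first)
```

With '|' the first alternative that matches wins, which gives the desired order here only because the NAND lines come before the TEST lines. A leading SkipTo would still jump over any TEST lines that preceded a NAND line, which is another reason to prefer searchString.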

You might also consider just parsing this file a line at a time:

for line in f:
    if not line.strip(): continue  # lines read from a file keep their newline, so test the stripped text
    print(line.rstrip())
    print(testing.searchString(line).asList())
    print()
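For experimenting without the file, the line-at-a-time variant can be sketched as a self-contained script. It assumes the sample lines from the question, inlined as a string, and leaves out the printing parse actions so the matched tokens can be collected; number_pair and rows are names introduced only for this example.

```python
from pyparsing import Literal, Word, nums

# Sample input copied from the question's "File Contents".
testdata = """\
01 may 2015 15:15:100 NAND TIMES 1: 88008888

01 april 2015 15:15:100 NAND TIMES 2: 77777777

1154544 15:15:100 TEST TIMES 1: 78544545

8787878 aug 2015 15:15:100 TEST TIMES 2: 78787878
"""

# "<index>: <value>", with the colon dropped from the results.
number_pair = Word(nums) + Literal(":").suppress() + Word(nums)
NAND_TIME = Literal("NAND TIMES") + number_pair
TEST_TIME = Literal("TEST TIMES") + number_pair
testing = NAND_TIME | TEST_TIME

rows = []
for line in testdata.splitlines():
    if not line.strip():
        continue  # skip blank lines
    # searchString ignores the date prefix and returns only the matches.
    for match in testing.searchString(line):
        rows.append(match.asList())

for row in rows:
    print(row)
```

Each row holds the keyword followed by the two numbers as strings, so converting them with int() is left to the caller.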
PaulMcG answered Oct 16 '22 05:10