I have strings like this:
"MSE 2110, 3030, 4102"
I would like to output:
[("MSE", 2110), ("MSE", 3030), ("MSE", 4102)]
This is my way of going about it, although I haven't quite gotten it yet:
def makeCourseList(str, location, tokens):
print "before: %s" % tokens
for index, course_number in enumerate(tokens[1:]):
tokens[index + 1] = (tokens[0][0], course_number)
print "after: %s" % tokens
course = Group(DEPT_CODE + COURSE_NUMBER) # .setResultsName("Course")
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)).setParseAction(makeCourseList)
This outputs:
>>> course.parseString("CS 2110")
([(['CS', 2110], {})], {})
>>> course_data.parseString("CS 2110, 4301, 2123, 1110")
before: [['CS', 2110], 4301, 2123, 1110]
after: [['CS', 2110], ('CS', 4301), ('CS', 2123), ('CS', 1110)]
([(['CS', 2110], {}), ('CS', 4301), ('CS', 2123), ('CS', 1110)], {})
Is this the right way to do it, or am I totally off?
Also, the output of isn't quite correct - I want course_data
to emit a list of course
symbols that are in the same format as each other. Right now, the first course is different from the others. (It has a {}
, whereas the others don't.)
This solution memorizes the department when parsed, and emits a (dept,coursenum) tuple when a number is found.
from pyparsing import Suppress,Word,ZeroOrMore,alphas,nums,delimitedList
data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000
'''
def memorize(t):
memorize.dept = t[0]
def token(t):
return (memorize.dept,int(t[0]))
course = Suppress(Word(alphas).setParseAction(memorize))
number = Word(nums).setParseAction(token)
line = course + delimitedList(number)
lines = ZeroOrMore(line)
print lines.parseString(data)
Output:
[('MSE', 2110), ('MSE', 3030), ('MSE', 4102), ('CSE', 1000), ('CSE', 2000), ('CSE', 3000)]
Is this the right way to do it, or am I totally off?
It's one way to do it, though of course there are others (e.g. use as parse actions two bound method -- so the instance the method belongs to can keep state -- one for the dept code and another for the course number).
The return value of the parseString
call is harder to bend to your will (though I'm sure sufficiently dark magic will do it and I look forward to Paul McGuire explaining how;-), so why not go the bound-method route as in...:
from pyparsing import *
DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
class MyParse(object):
def __init__(self):
self.result = None
def makeCourseList(self, str, location, tokens):
print "before: %s" % tokens
dept = tokens[0][0]
newtokens = [(dept, tokens[0][1])]
newtokens.extend((dept, tok) for tok in tokens[1:])
print "after: %s" % newtokens
self.result = newtokens
course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")
inst = MyParse()
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)
).setParseAction(inst.makeCourseList)
ignore = course_data.parseString("CS 2110, 4301, 2123, 1110")
print inst.result
this emits:
before: [['CS', '2110'], '4301', '2123', '1110']
after: [('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]
[('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]
which seems to be what you require, if I read your specs correctly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With