I've just started using pyparsing this evening and I've built a complex grammar which describes some sources I'm working with very effectively. It was very easy and very powerful. However, I'm having some trouble working with ParseResults. I need to be able to iterate over nested tokens in the order that they're found, and I'm finding it a little frustrating. I've abstracted my problem to a simple case:
import pyparsing as pp
word = pp.Word(pp.alphas + ',.')('word*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | direct_speech))('sentence')
test_string = 'Lorem ipsum “dolor sit” amet, consectetur.'
r = sentence.parseString(test_string)
print r.asXML('div')
print ''
for name, item in r.sentence.items():
    print name, item
print ''
for item in r.sentence:
    print item.getName(), item.asList()
As far as I can see, this ought to work. Here is the output:
<div>
  <sentence>
    <word>Lorem</word>
    <word>ipsum</word>
    <direct_speech>
      <word>dolor</word>
      <word>sit</word>
    </direct_speech>
    <word>amet,</word>
    <word>consectetur.</word>
  </sentence>
</div>
word ['Lorem', 'ipsum', 'amet,', 'consectetur.']
direct_speech [['dolor', 'sit']]
Traceback (most recent call last):
  File "./test.py", line 27, in <module>
    print item.getName(), item.asList()
AttributeError: 'str' object has no attribute 'getName'
The XML output seems to indicate that the string is parsed exactly as I would wish, but I can't iterate over the sentence, for example, to reconstruct it.
Is there a way to do what I need to?
Thanks!
edit:
I've been using this:
for item in r.sentence:
    if isinstance(item, basestring):
        print item
    else:
        print item.getName(), item
but it doesn't help me all that much, because I can't distinguish different types of string. Here is a slightly expanded example:
word = pp.Word(pp.alphas + ',.')('word*')
number = pp.Word(pp.nums + ',.')('number*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word | number))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | number | direct_speech))('sentence')
test_string = 'Lorem 14 ipsum “dolor 22 sit” amet, consectetur.'
r = sentence.parseString(test_string)
for i, item in enumerate(r.sentence):
    if isinstance(item, basestring):
        print i, item
    else:
        print i, item.getName(), item
The output is:
0 Lorem
1 14
2 ipsum
3 word ['dolor', '22', 'sit']
4 amet,
5 consectetur.
Not too helpful. I can't distinguish between word and number, and the direct_speech element is labelled word?!
I'm obviously missing something. All I want to do is:
for item in r.sentence:
    if (item is a number):
        do something
    elif (item is a word):
        do something else
    etc. ...
Should I be approaching this differently?
r.sentence contains a mix of strings and ParseResults, and only ParseResults support getName(). Have you tried just iterating over r.sentence? If I print it out using asList(), I get:
['Lorem', 'ipsum', ['dolor', 'sit'], 'amet,', 'consectetur.']
Or this snippet:
for item in r.sentence:
    print type(item), item.asList() if isinstance(item, pp.ParseResults) else item
Gives:
<type 'str'> Lorem
<type 'str'> ipsum
<class 'pyparsing.ParseResults'> ['dolor', 'sit']
<type 'str'> amet,
<type 'str'> consectetur.
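Since the nested list keeps the tokens in the order they were found, one way to rebuild the sentence is a small recursive walk over it. This is only a sketch: flatten is a hypothetical helper, not part of pyparsing, and the list below is copied from the asList() output above rather than produced by a parse.

```python
def flatten(tokens):
    """Yield every string in a nested token list, in document order."""
    for item in tokens:
        if isinstance(item, list):
            # A nested list is a group (here, the direct speech): descend into it.
            for sub in flatten(item):
                yield sub
        else:
            yield item


# Copied from the asList() output shown above.
nested = ['Lorem', 'ipsum', ['dolor', 'sit'], 'amet,', 'consectetur.']
rebuilt = ' '.join(flatten(nested))
print(rebuilt)
```

Note that the quotation marks do not come back, because the grammar wraps them in Suppress.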
I'm not sure I answered your question, but does that shed any light on where to go next?
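For telling words from numbers, one possible approach is to stop relying on getName() and have each expression tag its own token with a parse action. The sketch below is an assumption about what would suit your case, not the only way to do it: tag and tag_speech are hypothetical helpers, the grammar drops the results names from your example, and it is written for Python 3 rather than the Python 2 print statements used above.

```python
import pyparsing as pp


def tag(kind):
    """Build a parse action that replaces a token with a (kind, text) pair."""
    def action(tokens):
        # Returning a list makes pyparsing use its contents as the new tokens.
        return [(kind, tokens[0])]
    return action


def tag_speech(tokens):
    # tokens[0] is the Group's inner result: the tagged items inside the quotes.
    return [('direct_speech', list(tokens[0]))]


word = pp.Word(pp.alphas + ',.').setParseAction(tag('word'))
number = pp.Word(pp.nums + ',.').setParseAction(tag('number'))
quoted = pp.Group(pp.OneOrMore(word | number)).setParseAction(tag_speech)
direct_speech = pp.Suppress('“') + quoted + pp.Suppress('”')
sentence = pp.OneOrMore(word | number | direct_speech)

test_string = 'Lorem 14 ipsum “dolor 22 sit” amet, consectetur.'
parsed = list(sentence.parseString(test_string))

# Every item is now a (kind, value) pair, in the order it was found.
for kind, value in parsed:
    if kind == 'number':
        print('number:', value)
    elif kind == 'word':
        print('word:', value)
    else:
        print(kind, value)
```

The value of a direct_speech item is itself a list of (kind, value) pairs, so the same loop can be applied to it recursively.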
(Welcome to Pyparsing)