Pyparsing

Question

I want to be able to pull out the type and count of letters from a piece of text where the letters could be in any order. There is some other parsing going on which I have working, but this bit has me stumped!

input -> result
"abc" -> [['a',1], ['b',1],['c',1]]
"bbbc" -> [['b',3],['c',1]]
"cccaa" -> [['a',2],['c',3]]

I could use search or scan and repeat for each possible letter, but is there a clean way of doing it?

This is as far as I got:

from pyparsing import *


def handleStuff(string, location, tokens):

        return [tokens[0][0], len(tokens[0])]


stype = Word("abc").setParseAction(handleStuff)
section =  ZeroOrMore(stype("stype"))


print section.parseString("abc").dump()
print section.parseString("aabcc").dump()
print section.parseString("bbaaa").dump()

PaulMcG · Accepted Answer

I wasn't clear from your description whether the input characters could be mixed like "ababc", since in all your test cases, the letters were always grouped together. If the letters are always grouped together, you could use this pyparsing code:

def makeExpr(ch):
    expr = Word(ch).setParseAction(lambda tokens: [ch,len(tokens[0])])
    return expr

expr = Each([Optional(makeExpr(ch)) for ch in "abc"])

for t in tests:
    print t,expr.parseString(t).asList()

The Each construct takes care of matching out of order, and Word(ch) handles the 1-to-n repetition. The parse action takes care of converting the parsed tokens into the (character, count) tuples.

Lennart Regebro · Answer

One solution:

text = 'sufja srfjhvlasfjkhv lasjfvhslfjkv hlskjfvh slfkjvhslk'
print([(x,text.count(x)) for x in set(text)])

No pyparsing involved, but it seems like overkill.

mechanical_meat · Answer

I like Lennart's one-line solution.

Alex mentions another great option if you're using 3.1

Yet another option is collections.defaultdict:

>>> from collections import defaultdict
>>> mydict = defaultdict(int)
>>> for c in 'bbbc':
...   mydict[c] += 1
...
>>> mydict
defaultdict(<type 'int'>, {'c': 1, 'b': 3})

Pyparsing - where order of tokens in unpredictable

Tags:

python

PhoebeB

3 Answers

PaulMcG

Lennart Regebro

mechanical_meat

Recent Activity

Donate For Us