I want to be able to pull out the type and count of letters from a piece of text where the letters could be in any order. There is some other parsing going on which I have working, but this bit has me stumped!
input -> result
"abc" -> [['a',1], ['b',1],['c',1]]
"bbbc" -> [['b',3],['c',1]]
"cccaa" -> [['a',2],['c',3]]
I could use search or scan and repeat for each possible letter, but is there a clean way of doing it?
This is as far as I got:
from pyparsing import *
def handleStuff(string, location, tokens):
    return [tokens[0][0], len(tokens[0])]
stype = Word("abc").setParseAction(handleStuff)
section = ZeroOrMore(stype("stype"))
print section.parseString("abc").dump()
print section.parseString("aabcc").dump()
print section.parseString("bbaaa").dump()
I wasn't clear from your description whether the input characters could be mixed like "ababc", since in all your test cases, the letters were always grouped together. If the letters are always grouped together, you could use this pyparsing code:
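from pyparsing import *
# test inputs taken from the question
tests = ["abc", "bbbc", "cccaa"]
def makeExpr(ch):
    # Word(ch) matches a run of ch; the parse action replaces it with [ch, run length]
    expr = Word(ch).setParseAction(lambda tokens: [ch,len(tokens[0])])
    return expr
# Each matches the optional per-letter expressions in any order
expr = Each([Optional(makeExpr(ch)) for ch in "abc"])
for t in tests:
    print t,expr.parseString(t).asList()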
The Each construct takes care of matching out of order, and Word(ch) handles the 1-to-n repetition. The parse action converts each matched run into a [character, count] pair.
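To make the grouped-versus-mixed distinction concrete, here is a minimal sketch that counts consecutive runs with the standard library's itertools.groupby. This is a swapped-in technique rather than the pyparsing approach, and the function name run_counts is only illustrative:

```python
from itertools import groupby

def run_counts(text):
    """Return [letter, run_length] for each consecutive run in text."""
    # groupby yields one group per stretch of identical adjacent characters
    return [[ch, len(list(run))] for ch, run in groupby(text)]

print(run_counts("bbbc"))   # [['b', 3], ['c', 1]]
print(run_counts("ababc"))  # mixed input: every letter is its own run
```

With grouped input each letter appears once in the result. With mixed input such as "ababc" the same letter shows up in several runs, so a run-based approach would need an extra step to sum them.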
One solution:
text = 'sufja srfjhvlasfjkhv lasjfvhslfjkv hlskjfvh slfkjvhslk'
print([(x,text.count(x)) for x in set(text)])
No pyparsing involved; it seems like overkill for this.
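Applied to the inputs from the question, the same idea might look like the sketch below. The function name letter_counts and the letters parameter are illustrative assumptions, and the sorting is added only because a set has no defined order while the expected results in the question are listed alphabetically:

```python
def letter_counts(text, letters="abc"):
    """Return [letter, count] pairs, sorted by letter, for the letters of interest."""
    # set(text) gives each distinct character once; sorted() fixes the order
    return [[ch, text.count(ch)] for ch in sorted(set(text)) if ch in letters]

for sample in ["abc", "bbbc", "cccaa"]:
    print(sample, letter_counts(sample))
```

Because str.count scans the whole string once per distinct character, this is fine for short inputs but does more work than a single pass would on long text.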
I like Lennart's one-line solution.
Alex mentions another great option if you're using Python 3.1.
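The Python 3.1 option is not spelled out in this thread. One candidate that fits the description is collections.Counter, which was added to the standard library in 3.1; treating it as the option meant here is an assumption. A minimal sketch (the function name letter_counts_counter is illustrative):

```python
from collections import Counter

def letter_counts_counter(text):
    """Return [letter, count] pairs, sorted by letter, using a single pass."""
    # Counter tallies every character; sorted() orders the (letter, count) items
    return [[ch, n] for ch, n in sorted(Counter(text).items())]

for sample in ["abc", "bbbc", "cccaa"]:
    print(sample, letter_counts_counter(sample))
```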
Yet another option is collections.defaultdict:
>>> from collections import defaultdict
>>> mydict = defaultdict(int)
>>> for c in 'bbbc':
...     mydict[c] += 1
...
>>> mydict
defaultdict(<type 'int'>, {'c': 1, 'b': 3})