Counting occurrences of multiple strings in another string

Tags: python, count

In Python 2.7, given this string:

Spot is a brown dog. Spot has brown hair. The hair of Spot is brown.

what would be the best way to find the total number of occurrences of "Spot", "brown", and "hair" in the string? In the example, it would return 8.

I'm looking for something like string.count("Spot","brown","hair"), but one that works with the strings to be found in a tuple or list.

Thanks!

DharmaTurtle asked Oct 29 '25 03:10


2 Answers

This does what you asked for, but notice that it will also count words like "hairy", "browner" etc.

>>> s = "Spot is a brown dog. Spot has brown hair. The hair of Spot is brown."
>>> sum(s.count(x) for x in ("Spot", "brown", "hair"))
8

You can also write it as a map

>>> sum(map(s.count, ("Spot", "brown", "hair")))
8

A more robust solution might use the nltk package

>>> import nltk  # Natural Language Toolkit
>>> sum(x in {"Spot", "brown", "hair"} for x in nltk.wordpunct_tokenize(s))
8
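If nltk isn't available, a rough standard-library stand-in is to tokenize with a regex. This is a sketch of a swapped-in technique, not nltk itself: the tokenize function below is hypothetical, and its pattern is assumed to approximate what wordpunct_tokenize does (runs of word characters and runs of punctuation as separate tokens).

```python
import re

def tokenize(text):
    # Assumption: word-character runs or punctuation runs, each as its own token
    return re.findall(r"\w+|[^\w\s]+", text)

s = "Spot is a brown dog. Spot has brown hair. The hair of Spot is brown."
wanted = {"Spot", "brown", "hair"}

tokens = tokenize(s)
print(tokens[:5])                            # ['Spot', 'is', 'a', 'brown', 'dog']
print(sum(tok in wanted for tok in tokens))  # 8
```

Because whole tokens are compared, "hairy" or "browner" would not be counted here.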
John La Rooy answered Oct 31 '25 18:10


I might use a Counter:

from collections import Counter

s = 'Spot is a brown dog. Spot has brown hair. The hair of Spot is brown.'
words_we_want = ("Spot", "brown", "hair")
data = Counter(s.split())
print(sum(data[word] for word in words_we_want))

Note that this will under-count by 2 (it prints 6 rather than 8), since 'brown.' and 'hair.' are separate Counter entries from 'brown' and 'hair'.
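One way to keep the Counter approach while coping with trailing punctuation is to strip it from each token before counting. A minimal sketch, assuming punctuation only matters at the ends of words:

```python
import string
from collections import Counter

s = 'Spot is a brown dog. Spot has brown hair. The hair of Spot is brown.'
words_we_want = ("Spot", "brown", "hair")

# Strip leading/trailing punctuation so 'brown.' and 'brown' share one entry
data = Counter(word.strip(string.punctuation) for word in s.split())
print(sum(data[word] for word in words_we_want))  # 8
```

str.strip only trims the ends of each token, so something like "dog's" would stay as it is.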

A slightly less elegant solution that doesn't trip up on punctuation uses a regex:

>>> import re
>>> len(re.findall('Spot|brown|hair','Spot is a brown dog. Spot has brown hair. The hair of Spot is brown.'))
8

You can create the regex from a tuple simply by

'|'.join(re.escape(x) for x in words_we_want)
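Put together, that could look like the sketch below. The build_pattern function and its whole_words flag are hypothetical names added for illustration. The optional \b anchors assume the terms begin and end with word characters.

```python
import re

def build_pattern(terms, whole_words=False):
    """Compile an alternation of escaped terms (hypothetical helper)."""
    # Longest first, so a longer term is tried before a shorter one it starts with
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    if whole_words:
        # \b only behaves as expected when terms start and end with word characters
        alternation = r"\b(?:" + alternation + r")\b"
    return re.compile(alternation)

s = 'Spot is a brown dog. Spot has brown hair. The hair of Spot is brown.'
words_we_want = ("Spot", "brown", "hair")

print(len(build_pattern(words_we_want).findall(s)))                    # 8
print(len(build_pattern(words_we_want).findall("hairy browner Spot")))  # 3
print(len(build_pattern(words_we_want, whole_words=True)
          .findall("hairy browner Spot")))                             # 1
```

Without the word-boundary option the regex matches substrings, just as str.count does.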

The nice thing about these solutions is that they go over the string once, whereas the solution by gnibbler scans it once per search term, so in principle they scale better as the number of terms grows. Of course, which one actually performs better on real-world data still needs to be measured by the OP, since only the OP has that data.
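A measurement along those lines could be sketched with timeit. The repeated sample sentence below is only a placeholder for real data, and the function names are hypothetical. No timings are shown because they depend on the machine and the input. With only a handful of terms, the repeated str.count calls may well come out ahead.

```python
import re
import timeit

# Placeholder input: the sample sentence repeated; substitute real data here
s = "Spot is a brown dog. Spot has brown hair. The hair of Spot is brown." * 1000
terms = ("Spot", "brown", "hair")
pattern = re.compile("|".join(re.escape(t) for t in terms))

def count_repeated():
    # One full scan of the string per search term
    return sum(s.count(t) for t in terms)

def count_regex():
    # One scan of the string, trying each alternative at every position
    return len(pattern.findall(s))

# Both approaches should agree before their speed is compared
assert count_repeated() == count_regex()

for fn in (count_repeated, count_regex):
    seconds = timeit.timeit(fn, number=100)
    print("%s: %.4f s for 100 runs" % (fn.__name__, seconds))
```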

mgilson answered Oct 31 '25 19:10


