I have a list of tokenized sentences and I want to count the combined occurrences of several words in each sentence, e.g.:
example_list = (['hey', 'there', 'you', 'how', 'are', 'you'],
                ['i', 'am', 'fine', 'how', 'about', 'you'],
                ['i', 'am', 'good'])
Now I want to count how many times the following words occur in each list and append the score to a list:
score = []
test = ['hey', 'you']
I try the following code:
for i in range(len(test)):
    for j in range(len(example_list)):
        score.append(example_list[j].count(test[i]))
and get the output of:
[1, 0, 0, 2, 1, 0]
whereas I want an output of:
[3, 1, 0]
Any ideas?
You could use sum inside a list comprehension:
example_list = (['hey', 'there', 'you', 'how', 'are', 'you'],
['i', 'am', 'fine', 'how', 'about', 'you'],
['i', 'am', 'good'])
test = ['hey', 'you']
score = [sum(s in test for s in lst) for lst in example_list]
print(score)
Output
[3, 1, 0]
Consider converting test to a set if it is large, since membership checks on a set take constant time on average.
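A minimal sketch of the set variant, using the same data as above. The name test_set is only illustrative and is not part of the original answer:

```python
example_list = (['hey', 'there', 'you', 'how', 'are', 'you'],
                ['i', 'am', 'fine', 'how', 'about', 'you'],
                ['i', 'am', 'good'])
test = ['hey', 'you']

# Build the set once, outside the comprehension, so it is not rebuilt per token.
test_set = set(test)

# True counts as 1 and False as 0, so sum() yields the number of matching tokens.
score = [sum(token in test_set for token in sentence) for sentence in example_list]
print(score)  # [3, 1, 0]
```

For a two-word test list the difference is negligible; the set should only start to matter when test holds many words.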
You can use Counter for this task:
from collections import Counter
counters = [Counter(sentence) for sentence in example_list]
# A Counter returns 0 for words it has not seen, so no membership check is needed.
occurrences = [sum(c[word] for word in test) for c in counters]
print(occurrences) # [3, 1, 0]
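The Counter approach is probably most useful when the same sentences are queried with more than one word list, because the counters are built only once. A rough sketch under that assumption; the helper name count_words is hypothetical:

```python
from collections import Counter

example_list = (['hey', 'there', 'you', 'how', 'are', 'you'],
                ['i', 'am', 'fine', 'how', 'about', 'you'],
                ['i', 'am', 'good'])

# Count every token once per sentence; later queries reuse these counters.
counters = [Counter(sentence) for sentence in example_list]

def count_words(words):
    """Return, per sentence, the total occurrences of the given words."""
    return [sum(c[w] for w in words) for c in counters]

print(count_words(['hey', 'you']))  # [3, 1, 0]
print(count_words(['i', 'am']))     # [0, 2, 2]
```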