Consider the following input:
doc = ["i am a fellow student", "we both are the good student", "a student works hard"]
I want to print the number of times each word occurs in the whole list.
For example, "student" occurs 3 times, so the expected output is student=3, a=2, etc.
I was able to print the unique words in doc, but not their number of occurrences. Here is the function I used:
def fit(dataset):
    unique_words = set()
    if isinstance(dataset, list):
        for row in dataset:
            for word in row.split(" "):
                # skip one-character words such as "i" and "a"
                if len(word) < 2:
                    continue
                unique_words.add(word)
    unique_words = sorted(unique_words)
    return unique_words

unique_words = fit(doc)
print(unique_words)
['am', 'are', 'both', 'fellow', 'good', 'hard', 'student', 'the', 'we', 'works']
I got this as output, but what I want is the number of occurrences of each of these unique words. How do I do this?
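For comparison, the question's own fit() approach can be adapted to count words instead of just collecting them: replace the set with a dict and increment an entry for each word. This is a minimal sketch, not the asker's code; the function name word_counts is hypothetical, it keeps fit()'s rule of skipping words shorter than two characters (so "i" and "a" are left out), and it uses split() rather than split(" ").

```python
def word_counts(dataset):
    """Count how often each word (length >= 2) occurs across all sentences.

    Hypothetical variant of the question's fit(): a dict replaces the set.
    """
    counts = {}
    if isinstance(dataset, list):
        for row in dataset:
            for word in row.split():
                # Same filter as fit(): skip one-character words such as "i" and "a".
                if len(word) < 2:
                    continue
                # dict.get returns 0 the first time a word is seen.
                counts[word] = counts.get(word, 0) + 1
    # Sort by word to mirror the sorted list that fit() returns.
    return dict(sorted(counts.items()))


doc = ["i am a fellow student", "we both are the good student", "a student works hard"]
print(word_counts(doc))
# {'am': 1, 'are': 1, 'both': 1, 'fellow': 1, 'good': 1, 'hard': 1, 'student': 3, 'the': 1, 'we': 1, 'works': 1}
```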
You just need to use Counter, and you can solve the problem with a single line of code:
from collections import Counter
doc = ["i am a fellow student",
"we both are the good student",
"a student works hard"]
count = dict(Counter(word for sentence in doc for word in sentence.split()))
Now count is the dictionary you want:
{
'i': 1,
'am': 1,
'a': 2,
'fellow': 1,
'student': 3,
'we': 1,
'both': 1,
'are': 1,
'the': 1,
'good': 1,
'works': 1,
'hard': 1
}
So, for example, count['student'] == 3, count['a'] == 2, and so on.
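If you keep the Counter object itself instead of converting it with dict(), you get a couple of extras: looking up a word that never appears returns 0 instead of raising KeyError, and most_common() lists words by frequency. A small sketch that also prints the counts in the student=3 format from the question:

```python
from collections import Counter

doc = ["i am a fellow student",
       "we both are the good student",
       "a student works hard"]

# Keep the Counter object rather than converting it with dict().
count = Counter(word for sentence in doc for word in sentence.split())

print(count["student"])      # 3
print(count["missing"])      # 0: absent keys return 0 instead of raising KeyError
print(count.most_common(2))  # [('student', 3), ('a', 2)]

# Print every word in the "word=count" format, most frequent first.
print(", ".join(f"{word}={n}" for word, n in count.most_common()))
```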
It is important to use split() here instead of split(' '): split() with no argument treats any run of whitespace as a single separator, so you will not end up with an "empty" word in count. Example:
>>> sentence = "Hello     world"
>>> dict(Counter(sentence.split(' ')))
{'Hello': 1, '': 4, 'world': 1}
>>> dict(Counter(sentence.split()))
{'Hello': 1, 'world': 1}
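The same difference shows up with other whitespace: split() with no argument splits on any run of spaces, tabs or newlines and drops leading and trailing whitespace, whereas split(' ') splits on each single space only. A quick check with a made-up string:

```python
from collections import Counter

text = "  student\tworks hard\n"

# split(' '): every single space is a separator, so the two leading spaces
# produce empty strings, and the tab and newline stay attached to the words.
print(text.split(' '))  # ['', '', 'student\tworks', 'hard\n']

# split(): any run of whitespace is one separator, and the ends are stripped.
print(text.split())     # ['student', 'works', 'hard']

print(dict(Counter(text.split())))  # {'student': 1, 'works': 1, 'hard': 1}
```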