
Python: count the number of occurrences of each word in a list of strings [duplicate]

Consider:

doc = ["i am a fellow student", "we both are the good student", "a student works hard"]

I have this as input and want to print the number of times each word occurs across the whole list.

For example, student occurs 3 times, so the expected output is student=3, a=2, etc.

I was able to print the unique words in doc, but not their occurrence counts. Here is the function I used:

def fit(dataset):
    # Collect each distinct word of length >= 2 from a list of strings.
    unique_words = set()
    if isinstance(dataset, list):
        for row in dataset:
            for word in row.split(" "):
                if len(word) < 2:
                    continue
                unique_words.add(word)
        unique_words = sorted(unique_words)
        return unique_words

doc = fit(doc)

print(doc)

['am', 'are', 'both', 'fellow', 'good', 'hard', 'student', 'the', 'we', 'works']

This is the output I got. I just want the number of occurrences of each of the unique_words. How can I do this?

tachyon asked Dec 03 '22 17:12


1 Answer

You can use collections.Counter, which solves the problem in a single line of code:

from collections import Counter

doc = ["i am a fellow student",
       "we both are the good student",
       "a student works hard"]

count = dict(Counter(word for sentence in doc for word in sentence.split()))

count is your desired dictionary:

{
    'i': 1,
    'am': 1,
    'a': 2,
    'fellow': 1,
    'student': 3,
    'we': 1,
    'both': 1,
    'are': 1,
    'the': 1,
    'good': 1,
    'works': 1,
    'hard': 1
}

So, for example, count['student'] == 3, count['a'] == 2, etc.
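
To print the counts in the word=count form the question asks for, one option is to keep the Counter object instead of converting it to a dict and use its most_common() method. This is only a minimal sketch; the names counts and formatted are illustrative and not part of the original answer.

```python
from collections import Counter

doc = ["i am a fellow student",
       "we both are the good student",
       "a student works hard"]

# Count every whitespace-separated word across all sentences.
counts = Counter(word for sentence in doc for word in sentence.split())

# most_common() yields (word, count) pairs, highest count first.
formatted = ", ".join(f"{word}={n}" for word, n in counts.most_common())
print(formatted)  # starts with: student=3, a=2, ...
```

Sorting by count is a presentation choice; iterating over sorted(counts.items()) would give alphabetical order instead.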

It is important to use split() rather than split(' ') here: with no argument, split() treats any run of whitespace as a single separator, so no empty-string "word" ends up in count. Example:

>>> sentence = "Hello     world"
>>> dict(Counter(sentence.split(' ')))
{'Hello': 1, '': 4, 'world': 1}
>>> dict(Counter(sentence.split()))
{'Hello': 1, 'world': 1}
Riccardo Bucco answered Dec 31 '22 03:12