
Sorted Word frequency count using python

I have to count the word frequency in a text using Python. I thought of keeping the words in a dictionary, with a count for each word.

Now I need to sort the words by number of occurrences. Can I do it with the same dictionary, instead of building a new dictionary that has the count as the key and an array of words as the value?

AlgoMan asked Nov 03 '10 14:11


People also ask

How do you count the frequency of a word in Python?

Use set() to remove duplicates and get the unique words. Then iterate over the set and call count() on the list of words (for example, words.count(word)) to find the frequency of each word. Note that calling count() on the raw string counts substring matches, so 'the' would also match inside 'then'.

How do you count occurrences of each word in a string in Python?

Python code:

def word_count(text):
    # Map each whitespace-separated word to its number of occurrences.
    counts = dict()
    words = text.split()
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

print(word_count('the quick brown fox jumps over the lazy dog. '))

Does sorted work on dictionary Python?

As of Python 3.7, dictionaries preserve insertion order. The built-in sorted() function does not sort a dictionary in place; it returns a new list. Sort d.items() with a key argument to choose the criterion, and pass the result to dict() if you need a dictionary in that order.
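
A minimal sketch of that idea, assuming Python 3.7 or later. The sample counts are made up for illustration. sorted() produces a list of (word, count) pairs, and dict() then keeps them in that order:

```python
# Hypothetical word counts, for illustration only.
counts = {"foo": 4, "bar": 2, "quux": 3}

# sorted() returns a list of (word, count) pairs, not a dict.
pairs = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# On Python 3.7+, dict() keeps the pairs in the order given.
by_count = dict(pairs)

print(pairs)
print(list(by_count))
```

On older Python versions the list of pairs is still sorted, but a plain dict built from it does not guarantee that order.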


2 Answers

WARNING: This example requires Python 2.7 or higher.

The Counter class in Python's standard collections module is exactly what you're looking for. Counting words is even the first example in the documentation:

>>> # Tally occurrences of words in a list
>>> from collections import Counter
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

As specified in the comments, Counter takes an iterable, so the above example is merely for illustration and is equivalent to:

>>> mywords = ['red', 'blue', 'red', 'green', 'blue', 'blue']
>>> cnt = Counter(mywords)
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
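
For the sorting part of the question, Counter also has a most_common() method. This is a short sketch reusing the same word list. The variable names ranked and top_two are only illustrative:

```python
from collections import Counter

mywords = ['red', 'blue', 'red', 'green', 'blue', 'blue']
cnt = Counter(mywords)

# most_common() returns (word, count) pairs, highest count first.
ranked = cnt.most_common()

# Passing n limits the result to the n most frequent words.
top_two = cnt.most_common(2)

print(ranked)
print(top_two)
```

This avoids building a second dictionary keyed by count.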
jathanism answered Sep 26 '22 13:09


You can use the same dictionary:

>>> d = { "foo": 4, "bar": 2, "quux": 3 }
>>> sorted(d.items(), key=lambda item: item[1])

The second line prints:

[('bar', 2), ('quux', 3), ('foo', 4)] 

If you only want a sorted word list, do:

>>> [pair[0] for pair in sorted(d.items(), key=lambda item: item[1])] 

That line prints:

['bar', 'quux', 'foo'] 
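
Putting the pieces together, here is a rough end-to-end sketch of counting and sorting with one dictionary. The function name word_frequencies and the sample sentence are hypothetical, and it assumes that lowercasing and splitting on whitespace is good enough tokenization (punctuation is not stripped):

```python
def word_frequencies(text):
    """Return (word, count) pairs, most frequent first."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    # Negating the count sorts in descending order;
    # words with equal counts fall back to alphabetical order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "the quick brown fox jumps over the lazy dog the end"
result = word_frequencies(sample)
for word, count in result:
    print(word, count)
```

The dictionary itself stays unsorted; only the list returned by sorted() carries the ordering.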
Frédéric Hamidi answered Sep 25 '22 13:09