I have to count word frequencies in a text using Python. I thought of keeping the words in a dictionary, with a count for each word.
Now I need to sort the words by number of occurrences. Can I do that with the same dictionary, instead of building a new dictionary that has the count as the key and a list of words as the value?
Use set() to remove duplicates and get the unique words. Then iterate over the set and call count() on the list of words for each one to find its frequency. Calling count() on the raw string instead would also match substrings, so "the" would be counted inside "then".
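The set-based approach can be sketched roughly as follows. This is a minimal sketch that assumes splitting on whitespace is good enough; the helper name count_with_set and the sample sentence are made up for illustration.

```python
def count_with_set(text):
    """Count word frequencies via set() and list.count() (hypothetical helper)."""
    words = text.split()        # split on whitespace
    unique_words = set(words)   # drop duplicates
    # list.count matches whole items, unlike str.count, which matches substrings
    return {word: words.count(word) for word in unique_words}


freq = count_with_set("the quick brown fox jumps over the lazy dog")
print(freq["the"])  # 2
```

Note that this rescans the word list once per unique word, so a single-pass dictionary loop will usually be faster on long texts.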
Python code:

def word_count(text):
    """Return a dict mapping each word in text to its number of occurrences."""
    counts = dict()
    words = text.split()  # split on whitespace
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

print(word_count('the quick brown fox jumps over the lazy dog.'))
As of Python 3.7, dictionaries also preserve insertion order. A dictionary's items can therefore be sorted with Python's built-in sorted() function and, if needed, rebuilt into a dictionary that keeps the sorted order. As with other iterables, the sort criterion is set by the key argument of sorted().
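As a rough illustration of that idea, the sketch below sorts a dictionary's items by count and rebuilds a dictionary from the result. The sample counts and the name by_count are assumptions chosen for the example, and the ordering guarantee relies on Python 3.7 or later.

```python
counts = {"foo": 4, "bar": 2, "quux": 3}  # illustrative sample data

# sorted() returns a list of (word, count) pairs, most frequent first;
# dict() keeps that order on Python 3.7+
by_count = dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))
print(by_count)  # {'foo': 4, 'quux': 3, 'bar': 2}
```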
WARNING: This example requires Python 2.7 or higher.
Python's Counter class (in the standard library's collections module) is exactly what you're looking for. Counting words is even the first example in the documentation:
>>> # Tally occurrences of words in a list
>>> from collections import Counter
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
As specified in the comments, Counter takes an iterable, so the above example is merely for illustration and is equivalent to:
>>> mywords = ['red', 'blue', 'red', 'green', 'blue', 'blue']
>>> cnt = Counter(mywords)
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
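Since the question is about sorting by number of occurrences, it may help to know that Counter also has a most_common() method that returns items ordered by count. A minimal sketch, assuming whitespace splitting and a made-up sample sentence:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog"
cnt = Counter(text.split())

# most_common(n) returns the n most frequent items as (item, count) pairs
print(cnt.most_common(1))  # [('the', 2)]
```

Calling most_common() with no argument returns every item, ordered from most to least frequent.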
You can use the same dictionary:
>>> d = { "foo": 4, "bar": 2, "quux": 3 }
>>> sorted(d.items(), key=lambda item: item[1])
The second line prints:
[('bar', 2), ('quux', 3), ('foo', 4)]
If you only want a sorted word list, do:
>>> [pair[0] for pair in sorted(d.items(), key=lambda item: item[1])]
That line prints:
['bar', 'quux', 'foo']
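For a frequency table you will often want the most frequent words first, with a predictable order among words that share a count. One possible way to do that is a tuple sort key, sketched below; the extra "baz" entry and the name ranked are added here only to demonstrate tie-breaking.

```python
d = {"foo": 4, "bar": 2, "quux": 3, "baz": 2}  # "baz" added to show a tie

# Negate the count for descending order; the word itself breaks ties alphabetically
ranked = sorted(d.items(), key=lambda item: (-item[1], item[0]))
print(ranked)  # [('foo', 4), ('quux', 3), ('bar', 2), ('baz', 2)]
```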