
Finding the most popular words in a list

I have a list of words:

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

And I want to get a list of tuples:

[(3, 'all'), (2, 'yeah'), (1, 'bye'), (1, 'awesome')]

where each tuple is...

(number_of_occurrences, word)

The list should be sorted by the number of occurrences.

What I've done so far:

def popularWords(words):
    dic = {}
    for word in words:
        dic.setdefault(word, 0)
        dic[word] += 1
    wordsList = [(dic.get(w), w) for w in dic]
    wordsList.sort(reverse = True)
    return wordsList

The question is...

Is it Pythonic, elegant and efficient? Are you able to do it better? Thanks in advance.

Maciej Ziarko asked Mar 08 '11 23:03



2 Answers

You can use collections.Counter for this.

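import collections

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']
counter = collections.Counter(words)
# most_common() returns (word, count) pairs, highest count first
print(counter.most_common())
# [('all', 3), ('yeah', 2), ('awesome', 1), ('bye', 1)]
# (Python 3.7+; the order of equal counts can differ on older versions)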

Note that each tuple is (word, count), the reverse of the (count, word) order you asked for.
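If the (count, word) layout matters, the pairs can be swapped in a list comprehension. This is only a sketch; the name word_counts is made up for illustration and is not part of the answer above.

```python
from collections import Counter

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

# most_common() yields (word, count) pairs sorted by count, descending.
# Swap each pair to get the (count, word) layout from the question.
# word_counts is a hypothetical name chosen for this example.
word_counts = [(count, word) for word, count in Counter(words).most_common()]
print(word_counts)
```

The counts come out in descending order; how words with equal counts are ordered relative to each other depends on the Python version.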

From the comments: collections.Counter requires Python 2.7+ or 3.1+. For earlier versions, you can use the Counter recipe.

SiggyF answered Oct 20 '22 19:10


The defaultdict collection is what you are looking for:

from collections import defaultdict

D = defaultdict(int)
for word in words:
    D[word] += 1

That gives you a dict where keys are words and values are frequencies. To get to your (frequency, word) tuples:

# Python 3: use items(); on Python 2, iteritems() does the same without building a list
tuples = [(freq, word) for word, freq in D.items()]
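The list built this way is not yet sorted, which the question asks for. A minimal end-to-end sketch, assuming Python 3 and using the illustrative names freq_by_word and pairs (neither appears in the original answer), could look like this:

```python
from collections import defaultdict

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

# Count occurrences; missing keys start at int() == 0.
freq_by_word = defaultdict(int)
for word in words:
    freq_by_word[word] += 1

# Build (frequency, word) tuples and sort them in descending order.
# Tuples compare element by element, so ties on frequency are broken
# by the word itself, in reverse alphabetical order.
pairs = sorted(((freq, word) for word, freq in freq_by_word.items()),
               reverse=True)
print(pairs)
# [(3, 'all'), (2, 'yeah'), (1, 'bye'), (1, 'awesome')]
```

Sorting the tuples directly is the same approach the question's own code takes; only the counting step changes.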

If you are using Python 2.7+ or 3.1+, you can do the first step with the standard library's Counter class:

from collections import Counter
D = Counter(words)

Kenan Banks answered Oct 20 '22 18:10