 

Given a file, find the ten most frequently occurring words as efficiently as possible


This is apparently an interview question (found it in a collection of interview questions), but even if it's not it's pretty cool.

We are told to do this efficiently on all complexity measures. I thought of creating a HashMap that maps each word to its frequency. That would be O(n) in both time and space, but since there may be a great many distinct words, we cannot assume that everything fits in memory.
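For concreteness, a minimal sketch of that in-memory approach (a HashMap of counts plus a size-10 min-heap to keep the top ten) might look like the following; the file name words.txt and whitespace tokenization are assumptions of mine:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Stream;

public class TopTenWords {
    public static void main(String[] args) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        // Count every whitespace-delimited, lower-cased word: O(n) time, O(distinct words) space.
        try (Stream<String> lines = Files.lines(Paths.get("words.txt"))) {
            lines.flatMap(line -> Stream.of(line.toLowerCase().split("\\s+")))
                 .filter(w -> !w.isEmpty())
                 .forEach(w -> counts.merge(w, 1L, Long::sum));
        }
        // A min-heap capped at 10 entries keeps only the most frequent words seen so far.
        Comparator<Map.Entry<String, Long>> byCount = Comparator.comparingLong(Map.Entry::getValue);
        PriorityQueue<Map.Entry<String, Long>> top = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            top.offer(entry);
            if (top.size() > 10) {
                top.poll(); // evict the least frequent candidate
            }
        }
        // Prints the top ten from least to most frequent.
        while (!top.isEmpty()) {
            Map.Entry<String, Long> entry = top.poll();
            System.out.println(entry.getValue() + " " + entry.getKey());
        }
    }
}
```

The heap bounds the selection step, but the HashMap still has to hold every distinct word, which is exactly the memory concern above.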

I should add that nothing in the question says the words cannot all be stored in memory, but what if that were the case? If everything does fit, the question does not seem as challenging.

asked Dec 21 '10 by efficiencyIsBliss

People also ask

How do I find the most frequent words in a text file?

This can be done by opening the file in read mode with a file pointer and reading it line by line. Split each line into words and store them in an array, then iterate through the array, find the frequency of each word, and compare that frequency with the running maximum (maxcount).
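As a rough illustration of that description (the file name input.txt and whitespace splitting are assumptions of mine), one way to track the most frequent word while reading line by line is:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MostFrequentWord {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> frequency = new HashMap<>();
        String mostFrequent = null;
        int maxCount = 0;
        // Open the file in read mode and process it line by line.
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split the line into words and update each word's frequency.
                for (String word : line.toLowerCase().split("\\s+")) {
                    if (word.isEmpty()) continue;
                    int count = frequency.merge(word, 1, Integer::sum);
                    // Compare against the running maximum (maxCount).
                    if (count > maxCount) {
                        maxCount = count;
                        mostFrequent = word;
                    }
                }
            }
        }
        System.out.println(mostFrequent + " : " + maxCount);
    }
}
```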

How do you find the maximum occurring word in a string in Java?

A simple solution is to run two loops and count the occurrences of every word. The time complexity of this solution is O(n * n * MAX_WORD_LEN).
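A brief sketch of that brute-force idea, using a made-up sample string:

```java
public class MaxOccurringWord {
    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog the fox";
        String[] words = text.split("\\s+");
        String best = null;
        int bestCount = 0;
        // Outer loop picks a candidate word, inner loop counts its occurrences:
        // O(n * n) comparisons, each up to MAX_WORD_LEN characters long.
        for (String candidate : words) {
            int count = 0;
            for (String other : words) {
                if (candidate.equals(other)) {
                    count++;
                }
            }
            if (count > bestCount) {
                bestCount = count;
                best = candidate;
            }
        }
        System.out.println(best + " occurs " + bestCount + " times");
    }
}
```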


1 Answer

Optimizing for my own time:

sort file | uniq -c | sort -nr | head -10 

Possibly followed by awk '{print $2}' to eliminate the counts. Here sort groups identical words together, uniq -c prefixes each distinct word with its count, sort -nr orders the result by count in descending order, and head -10 keeps the top ten; because sort can spill to temporary files, the pipeline also copes with files that do not fit in memory.

answered Jan 04 '23 by Ben Jackson