 

Given a file, find the ten most frequently occurring words as efficiently as possible


This is apparently an interview question (found it in a collection of interview questions), but even if it's not it's pretty cool.

We are told to do this efficiently on all complexity measures. I thought of creating a HashMap that maps each word to its frequency. That would be O(n) in both time and space, but since there may be a great many distinct words, we cannot assume that everything fits in memory.
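For concreteness, a minimal sketch of that in-memory approach (a HashMap of counts plus a size-10 min-heap to keep the top ten) might look like the following; the file name words.txt and whitespace tokenization are assumptions of mine:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Stream;

public class TopTenWords {
    public static void main(String[] args) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        // Count every whitespace-delimited, lower-cased word: O(n) time, O(distinct words) space.
        try (Stream<String> lines = Files.lines(Paths.get("words.txt"))) {
            lines.flatMap(line -> Stream.of(line.toLowerCase().split("\\s+")))
                 .filter(w -> !w.isEmpty())
                 .forEach(w -> counts.merge(w, 1L, Long::sum));
        }
        // A min-heap capped at 10 entries keeps only the most frequent words seen so far.
        Comparator<Map.Entry<String, Long>> byCount = Comparator.comparingLong(Map.Entry::getValue);
        PriorityQueue<Map.Entry<String, Long>> top = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            top.offer(entry);
            if (top.size() > 10) {
                top.poll(); // evict the least frequent candidate
            }
        }
        // Prints the top ten from least to most frequent.
        while (!top.isEmpty()) {
            Map.Entry<String, Long> entry = top.poll();
            System.out.println(entry.getValue() + " " + entry.getKey());
        }
    }
}
```

The heap bounds the selection step, but the HashMap still has to hold every distinct word, which is exactly the memory concern above.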

I should add that nothing in the question says the words cannot all be stored in memory, but what if that were the case? If everything does fit, the question does not seem as challenging.

asked Dec 21 '10 by efficiencyIsBliss

People also ask

How do I find the most frequent words in a text file?

This can be done by opening the file in read mode with a file pointer and reading it line by line. Split each line into words and store them in an array, then iterate through the array, find the frequency of each word, and compare that frequency with the running maximum (maxcount).
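As a rough illustration of that description (the file name input.txt and whitespace splitting are assumptions of mine), one way to track the most frequent word while reading line by line is:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MostFrequentWord {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> frequency = new HashMap<>();
        String mostFrequent = null;
        int maxCount = 0;
        // Open the file in read mode and process it line by line.
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split the line into words and update each word's frequency.
                for (String word : line.toLowerCase().split("\\s+")) {
                    if (word.isEmpty()) continue;
                    int count = frequency.merge(word, 1, Integer::sum);
                    // Compare against the running maximum (maxCount).
                    if (count > maxCount) {
                        maxCount = count;
                        mostFrequent = word;
                    }
                }
            }
        }
        System.out.println(mostFrequent + " : " + maxCount);
    }
}
```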

How do you find the maximum occurring word in a string in Java?

A simple solution is to run two loops and count the occurrences of every word. The time complexity of this solution is O(n * n * MAX_WORD_LEN).
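A brief sketch of that brute-force idea, using a made-up sample string:

```java
public class MaxOccurringWord {
    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog the fox";
        String[] words = text.split("\\s+");
        String best = null;
        int bestCount = 0;
        // Outer loop picks a candidate word, inner loop counts its occurrences:
        // O(n * n) comparisons, each up to MAX_WORD_LEN characters long.
        for (String candidate : words) {
            int count = 0;
            for (String other : words) {
                if (candidate.equals(other)) {
                    count++;
                }
            }
            if (count > bestCount) {
                bestCount = count;
                best = candidate;
            }
        }
        System.out.println(best + " occurs " + bestCount + " times");
    }
}
```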


1 Answer

Optimizing for my own time:

sort file | uniq -c | sort -nr | head -10 

Possibly followed by awk '{print $2}' to eliminate the counts. Here sort groups identical words together, uniq -c prefixes each distinct word with its count, sort -nr orders the result by count in descending order, and head -10 keeps the top ten; because sort can spill to temporary files, the pipeline also copes with files that do not fit in memory.

answered Jan 04 '23 by Ben Jackson