An interview question:
Find the most frequently used word in a book.
My idea:
Use a hash table: traverse the words and increment a count in the table for each word.
If the book's total word count is known and some word's count ever exceeds 50% of it, then skip any new words for the rest of the traversal and only count words already in the table. What if the book's size is unknown?
This is O(n) time and O(n) space.
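For clarity, here is a rough sketch of that idea in Python. It assumes the book is already tokenized into an iterable of words, and the optional `total` parameter is the known book size used for the >50% shortcut (the names are just for this sketch):

    from collections import Counter

    def most_frequent_word(words, total=None):
        """Hash-table word count with the optional >50% early exit."""
        counts = Counter()
        majority_found = False
        for word in words:
            if majority_found and word not in counts:
                continue  # a word already holds a majority; new words can't overtake it
            counts[word] += 1
            # If the book size is known and this word exceeds 50% of all words,
            # no unseen word can ever beat it, so stop adding new entries.
            if total is not None and counts[word] * 2 > total:
                majority_found = True
        return max(counts, key=counts.get) if counts else None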
Any better ideas?
Thanks
To determine the complexity I think you need to consider two variables: n = total number of words and m = number of unique words. The best-case complexity will come out close to O(n log(m)) for speed and O(m) for storage, assuming that for each of the n words you look up and update an entry in a tree or similar ordered structure that eventually contains m elements; with a hash table the expected cost per lookup is constant, bringing the time down to O(n).
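To make the O(m) storage concrete, here is a minimal single-pass sketch (names are illustrative) that keeps the running leader while counting, so no second scan of the table is needed:

    from collections import defaultdict

    def running_leader(words):
        """Count words in one pass and track the current most frequent one.
        Storage is O(m) (one entry per unique word); each of the n words
        costs one hash lookup plus a constant-time comparison."""
        counts = defaultdict(int)
        best_word, best_count = None, 0
        for word in words:
            counts[word] += 1
            if counts[word] > best_count:
                best_word, best_count = word, counts[word]
        return best_word, best_count

For example, `running_leader("the quick brown fox the".split())` returns `("the", 2)`.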
This is actually a classic example of MapReduce.
The word-count example on the Wikipedia page will give you the count of each unique word, but you can easily add a step in the reduce phase that keeps track of the current most common word (with some kind of mutex to deal with concurrency issues).
If you have a distributed cluster of machines or a highly parallelized computer, this will run much faster than using a single hash table.
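As a rough single-machine illustration of that map/reduce structure (the chunk size, worker count, and function names here are just one way to set it up): instead of a shared most-common-word variable behind a mutex, this sketch merges per-chunk counts in the reduce step and takes the overall winner at the end.

    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_chunk(chunk):
        """Map step: count the words in one chunk of the book."""
        return Counter(chunk)

    def merge_counts(a, b):
        """Reduce step: merge two partial word counts."""
        a.update(b)
        return a

    def most_common_word(words, workers=4, chunk_size=10_000):
        # Split the word list into chunks, count each chunk in parallel,
        # then merge the partial counts and pick the overall winner.
        # (Under the spawn start method, call this from an
        # `if __name__ == "__main__":` block.)
        chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
        with Pool(workers) as pool:
            partials = pool.map(map_chunk, chunks)
        total = reduce(merge_counts, partials, Counter())
        return total.most_common(1)[0] if total else None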