 

Design an algorithm to find the most frequently used word in a book

An interview question:

Find the most frequently used word in a book.

My idea:

Traverse the book word by word, using a hash table to count the occurrences of each word.
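
A minimal sketch of this idea in Python, assuming the book has already been tokenized into a list of words (the case-folding is an illustrative assumption about how words are normalized):

    from collections import Counter

    def most_frequent_word(words):
        """One pass over the book, counting each word in a hash table."""
        counts = Counter()
        for word in words:
            counts[word.lower()] += 1      # normalize case (assumption)
        # Final scan over the table is O(m), m = number of unique words.
        return max(counts, key=counts.get)

    # Example: most_frequent_word(["the", "cat", "sat", "on", "the", "mat"]) -> "the"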

If the book's size is known and some word is found to account for more than 50% of the words, no other word can overtake it, so the rest of the traversal can skip new words and only count ones already in the table. But what if the book's size is unknown?

The hash-table approach is O(n) time and O(n) space.
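
For the unknown-size question above, if we only care about the special situation where one word makes up more than half of the book, the Boyer-Moore majority vote algorithm (a swapped-in technique, not named in the question) finds the only possible such word in a single streaming pass with O(1) extra space; a second pass is still needed to verify the candidate really is a majority:

    def majority_candidate(words):
        """Boyer-Moore majority vote: streaming, O(1) extra space.

        Returns the only word that *could* occur in more than half of
        the stream; verify its count with a second pass.
        """
        candidate, count = None, 0
        for word in words:
            if count == 0:
                candidate, count = word, 1
            elif word == candidate:
                count += 1
            else:
                count -= 1
        return candidate

If the verification pass fails, no word exceeds 50% and the general problem still requires the full count.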

Any better ideas?

Thanks

asked Jan 06 '12 by user1002288

2 Answers

To determine complexity I think you need to consider two variables: n = total number of words and m = number of unique words. You iterate over each of the n words and, for each one, search and possibly insert into a structure that eventually contains m elements. With an ordered structure such as a balanced tree that comes out to roughly O(n log(m)) time; with a hash table it is closer to O(n) expected time. Either way, storage is O(m).
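
To make the two variables concrete, here is a sketch using the third-party sortedcontainers package (an assumption; it is not in the standard library), whose SortedDict gives roughly O(log m) lookups and insertions, so the whole count is O(n log m) time with O(m) storage:

    from sortedcontainers import SortedDict   # pip install sortedcontainers

    def most_frequent_word(words):
        counts = SortedDict()                  # eventually holds m entries
        for word in words:                     # n iterations
            counts[word] = counts.get(word, 0) + 1   # O(log m) each
        return max(counts, key=counts.get)     # final O(m) scan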

answered by SmacL


This is actually a classic example of MapReduce.

The example on the Wikipedia page gives you the count of each unique word, but you can easily add a step in the reduce phase that keeps track of the current most common word (with some kind of mutex to deal with concurrency issues).

If you have a distributed cluster of machines or a highly parallelized computer, this will run much faster than using a single hash table.
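
A single-machine sketch of the same map/reduce shape using Python's multiprocessing module (the pre-chunked input is an assumption; a real MapReduce framework would shard the book for you):

    from collections import Counter
    from multiprocessing import Pool

    def count_chunk(words):
        """Map step: count one chunk of the book independently."""
        return Counter(words)

    def most_frequent_word(chunks):
        """Reduce step: merge the per-chunk counts, then take the max."""
        with Pool() as pool:
            partials = pool.map(count_chunk, chunks)   # parallel map phase
        total = Counter()
        for partial in partials:                       # sequential reduce
            total += partial
        return total.most_common(1)[0]

    if __name__ == "__main__":
        chunks = [["the", "cat", "sat"], ["on", "the", "mat"]]
        print(most_frequent_word(chunks))              # ('the', 2)

Because the merge here happens only after the parallel map completes, no mutex is needed; a reducer that updated a shared running maximum during the map phase would need one, as noted above.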

answered by thekoalaz