Recently in a technical interview, I was asked to write a program to find the high-frequency words (the words that appear the maximum number of times) in a text book. The program should be designed so that it processes the entire text book with minimal memory; performance is not a concern. I was able to write a program that finds the frequency of the words, but it consumed a lot of memory.
How do you make this operation less memory-intensive? Any strategies or solutions?
-Snehal
You probably used a hash table, which is memory-intensive but has constant lookup time, so the performance/memory trade-off is obvious: by the time you reach the end of the book you will know your answer. Incrementing a counter for each word is also fast, because of the quick hash-table lookups.
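For illustration, here is a minimal sketch of that hash-table approach in Python. It assumes the book is available as a plain-text file; the path and function name are hypothetical, and the whitespace tokenization is a simplification.

```python
from collections import Counter

def top_words(book_path, n=1):
    """Return the n most frequent words using an in-memory hash table."""
    counts = Counter()
    with open(book_path, encoding="utf-8") as f:
        for line in f:
            # Lowercase and split on whitespace; a real tokenizer
            # would also strip punctuation.
            counts.update(line.lower().split())
    # most_common(n) returns the n (word, count) pairs with the highest counts.
    return counts.most_common(n)
```

This needs memory proportional to the number of distinct words, which is exactly the cost the question is trying to avoid.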
The other end of the spectrum is to look at the first word, then go through the entire book to count how many times that word occurs. This requires minimal memory. Then you do the same for the next word, going through the entire book again; if that word occurs more times, you record it as the top word (or among the top N words). Of course, this is extremely inefficient: if the first and third words are the same, you will end up going through the whole book again even though you already did the same thing for the first word.
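A sketch of that low-memory, brute-force idea, under the same assumptions as above (plain-text file, hypothetical names). It keeps only a handful of variables at any time, at the cost of re-reading the book once per word:

```python
def top_word_low_memory(book_path):
    """Find the most frequent word using O(1) extra memory."""
    best_word, best_count = None, 0
    with open(book_path, encoding="utf-8") as outer:
        for line in outer:
            for word in line.lower().split():
                # Re-scan the whole book, counting only this one word.
                count = 0
                with open(book_path, encoding="utf-8") as inner:
                    for inner_line in inner:
                        count += inner_line.lower().split().count(word)
                if count > best_count:
                    best_word, best_count = word, count
    return best_word, best_count
```

As the answer notes, this does redundant work whenever a word repeats, so its running time is roughly quadratic in the size of the book, but its memory footprint stays constant.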