Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithms/theory behind predictive autocomplete?

Simple word autocomplete just displays a list of words that match the characters that were already typed. But I would like to order the words in the autocomplete list according to the probability of the words occuring, depending on the words that were typed before, relying on a statistical model of a text corpus. What algorithms and data structures do I need for this? Can you give me links for good tutorials?

like image 960
chiborg Avatar asked Jul 12 '12 09:07

chiborg


People also ask

What is a Autocomplete algorithm?

Autocomplete is a feature that suggests a complete word or phrase after a user has typed just a few letters. The feature increases text input speed especially on mobile devices, because one doesn't have to type every letter in a word. Autocomplete functionality is commonly found on search engines and messaging apps.

Which data structure is used for Autocomplete?

Autocomplete is a feature of suggesting possible extensions to a partially written text and is widely used in search engine, code IDEs and much more. Trie data structure is a perfect fit to implement this feature efficient in terms of memory and time [O(length of string)].

Does Autocomplete use machine learning?

Intelligent Autocomplete uses Machine Learning to predict the most relevant suggestions and is specific to an engine.

How does search Autocomplete work?

Autocomplete is a feature within Google Search that makes it faster to complete searches that you start to type. Our automated systems generate predictions that help people save time by allowing them to quickly complete the search they already intended to do.


Video Answer


1 Answers

You don't need probability for autocompletion. Instead, build a prefix tree (aka a trie) with the words in the corpus as keys and their frequencies as values. When you encounter a partial string, walk the trie as far as you can, then generate all the suffixes from the point you've reached and sort them by frequency.

When a user enters a previously unseen string, just add it to the trie with frequency one; when a user enters a string that you had seen (perhaps by selecting it from the candidate list), increment its frequency.

[Note that you can't do the simple increment with a probability model; in the worst case, you'd have to recompute all the probabilities in the model.]

If you want to delve deeper into this kind of algorithms, I highly suggest you read (the first chapters of) Speech and Language Processing by Jurafsky and Martin. It treats discrete probability for language processing in quite some detail.

like image 123
Fred Foo Avatar answered Sep 17 '22 05:09

Fred Foo