Algorithms/theory behind predictive autocomplete?

Tags: text, algorithm, autocomplete, probability, nlp

Simple word autocomplete just displays a list of words that match the characters already typed. I would like to order the words in the autocomplete list by their probability of occurring, given the words that were typed before, relying on a statistical model of a text corpus. What algorithms and data structures do I need for this? Can you give me links to good tutorials?

960

asked Jul 12 '12 09:07

chiborg

1 Answer

You don't need probability for autocompletion. Instead, build a prefix tree (aka a trie) with the words in the corpus as keys and their frequencies as values. When you encounter a partial string, walk the trie as far as you can, then generate all the suffixes from the point you've reached and sort them by frequency.
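As a rough illustration of the trie approach described above, here is a minimal Python sketch. The names `TrieNode`, `build_trie` and `complete` are made up for this example, and the toy corpus is invented. It also assumes that a prefix which is not in the trie yields no suggestions, which is only one way to read "walk the trie as far as you can".

```python
from collections import Counter


class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.count = 0      # corpus frequency of the word ending at this node


def build_trie(words):
    """Build a trie whose word-ending nodes store corpus frequencies."""
    root = TrieNode()
    for word, freq in Counter(words).items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count = freq
    return root


def complete(root, prefix):
    """Return (word, frequency) pairs that start with prefix, most frequent first."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []  # assumption: an unknown prefix gives no suggestions
    results = []
    stack = [(node, prefix)]
    while stack:  # depth-first walk over every suffix below the prefix node
        current, text = stack.pop()
        if current.count:
            results.append((text, current.count))
        for ch, child in current.children.items():
            stack.append((child, text + ch))
    # Descending frequency; ties are broken alphabetically.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


corpus = "the then the them the then there".split()
trie = build_trie(corpus)
print(complete(trie, "the"))
# [('the', 3), ('then', 2), ('them', 1), ('there', 1)]
```

Sorting happens at query time here; for a large vocabulary you would probably want to cache or cap the candidate list, but that is beyond this sketch.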

When a user enters a previously unseen string, just add it to the trie with frequency one; when a user enters a string that you had seen (perhaps by selecting it from the candidate list), increment its frequency.
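The update rule above might look like the following sketch, which uses nested dicts as a bare-bones trie. The `record` function and the `END` sentinel key are hypothetical names chosen for the example, not part of any library.

```python
END = "$count"  # hypothetical sentinel key; multi-character, so it cannot clash with a one-character edge


def record(trie, word):
    """Add word with frequency 1 if unseen, otherwise increment its frequency."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})  # create missing nodes on the way down
    node[END] = node.get(END, 0) + 1
    return node[END]


trie = {}
record(trie, "hello")  # unseen: stored with frequency 1
record(trie, "help")   # unseen: stored with frequency 1
record(trie, "hello")  # seen before: frequency becomes 2
```

Each update touches only the nodes along one word, so its cost depends on the word length rather than on the size of the vocabulary.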

[Note that you can't do the simple increment with a probability model; in the worst case, you'd have to recompute all the probabilities in the model.]
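One way to see the point of this note, assuming the probability model stores normalized relative frequencies (an assumption made for this example; the numbers are invented): a single new observation changes the total, so every stored probability shifts, not just the one for the observed word.

```python
counts = {"the": 3, "then": 2, "them": 1}


def probabilities(counts):
    """Relative frequency of each word: count divided by the total count."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


before = probabilities(counts)
counts["them"] += 1  # one new observation of a single word
after = probabilities(counts)

# Every entry differs, because the denominator changed for all of them.
changed = [word for word in counts if before[word] != after[word]]
print(changed)
```

With raw frequencies, the same observation would have modified exactly one stored value.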

If you want to delve deeper into this kind of algorithm, I highly suggest you read (the first chapters of) Speech and Language Processing by Jurafsky and Martin. It treats discrete probability for language processing in quite some detail.

123

answered Sep 17 '22 05:09

Fred Foo

