I'm asking about the new labs feature "Google Scribe." Here is the link: http://scribe.googlelabs.com/
I am interested in the backend and the frontend, but mainly the backend. I want to build something similar with a very specific data set (derived from my own documents). I think the frontend of it is fairly straightforward, and I could probably even use an existing autocomplete plugin for the task.
Possible implementation suggestion:
Backend: Build and maintain a NxNxW sparse matrix A (e.g. implemented as a Hash) where N is the size of your vocabulary and W is the maximum context (in words) you wish to maintain (e.g. W=4 might be OK.) Go over some sample data to seed/initialize A such that A[n1,n2,w] counts the number of times word n2 appeared at the wth position after word n1 (respect sentence boundaries.)
Frontend: As the user types, ask the backend to use A to evaluate (and rank) the most probable successor words based on the last W words completely typed out by the user in the current sentence; only show those suggestions which start with what the user is in the progress of typing (i.e. the user's "current" (partial) word.)
Optionally have the backend update M based on the words that the user has completed typing, either on the fly (challenging when the user goes back to perform corrections), or as the finalized text is submitted (easiest), or via some periodic job evaluating texts submitted since the last job run.
You need to use Markov Chains.
You may want to start by looking here. The sample output is interesting too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With