
Reconstructing now-famous 17-year-old's Markov-chain-based information-retrieval algorithm "Apodora"

While we were all twiddling our thumbs, a 17-year-old Canadian boy has apparently found an information retrieval algorithm that:

a) performs with roughly twice the precision of the current, widely used vector space model,

b) is 'fairly accurate' at identifying similar words, and

c) makes microsearch more accurate.

Here is a good interview.

Unfortunately, I can't find a published paper yet. But from the snatches I remember from the graphical models and machine learning classes I took a few years ago, I think we should be able to reconstruct it from his submission abstract and from what he says about it in interviews.

From the interview:

Some searches find words that appear in similar contexts. That’s pretty good, but that’s following the relationships to the first degree. My algorithm tries to follow connections further. Connections that are close are deemed more valuable. In theory, it follows connections to an infinite degree.
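As a first guess at what "following connections further" could mean, here is a minimal sketch. It is not Apodora itself, since no paper is available. It assumes a toy corpus, a term co-occurrence matrix normalised into a Markov transition matrix `P`, and a hypothetical damping factor `alpha` so that a connection of degree k is weighted by `alpha**k`. Summing over all degrees has a closed form, which is one way to read "to an infinite degree".

```python
import numpy as np

# Toy corpus (hypothetical): each document is a list of terms.
docs = [
    ["car", "engine", "wheel"],
    ["automobile", "engine", "wheel"],
    ["banana", "fruit", "apple"],
]

vocab = sorted({t for d in docs for t in d})
index = {t: i for i, t in enumerate(vocab)}

# Term-document count matrix A (terms x documents).
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for t in d:
        A[index[t], j] += 1.0

# First-degree connections: how often two distinct terms share a document.
C = A @ A.T
np.fill_diagonal(C, 0.0)

# Row-normalise so each row is a probability distribution (a Markov matrix).
P = C / C.sum(axis=1, keepdims=True)


def damped_similarity(P, alpha=0.5):
    """Sum of alpha^k * P^k for k = 1, 2, ... in closed form.

    Closer connections (small k) get more weight; alpha is an assumed
    damping factor with 0 < alpha < 1, not something from the abstract.
    """
    n = P.shape[0]
    return alpha * P @ np.linalg.inv(np.eye(n) - alpha * P)


S = damped_similarity(P)

car, auto, banana = index["car"], index["automobile"], index["banana"]
print("first-degree car~automobile:", C[car, auto])
print("all-degree   car~automobile:", S[car, auto])
print("all-degree   car~banana:    ", S[car, banana])
```

In this toy corpus "car" and "automobile" never appear in the same document, so their first-degree score is zero, yet the all-degree score is positive because both co-occur with "engine" and "wheel". "car" and "banana" share no chain of connections at all, so their score stays at zero.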

And the abstract puts it in context:

A novel information retrieval algorithm called "Apodora" is introduced, using limiting powers of Markov chain-like matrices to determine models for the documents and making contextual statistical inferences about the semantics of words. The system is implemented and compared to the vector space model. Especially when the query is short, the novel algorithm gives results with approximately twice the precision and has interesting applications to microsearch.
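To make "limiting powers" concrete, here is a small sketch of what happens when an ordinary Markov transition matrix is raised to a high power. The 3-state matrix `P` is a made-up example, not anything taken from the abstract.

```python
import numpy as np

# Hypothetical 3-state transition matrix: rows sum to 1, and the chain is
# irreducible and aperiodic, so its powers converge.
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
])

# A high power of P stands in for the limit of P^k as k grows.
limit = np.linalg.matrix_power(P, 200)

# Every row of the limit is the stationary distribution of the chain.
stationary = limit[0]
print(limit)
print("rows identical:", np.allclose(limit, stationary))
print("stationary under P:", np.allclose(stationary @ P, stationary))
```

For a chain like this, every row of the limit is the same stationary distribution, so the plain limit forgets where the walk started. That may be why the abstract says "Markov chain-like" rather than "Markov chain": some modification, such as damping, would be needed to keep the result dependent on the starting word or document. That last part is my speculation.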

I feel like someone who knows about Markov-chain-like matrices or information retrieval would immediately realize what he's doing.

So: what is he doing?

asked Aug 06 '11 15:08 by silverasm


1 Answer

From the use of words like 'context' and the fact that he has introduced a second-order level of statistical dependency, I suspect he is doing something related to the LDA-HMM method outlined in this paper: Griffiths, T., Steyvers, M., Blei, D., & Tenenbaum, J. (2005). Integrating topics and syntax. Advances in Neural Information Processing Systems. There are some inherent limits to the resolution of the search due to model averaging. Still, I'm envious of anyone doing stuff like this at 17, and I hope to heck he's done something independent and at least incrementally better. Even a different direction on the same topic would be pretty cool.

answered Sep 28 '22 19:09 by Aengus