What is the theory behind algorithms that, for example, generate the list of similar questions on the Stack Overflow site while you are writing a new one? Could you recommend some books on the subject?
The algorithms you are asking about come mainly from three related fields: natural language processing (NLP), machine learning (ML), and information retrieval (IR).
For example, to find the 10 questions most similar to a new question, one could extract n-grams from the text of each question, compute a TF-IDF weight vector over each question's n-grams, compute the cosine similarity between the new question's vector and the vector of every other question, and pick the 10 questions with the highest similarity.
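A minimal sketch of that pipeline, using only the Python standard library. It assumes lowercased whitespace tokenization, word unigrams plus bigrams, and one common smoothed IDF variant, log((1 + N) / (1 + df)) + 1; the function names (`ngrams`, `tfidf_vectors`, `cosine`, `most_similar`) are hypothetical and this is not how Stack Overflow actually implements the feature.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max), assuming simple lowercase whitespace tokenization."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Build a sparse TF-IDF vector (dict: n-gram -> weight) for each document."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)
    df = Counter()  # document frequency: in how many documents each n-gram appears
    for c in counts:
        df.update(c.keys())
    # Assumed smoothed IDF variant; the +1 terms keep every weight positive.
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1 for g, d in df.items()}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(new_question, questions, k=10, n_max=2):
    """Return the k (index, similarity) pairs of `questions` most similar to `new_question`."""
    # The new question is included when computing IDF so all vectors share one weighting.
    vectors = tfidf_vectors(questions + [new_question], n_max)
    query = vectors[-1]
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Usage, with made-up example questions:

```python
questions = [
    "how do I parse JSON in Python",
    "how to sort a list in Python",
    "what is the difference between a process and a thread",
]
# The JSON question (index 0) ranks first, then the list-sorting one.
print(most_similar("parse a JSON string in Python", questions, k=2))
```

Scoring every stored question is fine for a small collection; with many questions, an inverted index over the n-grams lets you score only questions that share at least one n-gram with the new one.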
Some free books you can read:
http://nlp.stanford.edu/IR-book/
http://infolab.stanford.edu/~ullman/mmds.html
And two free courses starting in late January:
http://www.nlp-class.org/
http://jan2012.ml-class.org/
Also (somewhat more involved):
http://see.stanford.edu/see/courseinfo.aspx?coll=63480b48-8819-4efd-8412-263f1a472f5a
http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1