Stack Overflow Related questions algorithm [closed]

The related questions that appear after entering the title, and those that are in the right side bar when viewing a question seem to suggest very apt questions.

According to Spolsky in a talk, Stack Overflow only does a SQL search for this and uses no special algorithms.

What algorithms exist to give good answers in such a case? How do you do a database search in such a case? Do you make the title searchable and search on the keywords, or search on tags and put the questions with the most votes on top?

lprsd asked May 21 '09 07:05


4 Answers

If you listen to Stack Overflow podcast 32 (unfortunately the transcript doesn't have much in it), you can hear Jeff Atwood say a little about how he does it.

It seems like the algorithm is something like:

  • Take the question
  • Remove the most common words in English (from a list he got from Google)
  • Submit a full text search to the SQL Server 2008 full text search engine
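
As a rough illustration of those three steps, here is a minimal Python sketch. Everything specific in it is an assumption, not something from the podcast: the tiny stop-word list stands in for the real one, the table and column names (`Posts`, `Id`, `Title`) are hypothetical, and `FREETEXTTABLE` is just one of the SQL Server full-text functions that could be used. The `?` placeholder follows the ODBC style, so check what your driver expects.

```python
import re

# Tiny illustrative stop-word list; the list mentioned in the podcast
# is not reproduced here.
STOP_WORDS = {
    "a", "an", "and", "are", "do", "for", "how", "i", "in",
    "is", "it", "of", "on", "the", "to", "what", "with",
}


def search_terms(title):
    """Lowercase the title, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9#+]+", title.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_query(title):
    """Return (sql, params) for a SQL Server full-text search.

    Posts, Id and Title are hypothetical names used for illustration.
    """
    terms = " ".join(search_terms(title))
    sql = (
        "SELECT TOP 10 p.Id, p.Title, ft.[RANK] "
        "FROM FREETEXTTABLE(Posts, Title, ?) AS ft "
        "JOIN Posts AS p ON p.Id = ft.[KEY] "
        "ORDER BY ft.[RANK] DESC"
    )
    return sql, (terms,)
```

For example, `search_terms("How do I sort a list in Python?")` keeps only `sort`, `list` and `python`, and those words become the single parameter of the query.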

More details about the full text search can be found here: http://msdn.microsoft.com/en-us/library/ms142571.aspx

This may be out of date by now - they were talking about moving to a better/faster full text search such as Lucene, and I vaguely remember Jeff saying in the podcast that this had been done.

Nick Fortescue answered Oct 05 '22 22:10


The related questions sidebar is probably built on the tags for each question, most likely by ranking questions on tag overlap (so 5 tags in common > 4 tags in common, etc.).
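
A minimal sketch of that kind of tag-overlap ranking, in Python. This is a guess at the idea, not Stack Overflow's actual code; the function name and the `(question_id, tags)` input shape are made up for illustration.

```python
def rank_by_tag_overlap(target_tags, candidates):
    """Rank candidate questions by how many tags they share with the target.

    candidates is an iterable of (question_id, tags) pairs (hypothetical shape).
    Returns a list of (question_id, shared_tag_count), most shared tags first.
    Candidates with no tag in common are dropped.
    """
    target = set(target_tags)
    scored = []
    for question_id, tags in candidates:
        shared = len(target & set(tags))
        if shared:
            scored.append((shared, question_id))
    # Most shared tags first; ties are broken by ascending question id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(question_id, shared) for shared, question_id in scored]
```

In practice you would probably combine this count with something else, such as votes or a text-similarity score, since many questions share the same few popular tags.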

The rest will build on heuristics and algorithms from natural language processing. These normally aren't very good on general-purpose language, but most of them are VERY good once the vocabulary is narrowed down to a single technical area such as programming.

workmad3 answered Oct 05 '22 22:10


Have a look at Porter stemming for a stemming algorithm if you are looking to get into "related" algorithms.

A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word, "fish".

Once you have processed a document and done stemming on it, you can index the stemmed words by count and then compare against other documents. This is the most basic approach to tackling this problem.

Also take care to ignore stop words like "the", "an", "of" etc.
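
Here is a small Python sketch of that basic approach: drop stop words, stem, count the stems, then compare two documents. Note the assumptions: `crude_stem` is a deliberately simple suffix stripper written for this example and is NOT the Porter algorithm (a real Porter implementation would handle far more cases), the stop-word list is a tiny sample, and cosine similarity is just one reasonable way to compare the counts.

```python
import math
import re
from collections import Counter

# Small sample list; a real system would use a much longer one.
STOP_WORDS = {"the", "an", "a", "of", "in", "to", "is", "and", "for"}

# Crude suffix list for illustration only; these are NOT the Porter rules.
SUFFIXES = ("ing", "ed", "er", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(text):
    """Count stemmed words in the text, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(w) for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two term-count mappings (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

To find related documents, you would compute `term_counts` once per document, then rank the others by `cosine_similarity` against the document in question.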

aleemb answered Oct 05 '22 21:10


This post will help you: "Is there an algorithm that tells the semantic similarity of two phrases"

victor hugo answered Oct 05 '22 22:10