Stack Overflow Related questions algorithm [closed]

Tags:

sql

search

full-text-search

nlp

The related questions that appear after you enter a title, and those in the right sidebar when you view a question, are usually very apt suggestions.

According to Spolsky in a talk, Stack Overflow only runs a SQL search for this and uses no special algorithms.

What algorithms exist to give good answers in such a case? How do you do a database search in such a case? Do you make the title searchable and search on the keywords, or search on tags and put the questions with the most votes on top?

446

asked May 21 '09 07:05

lprsd

4 Answers

If you listen to Stack Overflow podcast 32 (unfortunately the transcript doesn't contain much), you can hear Jeff Atwood say a little about how he does it.

It seems like the algorithm is something like:

Take the question.
Remove the most common English words (using a list he got from Google).
Submit a full-text search to the SQL Server 2008 full-text search engine.
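
The steps above could be sketched roughly as follows. This is only an illustration, not Stack Overflow's actual code: the stop-word list is a tiny made-up sample, the function names `extract_keywords` and `build_search_terms` are hypothetical, and the OR-joined output is just one query form that full-text engines commonly accept, so check your own engine's syntax before using it.

```python
import re

# Assumption: a tiny illustrative stop-word list. The answer refers to a much
# larger list of the most common English words.
STOP_WORDS = {"a", "an", "and", "do", "how", "i", "in", "is", "of", "the", "to", "what"}


def extract_keywords(title):
    """Lowercase the title, split it into tokens, drop stop words and repeats."""
    tokens = re.findall(r"[a-z0-9#+]+", title.lower())
    seen = set()
    keywords = []
    for token in tokens:
        if token in STOP_WORDS or token in seen:
            continue
        seen.add(token)
        keywords.append(token)
    return keywords


def build_search_terms(title):
    """Join the surviving keywords with OR, each one quoted (hypothetical format)."""
    return " OR ".join('"{}"'.format(kw) for kw in extract_keywords(title))


if __name__ == "__main__":
    print(build_search_terms("How do I do a full text search in SQL?"))
```

The resulting string would then be passed as a parameter to whatever full-text query your database offers, rather than concatenated into the SQL itself.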

More details about the full text search can be found here: http://msdn.microsoft.com/en-us/library/ms142571.aspx

This may be out of date by now - they were talking about moving to a better/faster full text search such as Lucene, and I vaguely remember Jeff saying in the podcast that this had been done.

answered Oct 05 '22 22:10

Nick Fortescue

The related questions sidebar probably builds on the tags for each question (likely ranking candidates by tag overlap, so 5 tags in common beats 4 tags in common, and so on).
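
A minimal sketch of ranking by tag overlap, assuming each candidate question is available as an id plus a list of tags. The function name `rank_by_tag_overlap` and the data shape are made up for illustration, and this is a guess at the idea, not the site's real implementation.

```python
def rank_by_tag_overlap(question_tags, candidates):
    """Return candidate ids sorted by number of shared tags, highest first.

    candidates maps a question id to an iterable of its tags (hypothetical
    shape). Candidates with no tag in common are dropped.
    """
    target = set(question_tags)
    scored = []
    for qid, tags in candidates.items():
        overlap = len(target & set(tags))
        if overlap > 0:
            scored.append((overlap, qid))
    # Sort by overlap descending, then by id so ties come out deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [qid for _, qid in scored]


if __name__ == "__main__":
    candidates = {
        "q1": ["sql", "search"],
        "q2": ["sql", "search", "nlp"],
        "q3": ["python"],
    }
    print(rank_by_tag_overlap(["sql", "search", "full-text-search", "nlp"], candidates))
```

In a real database you would more likely do this with a join on a question/tag table and a `GROUP BY` with a count, but the ranking logic is the same.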

The rest probably builds on heuristics and algorithms from natural language processing. These usually don't perform well on general-purpose language, but most of them are VERY good once the vocabulary is narrowed to a single technical area such as programming.

answered Oct 05 '22 22:10

workmad3

If you want to get into "related" algorithms, have a look at the Porter stemming algorithm.

A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word, "fish".

Once you have processed a document and done stemming on it, you can index the stemmed words by count and then compare against other documents. This is the most basic approach to tackling this problem.

Also take care to ignore stop words such as "the", "an", and "of".
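
Here is a rough sketch of that pipeline: drop stop words, stem, count the stems, and compare two documents. Note the assumptions: `crude_stem` is a naive suffix stripper that I made up for illustration and is NOT the Porter algorithm, the stop-word list is a small sample, and cosine similarity is just one common way to compare the counts.

```python
import math
import re
from collections import Counter

# Assumption: a small sample stop-word list, for illustration only.
STOP_WORDS = {"the", "an", "a", "of", "and", "to", "in", "is"}
# Crude suffix stripping; a real stemmer such as Porter has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(text):
    """Count stemmed, non-stop-word tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = term_counts("Fishing for cats")
    doc2 = term_counts("The cat fished")
    print(cosine_similarity(doc1, doc2))
```

To find related questions you would compute the counts once per question, store them, and score a new title against the stored ones, keeping the top few.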

answered Oct 05 '22 21:10

aleemb

This post will help you: "Is there an algorithm that tells the semantic similarity of two phrases".

answered Oct 05 '22 22:10

victor hugo
