I need to create a help desk for customers on a website I'm building, and I love the way Stack Overflow finds similar questions. Does anyone know what algorithm the site uses, and can you point me to any references where I can find one?
Marking a question as duplicate is part of the question-closing procedure, except that when a question is closed as duplicate, the title is appended with "[duplicate]" rather than "[closed]". Moderators and anyone with 3000 reputation may vote to close a question as a duplicate.
Abstract: Duplicate questions on Stack Overflow are questions that are flagged as being conceptually equivalent to a previously posted question.
There is a whole branch of machine learning called clustering (a type of unsupervised learning) that deals with this kind of problem.
The question becomes part of a cluster, and other questions in the same cluster (probably ordered by some similarity measure or distance) are displayed as similar questions.
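To make the clustering idea concrete, here is a minimal sketch, assuming each question is a plain string turned into a bag-of-words vector, compared with cosine similarity, and grouped with a basic k-means. The function names, the farthest-first seeding, and the sample questions are illustrative assumptions; this is not Stack Overflow's actual implementation.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: counts of lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse vectors (1.0 = same direction, 0.0 = no shared terms)."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Mean of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({term: weight / len(vectors) for term, weight in total.items()})


def kmeans(vectors, k, iterations=10):
    """Cluster vectors into k groups by cosine similarity; returns (labels, centroids)."""
    # Assumed seeding strategy (farthest-first): start with the first vector, then
    # repeatedly add the vector least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


def similar_questions(new_question, questions, labels, centroids, top_n=3):
    """Place a new question in its nearest cluster, then rank that cluster's members."""
    v = vectorize(new_question)
    cluster = max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
    scored = [(cosine(v, vectorize(q)), q)
              for q, label in zip(questions, labels) if label == cluster]
    scored.sort(reverse=True)  # most similar first
    return [q for score, q in scored[:top_n] if score > 0]


# Hypothetical help-desk questions used only for illustration.
questions = [
    "How do I reset my password",
    "Password reset email never arrives",
    "How can I change my account password",
    "What payment methods do you accept",
    "Can I pay with a credit card",
    "Refund for a duplicate payment",
]
labels, centroids = kmeans([vectorize(q) for q in questions], k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(similar_questions("I forgot my password, how do I reset it?", questions, labels, centroids))
```

Clustering first means a new question is only compared against the members of one cluster rather than every stored question; a production system would more likely use TF-IDF or embedding vectors and a library implementation of k-means.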
There are various features it could use for clustering, some of which may be:
and so on.
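As a rough illustration of how several features can be combined into one vector for clustering, here is a sketch that assumes a question is a dict with hypothetical `title`, `tags`, and `body` fields. The field names and the weights in `FIELD_WEIGHTS` are assumptions chosen for the example, not values used by any real site.

```python
import re
from collections import Counter

# Hypothetical weights: how strongly each field counts in the combined vector.
FIELD_WEIGHTS = {"title": 2.0, "tags": 3.0, "body": 1.0}


def tokens(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_features(question):
    """Merge several fields into one sparse feature vector.

    Features are prefixed with their field so that, e.g., 'title:reset' and
    'body:reset' remain distinct dimensions with different weights.
    """
    features = Counter()
    for word in tokens(question["title"]):
        features["title:" + word] += FIELD_WEIGHTS["title"]
    for tag in question["tags"]:
        features["tag:" + tag.lower()] += FIELD_WEIGHTS["tags"]
    for word in tokens(question["body"]):
        features["body:" + word] += FIELD_WEIGHTS["body"]
    return features


q = {"title": "Reset password", "tags": ["account", "login"], "body": "The reset link has expired."}
features = question_features(q)
print(features["tag:login"])  # 3.0
```

The resulting vector can be fed to any clustering or similarity function that accepts sparse vectors; tuning which fields to include and how to weight them is the feature-engineering part of the problem.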
There may also be other engineered features, built with techniques like text summarization, sentiment analysis, etc., that are used in these kinds of problems. Which features work well depends on the problem.
Other areas where you see these algorithms in action are:
and the list goes on.
So what can you do about your problem?
There is no single answer. It all depends on your data and your target query. But you can still study feature engineering and other aspects of machine learning, especially clustering. (There are many online courses for these.)
Or
Most likely it's a weighted match on tags, and perhaps a match() or equivalent weighted full-text search on the title.
It's probably documented in detail somewhere on Meta or in the FAQ.
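A weighted tag-plus-title match can be sketched in a few lines. This assumes questions are dicts with `title` and `tags`, uses Jaccard overlap for both tags and title words, and combines them with hypothetical weights `TAG_WEIGHT` and `TITLE_WEIGHT`. The plain word-overlap score is a simple stand-in for a real full-text relevance score (in MySQL that would be a `MATCH() ... AGAINST` query on a FULLTEXT index).

```python
import re

# Hypothetical weights: tags count for more than title words.
TAG_WEIGHT = 0.6
TITLE_WEIGHT = 0.4


def title_terms(title):
    """Set of lowercased alphanumeric words in a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_score(new, existing):
    """Weighted sum of tag overlap and title-word overlap."""
    tag_score = jaccard({t.lower() for t in new["tags"]}, {t.lower() for t in existing["tags"]})
    title_score = jaccard(title_terms(new["title"]), title_terms(existing["title"]))
    return TAG_WEIGHT * tag_score + TITLE_WEIGHT * title_score


def related_questions(new, existing, top_n=5):
    """Titles of the best-scoring existing questions, highest score first."""
    scored = sorted(((related_score(new, q), q["title"]) for q in existing), reverse=True)
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical help-desk data used only for illustration.
existing = [
    {"title": "How do I reset my password?", "tags": ["account", "password"]},
    {"title": "Password reset email never arrives", "tags": ["email", "password"]},
    {"title": "Which payment methods do you accept?", "tags": ["billing"]},
]
new = {"title": "Can't reset my password", "tags": ["password", "account"]}
print(related_questions(new, existing))
# ['How do I reset my password?', 'Password reset email never arrives']
```

This needs no training step, which may make it a practical starting point for a small help desk before trying clustering.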