Is there a supervised learning algorithm that takes tags as input, and produces a probability as output?

Tags:

Let's say I want to determine the probability that I will upvote a question on SO, based only on which tags are present or absent.

Let's also imagine that I have plenty of data about past questions that I did or did not upvote.

Is there a machine learning algorithm that could take this historical data, train on it, and then be able to predict my upvote probability for future questions? Note that it must be the probability, not just some arbitrary score.

Let's assume that there will be up-to 7 tags associated with any given question, these being drawn from a superset of tens of thousands.

My hope is that it is able to make quite sophisticated connections between tags, rather than each tag simply contributing to the end result in a "linear" way (much as words do in a Bayesian spam filter).

So for example, it might be that the word "java" increases my upvote probability, except when it is present with "database", however "database" might increase my upvote probability when present with "ruby".

Oh, and it should be computationally reasonable (training within an hour or two on millions of questions).

What approaches should I research here?

700

asked Apr 28 '11 04:04

sanity

1 Answers

Given that there probably aren't many tags per message, you could just create "n-gram" tags and apply naive Bayes. Regression trees would also produce an empirical probability at the leaf nodes, using +1 for upvote and 0 for no upvote. See http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf for some readable lecture notes and http://sites.google.com/site/rtranking/ for an open source implementation.

189

answered Nov 15 '22 07:11

RecursivelyIronic

Related questions
                            
                                Stochastic Search to lambda expression
                            
                                RMSE loss for multi output regression problem in PyTorch
                            
                                SARSA algorithm
                            
                                Where can I get very simple inroduction to all Artificial intelligent techniques with Real world Examples
                            
                                Is there a well-designed, maintained decision tree learning library for Java?
                            
                                Supervised learning with multiple sources of training data
                            
                                Algorithmic music composition [closed]
                            
                                Code generation with Machine learning [closed]
                            
                                What's the win strategy in such a game?
                            
                                Can "Monte-Carlo Tree Search" be applied on a "two player game with imperfect information" like Stratego?
                            
                                Optimizing a genetic algorithm?
                            
                                Microsoft BotFramework for private apps
                            
                                AI / Statistical methods for determining the name of a colour
                            
                                Searching Natural Language Sentence Structure
                            
                                AI: Fastest algorithm to find if path exists?
                            
                                Using Artificial Intelligence (AI) to predict Stock Prices
                            
                                How to cluster similar sentences using BERT
                            
                                Algorithms for realtime strategy wargame AI
                            
                                A* heuristic, overestimation/underestimation?
                            
                                Monte Carlo Tree Search agent in a game of Isolation - Debug Suggestions

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is there a supervised learning algorithm that takes tags as input, and produces a probability as output?

Tags:

artificial-intelligence

machine-learning

data-mining

sanity

People also ask

1 Answers

RecursivelyIronic

Recent Activity

Donate For Us