Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ranking algorithms

I have around 4000 blog posts with me. I wanna rank all posts according to the following values

Upvote Count => P
Comments Recieved => C
Share Count => S
Created time in Epoch => E
Follower Count of Category which post belongs to => F (one post has one category)
User Weight => U (User with most number of post have biggest weight)

I am expecting answer in pseudo code.

like image 232
shajin Avatar asked Jun 11 '13 04:06

shajin


People also ask

What is Google's ranking algorithm?

PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results.

What is a ranked model?

Ranking models typically work by predicting a relevance score s = f(x) for each input x = (q, d) where q is a query and d is a document. Once we have the relevance of each document, we can sort (i.e. rank) the documents according to those scores. Ranking models rely on a scoring function. (


1 Answers

Your problem falls into the category of regression (link). In machine learning terms, you have a collection of features (link) (which you list in your question) and you have a score value that you want to predict given those features.

What Ted Hopp has suggested is basically a linear predictor function (link). That might be too simple a model for your scenario.

Consider using logistic regression (link) for your problem. Here's how you would go about using it.

1. create your model-learning dataset

Randomly select some m blog posts from your set of 4000. It should be a small enough set that you can comfortably look through these m blog posts by hand.

For each of the m blog posts, score how "good" it is with a number from 0 to 1. If it helps, you can think of this as using 0, 1, 2, 3, 4 "stars" for the values 0, 0.25, 0.5, 0.75, 1.

You now have m blog posts that each have a set of features and a score.

You can optionally expand your feature set to include derived features - for example, you could include the logarithm of the "Upvote Count," the "Comments Recieved", the "Share Count," and the "Follower Count," and you could include the logarithm of the number of hours between "now" and "Created Time."

2. learn your model

Use gradient descent to find a logistic regression model that fits your model-learning dataset. You should partition your dataset into training, validation, and test sets so that you can carry out those respective steps in the model-learning process.

I won't elaborate any more on this section because the internet is full of the details and it's a canned process.

Wikipedia links:

  • Gradient descent
  • Logistic regression model fitting

3. apply your model

Having learned your logistic regression model, you can now apply it to predict the score for how "good" a new blog post is! Simply compute the set of features (and derived features), then use your model to map those features to a score.

Again, the internet is full of the details for this section, which is a canned process.


If you have any questions, make sure to ask!

If you're interested in learning more about machine learning, you should consider taking the free online Stanford Machine Learning course on Coursera.org. (I'm not affiliated with Stanford or Coursera.)

like image 200
Timothy Shields Avatar answered Oct 07 '22 21:10

Timothy Shields