How to balance number of ratings versus the ratings themselves?

For a school project, we'll have to implement a ranking system. However, we figured that a dumb average rank would suck: something that one user rated 5 stars would have a better average than something 188 users rated 4 stars, and that's just stupid.

So I'm wondering if any of you have an example algorithm for "smart" ranking. It only needs to take into account the ratings given and the number of ratings.

Thanks!

asked Mar 22 '10 by zneak


1 Answer

You can use a method inspired by Bayesian probability. The gist of the approach is to have an initial belief about the true rating of an item, and use users' ratings to update your belief.

This approach requires two parameters:

  1. What do you think is the true "default" rating of an item, if you have no ratings at all for the item? Call this number R, the "initial belief".
  2. How much weight do you give to the initial belief, compared to the user ratings? Call this W, where the initial belief is "worth" W user ratings of that value.

With the parameters R and W, computing the new rating is simple: assume you have W ratings of value R along with any user ratings, and compute the average. For example, if R = 2 and W = 3, we compute the final score for various scenarios below:

  • 100 (user) ratings of 4: (3*2 + 100*4) / (3 + 100) = 3.94
  • 3 ratings of 5 and 1 rating of 4: (3*2 + 3*5 + 1*4) / (3 + 3 + 1) = 3.57
  • 10 ratings of 4: (3*2 + 10*4) / (3 + 10) = 3.54
  • 1 rating of 5: (3*2 + 1*5) / (3 + 1) = 2.75
  • No user ratings: (3*2 + 0) / (3 + 0) = 2
  • 1 rating of 1: (3*2 + 1*1) / (3 + 1) = 1.75
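The scenarios above can be sketched in a few lines of plain Python (the function name and signature are illustrative, not from any library):

```python
def bayesian_average(ratings, R=2.0, W=3.0):
    """Score a list of user ratings, treating the prior belief R
    as W extra "virtual" ratings mixed into a plain average."""
    return (W * R + sum(ratings)) / (W + len(ratings))

# Reproducing the scenarios above (R = 2, W = 3):
print(bayesian_average([4] * 100))      # ~3.94
print(bayesian_average([5, 5, 5, 4]))   # ~3.57
print(bayesian_average([5]))            # 2.75
print(bayesian_average([]))             # 2.0 (no user ratings: the prior)
```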

This computation takes into consideration the number of user ratings, and the values of those ratings. As a result, the final score roughly corresponds to how happy one can expect to be about a particular item, given the data.

Choosing R

When you choose R, think about what value you would be comfortable assuming for an item with no ratings. Is the typical no-rating item actually 2.4 out of 5, if you were to instantly have everyone rate it? If so, R = 2.4 would be a reasonable choice.

You should not use the minimum value on the rating scale for this parameter, since an item rated extremely poorly by users should end up "worse" than a default item with no ratings.

If you want to pick R using data rather than just intuition, you can use the following method:

  • Consider all items with at least some threshold of user ratings (so you can be confident that the average user rating is reasonably accurate).
  • For each item, assume its "true score" is the average user rating.
  • Choose R to be the median of those scores.

If you want to be slightly more optimistic or pessimistic about a no-rating item, you can choose R to be a different percentile of the scores, for instance the 60th percentile (optimistic) or 40th percentile (pessimistic).
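A minimal sketch of that data-driven procedure, assuming ratings are grouped per item as lists of numbers (the container shape and names here are assumptions for illustration):

```python
from statistics import mean, median

def estimate_R(items_ratings, min_ratings=20):
    """Pick R as the median of per-item average ratings,
    considering only items with enough ratings to trust."""
    trusted_scores = [mean(r) for r in items_ratings if len(r) >= min_ratings]
    return median(trusted_scores)

# Illustrative data: the 5-star item has too few ratings to count.
catalog = [[4] * 25, [2] * 30, [5] * 5, [3] * 20]
print(estimate_R(catalog))  # 3
```

For the optimistic or pessimistic variants, swap `median` for a percentile function (e.g. `statistics.quantiles`) at the 60th or 40th percentile.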

Choosing W

The choice of W should depend on how many ratings a typical item has, and how consistent ratings are. W can be higher if items naturally obtain many ratings, and W should be higher if you have less confidence in user ratings (e.g., if you have high spammer activity). Note that W does not have to be an integer, and can be less than 1.

Choosing W is a more subjective matter than choosing R. However, here are some guidelines:

  • If a typical item obtains C ratings, then W should not exceed C, or else the final score will be more dependent on R than on the actual user ratings. Instead, W should be close to a fraction of C, perhaps between C/20 and C/5 (depending on how noisy or "spammy" ratings are).
  • If historical ratings are usually consistent (for an individual item), then W should be relatively small. On the other hand, if ratings for an item vary wildly, then W should be relatively large. You can think of this algorithm as "absorbing" W ratings that are abnormally high or low, turning those ratings into more moderate ones.
  • In the extreme, setting W = 0 is equivalent to using only the average of user ratings, and setting W = infinity is equivalent to proclaiming that every item has a true rating of R regardless of the user ratings. Clearly, neither of these extremes is appropriate.
  • Setting W too large can have the effect of favoring an item with many moderately-high ratings over an item with slightly fewer exceptionally-high ratings.
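To see the last point concretely, here is a small comparison (the numbers are made up for illustration): with a large W, an item with many ratings of 4 outscores an item with fewer ratings of 5, while a small W ranks them the other way.

```python
def bayesian_average(ratings, R, W):
    return (W * R + sum(ratings)) / (W + len(ratings))

many_good = [4] * 50   # many moderately-high ratings
few_great = [5] * 10   # fewer exceptionally-high ratings

# Large W: the prior dominates the smaller sample, favoring the bigger one.
print(bayesian_average(many_good, R=2.5, W=20))  # ~3.57
print(bayesian_average(few_great, R=2.5, W=20))  # ~3.33

# Small W: the exceptionally-rated item wins, as it probably should.
print(bayesian_average(many_good, R=2.5, W=2))   # ~3.94
print(bayesian_average(few_great, R=2.5, W=2))   # ~4.58
```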
answered Oct 15 '22 by k_ssb