What is a better way to sort by a 5 star rating?

I'm trying to sort a bunch of products by customer ratings using a 5 star system. The site I'm setting this up for does not have a lot of ratings and continues to add new products, so it will usually have a few products with a low number of ratings.

I tried using average star rating but that algorithm fails when there is a small number of ratings.

For example, a product that has 3x 5 star ratings would show up higher than a product that has 100x 5 star ratings and 2x 2 star ratings.

Shouldn't the second product show up higher, since its larger number of ratings makes it statistically more trustworthy?

786

asked Sep 11 '09 14:09

Vizjerai

2 Answers

Prior to 2015, the Internet Movie Database (IMDb) publicly listed the formula used to rank their Top 250 movies list. To quote:

The formula for calculating the Top Rated 250 Titles gives a true Bayesian estimate:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C 
where:

R = average for the movie (mean)

v = number of votes for the movie

m = minimum votes required to be listed in the Top 250 (currently 25000)

C = the mean vote across the whole report (currently 7.0)

For the Top 250, only votes from regular voters are considered.

It's not so hard to understand. The formula is:

rating = (v / (v + m)) * R + (m / (v + m)) * C;

Which can be mathematically simplified to:

rating = (R * v + C * m) / (v + m);

The variables are:

R – The item's own rating. R is the average of the item's votes. (For example, if an item has no votes, its R is 0. If someone gives it 5 stars, R becomes 5. If someone else gives it 1 star, R becomes 3, the average of [1, 5]. And so on.)
C – The average item's rating. Find the R of every single item in the database, including the current one, and take the average of them; that is C. (Suppose there are 4 items in the database, and their ratings are [2, 3, 5, 5]. C is 3.75, the average of those numbers.)
v – The number of votes for an item. (To give another example, if 5 people have cast votes on an item, v is 5.)
m – The tuneable parameter. The amount of "smoothing" applied to the rating is based on the number of votes (v) in relation to m. Adjust m until the results satisfy you. And don't misinterpret IMDb's description of m as "minimum votes required to be listed" – this system is perfectly capable of ranking items with fewer votes than m.
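As a rough illustration of how R, v and C could be derived from raw votes, here is a minimal Python sketch. The `votes_by_item` dictionary, its item names and its data are invented for the example; C is computed as the mean of the per-item averages, as described above, and an item without votes is given R = 0.

```python
from statistics import mean

# Hypothetical raw votes per item (names and data invented for illustration).
votes_by_item = {
    "item_a": [2],
    "item_b": [3, 3],
    "item_c": [5],
    "item_d": [5, 5, 5],
}

def item_mean(votes):
    """R: the item's own average rating (0 when it has no votes)."""
    return mean(votes) if votes else 0

# v for each item is simply the number of votes it received.
v = {name: len(votes) for name, votes in votes_by_item.items()}

# R for each item, then C as the average of every item's R.
R = {name: item_mean(votes) for name, votes in votes_by_item.items()}
C = mean(list(R.values()))

print(C)  # 3.75, the average of the per-item ratings 2, 3, 5 and 5
```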

All the formula does is: add m imaginary votes, each with a value of C, before calculating the average. In the beginning, when there isn't enough data (i.e. the number of votes is dramatically less than m), this causes the blanks to be filled in with average data. However, as votes accumulate, eventually the imaginary votes will be drowned out by real ones.

In this system, votes don't cause the rating to fluctuate wildly. Instead, they merely perturb it a bit in some direction.

When there are zero votes, only imaginary votes exist, and all of them are C. Thus, each item begins with a rating of C.
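To make this concrete, here is a small Python sketch of the weighted rating applied to the two products from the question. The function name `bayesian_average` and the values C = 3.5 and m = 10 are assumptions chosen for illustration, not values from IMDb or from the question; m must be greater than zero so that an item without votes does not divide by zero.

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Return (R*v + C*m) / (v + m) for one item.

    ratings      -- list of star values cast for the item (v = len(ratings))
    prior_mean   -- C, the mean rating across all items
    prior_weight -- m, the number of imaginary votes, each worth C
    """
    v = len(ratings)
    # sum(ratings) equals R * v, so this is the simplified form of the formula.
    return (sum(ratings) + prior_mean * prior_weight) / (v + prior_weight)

few_votes = [5, 5, 5]              # 3x 5 star
many_votes = [5] * 100 + [2, 2]    # 100x 5 star and 2x 2 star

C = 3.5  # assumed site-wide average rating
m = 10   # assumed smoothing parameter

score_few = bayesian_average(few_votes, C, m)    # 50 / 13, roughly 3.85
score_many = bayesian_average(many_votes, C, m)  # 539 / 112, roughly 4.81
no_votes = bayesian_average([], C, m)            # exactly C

print(score_few, score_many, no_votes)
```

With these assumed values the product with 102 ratings ranks above the one with 3 ratings, even though its plain average is lower.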

Martin Harris

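Evan Miller shows a Bayesian approach to ranking 5-star ratings:

$$
S(n_1, \dots, n_K) = \sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K} - z_{\alpha/2} \sqrt{\frac{1}{N + K + 1} \left( \sum_{k=1}^{K} s_k^2 \frac{n_k + 1}{N + K} - \left( \sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K} \right)^2 \right)}
$$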

where

n_k is the number of k-star ratings,
s_k is the "worth" (in points) of k stars,
N is the total number of votes,
K is the maximum number of stars (e.g. K = 5 in a 5-star rating system),
z_{alpha/2} is the 1 - alpha/2 quantile of the standard normal distribution. If you want 95% confidence (based on the Bayesian posterior distribution) that the actual sort criterion is at least as big as the computed sort criterion, choose z_{alpha/2} = 1.65.
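If you want a different confidence level, the quantile can be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); it assumes the 95% figure above is a one-sided level, which is consistent with 1.65 being the rounded 0.95 quantile.

```python
from statistics import NormalDist

confidence = 0.95  # assumed one-sided confidence level

# inv_cdf returns the quantile of the standard normal distribution.
z = NormalDist().inv_cdf(confidence)

print(round(z, 3))  # 1.645
```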

In Python, the sorting criterion can be calculated with

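    import math

    def starsort(ns):
        """
        http://www.evanmiller.org/ranking-items-with-star-ratings.html

        ns holds the rating counts, ordered from the highest star value to the lowest.
        """
        N = sum(ns)               # total number of votes
        K = len(ns)               # maximum number of stars
        s = list(range(K, 0, -1)) # worth of each star level: K, K-1, ..., 1
        s2 = [sk**2 for sk in s]
        z = 1.65

        def f(s, ns):
            # expected value of s under the smoothed (n_k + 1) / (N + K) weights
            N = sum(ns)
            K = len(ns)
            return sum(sk*(nk+1) for sk, nk in zip(s, ns)) / (N+K)

        fsns = f(s, ns)
        return fsns - z*math.sqrt((f(s2, ns) - fsns**2)/(N+K+1))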

For example, if an item has 60 five-stars, 80 four-stars, 75 three-stars, 20 two-stars and 25 one-stars, then its overall star rating would be about 3.4:

    x = (60, 80, 75, 20, 25)
    starsort(x)
    # about 3.37

and you can sort a list of 5-star ratings with

    sorted([(60, 80, 75, 20, 25), (10, 0, 0, 0, 0), (5, 0, 0, 0, 0)], key=starsort, reverse=True)
    # [(10, 0, 0, 0, 0), (60, 80, 75, 20, 25), (5, 0, 0, 0, 0)]

This shows the effect that more ratings can have upon the overall star value.

You'll find that this formula tends to give an overall rating that is a bit lower than the one reported by sites such as Amazon, eBay or Walmart, particularly when there are few votes (say, fewer than 300). This reflects the higher uncertainty that comes with fewer votes. As the number of votes increases (into the thousands), all of these rating formulas should tend to the (weighted) average rating.

Since the formula only depends on the frequency distribution of 5-star ratings for the item itself, it is easy to combine reviews from multiple sources (or, update the overall rating in light of new votes) by simply adding the frequency distributions together.
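A minimal sketch of that combination step follows. The helper `lower_bound` restates the same criterion as `starsort` so that the block runs on its own, and the two count tuples `site_a` and `site_b` are hypothetical data for one item rated on two sites.

```python
import math

def lower_bound(ns, z=1.65):
    """Same criterion as starsort; ns lists counts from the top star level down."""
    N, K = sum(ns), len(ns)
    stars = range(K, 0, -1)
    mean = sum(s * (n + 1) for s, n in zip(stars, ns)) / (N + K)
    mean_sq = sum(s * s * (n + 1) for s, n in zip(stars, ns)) / (N + K)
    return mean - z * math.sqrt((mean_sq - mean ** 2) / (N + K + 1))

# Hypothetical rating counts for the same item on two different sites.
site_a = (60, 80, 75, 20, 25)
site_b = (10, 0, 0, 0, 0)

# Combining sources is an element-wise sum of the frequency distributions.
combined = tuple(a + b for a, b in zip(site_a, site_b))

print(combined)  # (70, 80, 75, 20, 25)
print(lower_bound(site_a), lower_bound(combined))
```

Because the extra votes here are all five-star ratings, the combined score comes out higher than the score for `site_a` alone.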

Unlike the IMDb formula, this formula does not depend on the average score across all items, nor an artificial minimum number of votes cutoff value.

Moreover, this formula makes use of the full frequency distribution -- not just the average number of stars and the number of votes. And it makes sense that it should since an item with ten 5-stars and ten 1-stars should be treated as having more uncertainty than (and therefore not rated as highly as) an item with twenty 3-star ratings:

    In [78]: starsort((10,0,0,0,10))
    Out[78]: 2.386028063783418

    In [79]: starsort((0,0,20,0,0))
    Out[79]: 2.795342687927806

The IMDb formula does not take this into account.
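The difference can be checked directly. In the sketch below, `weighted_rating` is an illustrative implementation of the IMDb-style formula and `lower_bound` restates the `starsort` criterion; the values C = 3.5 and m = 10 are assumptions picked for the example. Both items have 20 votes and an average of 3 stars, so the weighted rating cannot tell them apart.

```python
import math

def weighted_rating(ns, C, m):
    """IMDb-style (R*v + C*m) / (v + m); ns lists counts from the top star level down."""
    v, K = sum(ns), len(ns)
    R = sum(s * n for s, n in zip(range(K, 0, -1), ns)) / v
    return (R * v + C * m) / (v + m)

def lower_bound(ns, z=1.65):
    """Same criterion as starsort, restated so this block is self-contained."""
    N, K = sum(ns), len(ns)
    stars = range(K, 0, -1)
    mean = sum(s * (n + 1) for s, n in zip(stars, ns)) / (N + K)
    mean_sq = sum(s * s * (n + 1) for s, n in zip(stars, ns)) / (N + K)
    return mean - z * math.sqrt((mean_sq - mean ** 2) / (N + K + 1))

polarized = (10, 0, 0, 0, 10)  # ten 5-stars and ten 1-stars
uniform = (0, 0, 20, 0, 0)     # twenty 3-stars

C, m = 3.5, 10  # assumed prior mean and smoothing parameter

# Identical under the weighted rating: only the mean and vote count matter.
print(weighted_rating(polarized, C, m) == weighted_rating(uniform, C, m))  # True

# Different under the lower-bound criterion: the polarized item scores lower.
print(lower_bound(polarized) < lower_bound(uniform))  # True
```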

answered Oct 02 '22 00:10

unutbu

Tags: sorting, statistics, user-experience, rating, bayesian
