Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Formula for popularity? (based on "like it", "comments", "views")

I have some pages on a website and I have to create an ordering based on "popularity"/"activity"

The parameters that I have to use are:

  • views to the page
  • comments made on the page (there is a form at the bottom where uses can make comments)
  • clicks made to the "like it" icon

Are there any standards for what a formula for popularity would be? (if not opinions are good too)

(initially I thought of views + 10*comments + 10*likeit)

like image 739
paullb Avatar asked Jun 09 '10 07:06

paullb


2 Answers

Actually there is an accepted best way to calculate this:
http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

You may need to combine 'likes' and 'comments' into a single score, assigning your own weighting factor to each, before plugging it into the formula as the 'positive vote' value.

from the link above:

Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the "real" fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by: enter image description here

(Use minus where it says plus/minus to calculate the lower bound.) Here is the observed fraction of positive ratings, zα/2 is the (1-α/2) quantile of the standard normal distribution, and n is the total number of ratings. The same formula implemented in Ruby:

require 'statistics2'

def ci_lower_bound(pos, n, confidence)
    if n == 0
        return 0
    end
    z = Statistics2.pnormaldist(1-(1-confidence)/2)
    phat = 1.0*pos/n
    (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
end

pos is the number of positive ratings, n is the total number of ratings, and confidence refers to the statistical confidence level: pick 0.95 to have a 95% chance that your lower bound is correct, 0.975 to have a 97.5% chance, etc. The z-score in this function never changes, so if you don't have a statistics package handy or if performance is an issue you can always hard-code a value here for z. (Use 1.96 for a confidence level of 0.95.)

The same formula as an SQL query:

SELECT widget_id, ((positive + 1.9208) / (positive + negative) - 
                   1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) / 
                          (positive + negative)) / (1 + 3.8416 / (positive + negative)) 
       AS ci_lower_bound FROM widgets WHERE positive + negative > 0 
       ORDER BY ci_lower_bound DESC;
like image 176
Anentropic Avatar answered Nov 12 '22 21:11

Anentropic


There is no standard formula for this (how could there be?)

What you have looks like a fairly normal solution, and would probably work well. Of course, you should play around with the 10's to find values that suit your needs.

Depending on your requirements, you might also want to add in a time factor (i.e. -X points per week) so that old pages become less popular. Alternatively, you could change your "page views" to "page views in the last month". Again, this depends on your needs, it may not be relevant.

like image 38
Peter Alexander Avatar answered Nov 12 '22 21:11

Peter Alexander