I have some pages on a website and I have to create an ordering based on "popularity"/"activity"
The parameters that I have to use are:
Are there any standards for what a formula for popularity would be? (if not opinions are good too)
(initially I thought of views + 10*comments + 10*likeit)
Actually there is an accepted best way to calculate this:
http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
You may need to combine 'likes' and 'comments' into a single score, assigning your own weighting factor to each, before plugging it into the formula as the 'positive vote' value.
from the link above:
Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter
We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the "real" fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:
(Use minus where it says plus/minus to calculate the lower bound.) Here
p̂
is the observed fraction of positive ratings,zα/2
is the(1-α/2)
quantile of the standard normal distribution, andn
is the total number of ratings. The same formula implemented in Ruby:
require 'statistics2'
def ci_lower_bound(pos, n, confidence)
if n == 0
return 0
end
z = Statistics2.pnormaldist(1-(1-confidence)/2)
phat = 1.0*pos/n
(phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
end
pos
is the number of positive ratings,n
is the total number of ratings, and confidence refers to the statistical confidence level: pick 0.95 to have a 95% chance that your lower bound is correct, 0.975 to have a 97.5% chance, etc. The z-score in this function never changes, so if you don't have a statistics package handy or if performance is an issue you can always hard-code a value here forz
. (Use 1.96 for a confidence level of 0.95.)
The same formula as an SQL query:
SELECT widget_id, ((positive + 1.9208) / (positive + negative) -
1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) /
(positive + negative)) / (1 + 3.8416 / (positive + negative))
AS ci_lower_bound FROM widgets WHERE positive + negative > 0
ORDER BY ci_lower_bound DESC;
There is no standard formula for this (how could there be?)
What you have looks like a fairly normal solution, and would probably work well. Of course, you should play around with the 10's to find values that suit your needs.
Depending on your requirements, you might also want to add in a time factor (i.e. -X points per week) so that old pages become less popular. Alternatively, you could change your "page views" to "page views in the last month". Again, this depends on your needs, it may not be relevant.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With