I'm looking for an algorithm to sort website results by popularity.. like Reddit's so the older a post the less power it's votes/score has.
Here is the generally accepted solution as used by reddit:
t = (time of entry post) - (Dec 8, 2005)
x = upvotes - downvotes
y = {1 if x > 0, 0 if x = 0, -1 if x < 0)
z = {1 if x < 1, otherwise x}
rank = log(z) + (y * t)/45000
I've been over Reddit's algorithm and although it will fit for one situation, what I really need is two algorithms, one for Popular posts and another for Upcoming posts:
Popular will decay slower, giving more weight to slightly older posts where Upcoming posts will focus more on popular posts today, dropping off sharply after N hours/days/etc.
I'm writing this using Sphinx expressions so I can't write a hugly complicated algo and I only have access to the following functions:
http://sphinxsearch.com/docs/current.html#numeric-functions
So I have the following data per post:
Here is my current solution:
Exponent = 0.01 (Popular), 0.5 (Upcoming)
SecondsSincePublised = abs(CurTimeInSecondsSinceDate-PubTimeInSecondsSinceDate)
Rank = (log10(PostScore)*10000) / pow(SecondsSincePublised,Exponent)
Although this solution does work its not ideal. A new and popular post in the last couple of hours often ranks high in both popular and upcoming which is not really what I want.
Can anyone suggest another algorithm that I can modify an exponent component to adjust the decay?
Have you tried ranking algorithm used by Hacker news? It is simple to implement.
Score = (P-1) / (T+2)^G
where,
P = points of an item (and -1 is to negate submitters vote)
T = time since submission (in hours)
G = Gravity, defaults to 1.8 in news.arc
You can vary Gravity to adjust the decay.
For more information refer to How Hacker News ranking algorithm works
Did you try using different decay functions for "popular" and "upcoming"? for example, use an exponential decay rate for "upcoming" and polynomial decay rate for "popular", this way, after a few hours (if optimised correctly), there's a very little chance a post will score high on upcoming. while in polynomial decay functions the relation between adjacent times are getting smaller, this is not the case with an exponential decay function.
Here's an example (the parameter 0.01 and 1.0005 are arbitrary and should be optimised according to your goal).
Popular:
SecondsSincePublised = abs(CurTimeInSecondsSinceDate-PubTimeInSecondsSinceDate)
Rank = (log10(PostScore)*10000) / pow(SecondsSincePublised,0.01)
Upcoming:
SecondsSincePublised = abs(CurTimeInSecondsSinceDate-PubTimeInSecondsSinceDate)
Rank = (log10(PostScore)*10000) / pow(1.0005,SecondsSincePublised)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With