I have been reading + researching on algorithms and formulas to work out a score for my user submitted content to display currently hot / trending items higher up the list, however i'll admit i'm a little over my head here.
I'll give some background on what i'm after... users upload audio to my site, audios have several actions:
Ideally i want an algorithm where I can update an audios score each time a new activity is logged (played, download etc...), also a download action is worth more than a play, like more than a download and a favourite more than a like.
If possible i would like for audios older than 1 week to drop off quite sharply from the list to give newer content more of a chance of trending.
I have read about reddits algorithm which looked good, but i'm in over my head on how to tweak it to make use of my multiple variables, and to drop off older articles after around 7 days.
Some articles that we're interesting:
Any help is appreciated!
Paul
Basically you can use Reddit's formula. Since your system only supports upvotes you could weight them, resulting in something like this:
def hotness(track) s = track.playedCount s = s + 2*track.downloadCount s = s + 3*track.likeCount s = s + 4*track.favCount baseScore = log(max(s,1)) timeDiff = (now - track.uploaded).toWeeks if(timeDiff > 1) x = timeDiff - 1 baseScore = baseScore * exp(-8*x*x) return baseScore
The factor exp(-8*x*x)
will give you your desired drop off:
You can use any function that goes to zero faster than your score goes up. Since we use log
on our score, even a linear function can get multiplied (as long as your score doesn't grow exponentially).
So all you need is a function that returns 1
as long as you don't want to modify the score, and drops afterwards. Our example above forms that function:
multiplier(x) = x > 1 ? exp(-8*x*x) : 1
You can vary the multiplier if you want less steep curves.
Lets say that the probability for a given track to be played in a given hour is 50%, download 10%, like 1% and favorite 0.1%. Then the following C++ program will give you an estimate for your scores behavior:
#include <iostream> #include <fstream> #include <random> #include <ctime> #include <cmath> struct track{ track() : uploadTime(0),playCount(0),downCount(0),likeCount(0),faveCount(0){} std::time_t uploadTime; unsigned int playCount; unsigned int downCount; unsigned int likeCount; unsigned int faveCount; void addPlay(unsigned int n = 1){ playCount += n;} void addDown(unsigned int n = 1){ downCount += n;} void addLike(unsigned int n = 1){ likeCount += n;} void addFave(unsigned int n = 1){ faveCount += n;} unsigned int baseScore(){ return playCount + 2 * downCount + 3 * likeCount + 4 * faveCount; } }; int main(){ track test; const unsigned int dayLength = 24 * 3600; const unsigned int weekLength = dayLength * 7; std::mt19937 gen(std::time(0)); std::bernoulli_distribution playProb(0.5); std::bernoulli_distribution downProb(0.1); std::bernoulli_distribution likeProb(0.01); std::bernoulli_distribution faveProb(0.001); std::ofstream fakeRecord("fakeRecord.dat"); std::ofstream fakeRecordDecay("fakeRecordDecay.dat"); for(unsigned int i = 0; i < weekLength * 3; i += 3600){ test.addPlay(playProb(gen)); test.addDown(downProb(gen)); test.addLike(likeProb(gen)); test.addFave(faveProb(gen)); double baseScore = std::log(std::max<unsigned int>(1,test.baseScore())); double timePoint = static_cast<double>(i)/weekLength; fakeRecord << timePoint << " " << baseScore << std::endl; if(timePoint > 1){ double x = timePoint - 1; fakeRecordDecay << timePoint << " " << (baseScore * std::exp(-8*x*x)) << std::endl; } else fakeRecordDecay << timePoint << " " << baseScore << std::endl; } return 0; }
Result:
This should be sufficient for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With