Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hot content algorithm / score with time decay

I have been reading + researching on algorithms and formulas to work out a score for my user submitted content to display currently hot / trending items higher up the list, however i'll admit i'm a little over my head here.

I'll give some background on what i'm after... users upload audio to my site, audios have several actions:

  • Played
  • Downloaded
  • Liked
  • Favorited

Ideally i want an algorithm where I can update an audios score each time a new activity is logged (played, download etc...), also a download action is worth more than a play, like more than a download and a favourite more than a like.

If possible i would like for audios older than 1 week to drop off quite sharply from the list to give newer content more of a chance of trending.

I have read about reddits algorithm which looked good, but i'm in over my head on how to tweak it to make use of my multiple variables, and to drop off older articles after around 7 days.

Some articles that we're interesting:

  • https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9 (reddits algo)
  • http://www.evanmiller.org/rank-hotness-with-newtons-law-of-cooling.html

Any help is appreciated!

Paul

like image 542
Paul Hinett Avatar asked Jul 25 '12 15:07

Paul Hinett


1 Answers

Reddits old formula and a little drop off

Basically you can use Reddit's formula. Since your system only supports upvotes you could weight them, resulting in something like this:

def hotness(track)     s = track.playedCount     s = s + 2*track.downloadCount     s = s + 3*track.likeCount     s = s + 4*track.favCount     baseScore = log(max(s,1))      timeDiff = (now - track.uploaded).toWeeks      if(timeDiff > 1)         x = timeDiff - 1         baseScore = baseScore * exp(-8*x*x)      return baseScore 

The factor exp(-8*x*x) will give you your desired drop off:

Exponential drop off

The basics behind

You can use any function that goes to zero faster than your score goes up. Since we use log on our score, even a linear function can get multiplied (as long as your score doesn't grow exponentially).

So all you need is a function that returns 1 as long as you don't want to modify the score, and drops afterwards. Our example above forms that function:

multiplier(x) = x > 1 ? exp(-8*x*x) : 1 

You can vary the multiplier if you want less steep curves. varying multiplier

Example in C++

Lets say that the probability for a given track to be played in a given hour is 50%, download 10%, like 1% and favorite 0.1%. Then the following C++ program will give you an estimate for your scores behavior:

#include <iostream> #include <fstream> #include <random> #include <ctime> #include <cmath>  struct track{     track() : uploadTime(0),playCount(0),downCount(0),likeCount(0),faveCount(0){}     std::time_t uploadTime;         unsigned int playCount;     unsigned int downCount;     unsigned int likeCount;     unsigned int faveCount;         void addPlay(unsigned int n = 1){ playCount += n;}     void addDown(unsigned int n = 1){ downCount += n;}     void addLike(unsigned int n = 1){ likeCount += n;}     void addFave(unsigned int n = 1){ faveCount += n;}     unsigned int baseScore(){         return  playCount +             2 * downCount +             3 * likeCount +             4 * faveCount;     } };  int main(){     track test;     const unsigned int dayLength = 24 * 3600;     const unsigned int weekLength = dayLength * 7;          std::mt19937 gen(std::time(0));     std::bernoulli_distribution playProb(0.5);     std::bernoulli_distribution downProb(0.1);     std::bernoulli_distribution likeProb(0.01);     std::bernoulli_distribution faveProb(0.001);      std::ofstream fakeRecord("fakeRecord.dat");     std::ofstream fakeRecordDecay("fakeRecordDecay.dat");     for(unsigned int i = 0; i < weekLength * 3; i += 3600){         test.addPlay(playProb(gen));         test.addDown(downProb(gen));         test.addLike(likeProb(gen));         test.addFave(faveProb(gen));              double baseScore = std::log(std::max<unsigned int>(1,test.baseScore()));         double timePoint = static_cast<double>(i)/weekLength;                  fakeRecord << timePoint << " " << baseScore << std::endl;         if(timePoint > 1){             double x = timePoint - 1;             fakeRecordDecay << timePoint << " " << (baseScore * std::exp(-8*x*x)) << std::endl;         }         else             fakeRecordDecay << timePoint << " " << baseScore << std::endl;     }     return 0; } 

Result:

Decay

This should be sufficient for you.

like image 133
Zeta Avatar answered Nov 09 '22 21:11

Zeta