I am building an advanced image sharing web application. As you may expect, users can upload images and others can comments on it, vote on it, and favorite it. These events will determine the popularity of the image, which I capture in a "karma" field.
Now I want to create a Digg-like homepage system, showing the most popular images. It's easy, since I already have the weighted Karma score. I just sort on that descendingly to show the 20 most valued images.
The part that is missing is time. I do not want extremely popular images to always be on the homepage. I guess an easy solution is to restrict the result set to the last 24 hours. However, I'm also thinking that in order to keep the image rotation occur throughout the day, time can be some kind of variable where its offset has an influence on the image's sorting.
Specific questions:
I'm not asking the community to build this algorithm, just looking for some advise :)
I would go with a function that decreases the "effective karma" of each item after a given amount of time elapses. This is a bit like Eric's method.
Determine how often you want the "effective karma" to be decreased. Then multiply the karma by a scaling factor based on this period.
effective karma = karma * (1 - percentage_decrease)
where percentage_decrease
is determined by yourfunction. For instance, you could do
percentage_decrease = min(1, number_of_hours_since_posting / 24)
to make it so the effective karma of each item decreases to 0 over 24 hours. Then use the effective karma to determine what images to show. This is a bit more of a stable solution than just subtracting the time since posting, as it scales the karma between 0 and its actual value. The min is to keep the scaling at a 0 lower bound, as once a day passes, you'll start getting values greater than 1.
However, this doesn't take into account popularity in the strict sense. Tim's answer gives some ideas into how to take strict popularity (i.e. page views) into account.
For your first question, I would go with the slightly more complicated method. You will want some "All time favorites" in the mix. But don't go by time alone, go by the number of actual views the image has. Keep in mind that not everyone is going to login and vote, but that doesn't make the image any less popular. An image that is two years old with 10 votes and 100k views is obviously more important to people than an image that is 1 year old with 100 votes and 1k views.
For your second question, yes, you want some kind of caching going on in your front page. That's a lot of queries to produce the entry point into your site. However, much like SO, your type of site will tend to draw traffic to inner pages through search engines .. so try and watch / optimize your queries everywhere.
For your third question, going by factors other than time (i.e. # of views) helps to make sure you always have a full and dynamic page. I'm not sure about paginating on the front page, leading people to tags or searches might be a better strategy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With