Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementation of "Most Viewed" feature database

I was wondering how to best implement a "most viewed" featured (like youtube) on my database.

Let me explain the "most viewed" feature a little bit better: Basically I want to list the most visited pages/video/etc from this day/week/month, see http://www.youtube.com/charts/videos_views for an example.

So I was wondering how to best implement this feature as I can think of many many ways of doing it but all of them have their + and - to them.

Plus I also would love to hear the comments of various programmers on others programmers ideas. Also i would like to start a good conversation on this topic.

Ps. I'm looking specially on how to calculate the time, say most viewed in this month, without having a huge table saving every single view with the datetime. Any idea is welcome.
Pps. I use Mysql and PHP, extra tips for those two are very welcome.

like image 299
StfnoPad Avatar asked Sep 01 '10 09:09

StfnoPad


2 Answers

have the following tables : 1. views 2. views_hourly_summary 3. views_daily_summary 4. views_monthly_summary 5. views_alltime_summary

have a cron job run at the following intervals:

  1. run every hour and preaggregate the views for that hour from the views table and save the preaggregated result in the views_hourly_summary table, also update the views_alltime_summary table

  2. run at the end of everyday and preaggregate the views for that day from the hours table and save the preaggregated result in the views_daily_summary table

  3. run at the end of everymonth and preaggregate the views for that day from the hours table and save the preaggregated result in the views_daily_summary table

next when fetching results you will have to do some math as follows:

  1. For example, you want to fetch the views for last 4 hours, you would fetch the 3 whole hours data from the hourly table and for the remaining data fetch it from the views table as follows:

    select item_id, sum(views) as views from views_hourly_summary where hour between concat(left(now() - interval 3 hour, 14), '00:00') and concat(left(now(), 14), '00:00') group by item_id

    union

    select item_id, count(1) as views from views where datetime between (now() - interval 4 hour) and concat(left(now() - interval 3 hour, 14), '00:00') or datetime > concat(left(now(), 14), '00:00') group by item_id

like image 149
ovais.tariq Avatar answered Oct 05 '22 23:10

ovais.tariq


First I'd try to track users by using a cookie, giving them a unique ID as a visit and for that visit. That should help with identification.

I'd try to move the logic to update the most viewed item(s) out of the website codebase. So this just means each web request posts the visit info to a message queue - probably including the time/date, item being viewed, the tracked info from the cookie and maybe IP address / request header.

I'd then set up a service to read from that queue and process the information. That would allow you to upgrade / modify that logic without affecting the site. You'll probably want to handle multiple refreshes, etc to stop people messing with the results - either delibrately or by accident. Also by abstracting this processing out of the main site you're not slowing down your page request time by - you can make the logic to determine / update item views as complex as you like then.

like image 37
Paul Hadfield Avatar answered Oct 06 '22 00:10

Paul Hadfield