I was wondering how to best implement a "most viewed" featured (like youtube) on my database.
Let me explain the "most viewed" feature a little bit better: Basically I want to list the most visited pages/video/etc from this day/week/month, see http://www.youtube.com/charts/videos_views for an example.
So I was wondering how to best implement this feature as I can think of many many ways of doing it but all of them have their + and - to them.
Plus I also would love to hear the comments of various programmers on others programmers ideas. Also i would like to start a good conversation on this topic.
Ps. I'm looking specially on how to calculate the time, say most viewed in this month, without having a huge table saving every single view with the datetime. Any idea is welcome.
Pps. I use Mysql and PHP, extra tips for those two are very welcome.
have the following tables : 1. views 2. views_hourly_summary 3. views_daily_summary 4. views_monthly_summary 5. views_alltime_summary
have a cron job run at the following intervals:
run every hour and preaggregate the views for that hour from the views table and save the preaggregated result in the views_hourly_summary table, also update the views_alltime_summary table
run at the end of everyday and preaggregate the views for that day from the hours table and save the preaggregated result in the views_daily_summary table
run at the end of everymonth and preaggregate the views for that day from the hours table and save the preaggregated result in the views_daily_summary table
next when fetching results you will have to do some math as follows:
For example, you want to fetch the views for last 4 hours, you would fetch the 3 whole hours data from the hourly table and for the remaining data fetch it from the views table as follows:
select item_id, sum(views) as views from views_hourly_summary where hour between concat(left(now() - interval 3 hour, 14), '00:00') and concat(left(now(), 14), '00:00') group by item_id
union
select item_id, count(1) as views from views where datetime between (now() - interval 4 hour) and concat(left(now() - interval 3 hour, 14), '00:00') or datetime > concat(left(now(), 14), '00:00') group by item_id
First I'd try to track users by using a cookie, giving them a unique ID as a visit and for that visit. That should help with identification.
I'd try to move the logic to update the most viewed item(s) out of the website codebase. So this just means each web request posts the visit info to a message queue - probably including the time/date, item being viewed, the tracked info from the cookie and maybe IP address / request header.
I'd then set up a service to read from that queue and process the information. That would allow you to upgrade / modify that logic without affecting the site. You'll probably want to handle multiple refreshes, etc to stop people messing with the results - either delibrately or by accident. Also by abstracting this processing out of the main site you're not slowing down your page request time by - you can make the logic to determine / update item views as complex as you like then.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With