
About YouTube view counts

I'm implementing an app that keeps track of how many times a post is viewed, but I'd like to count views in a 'smart' way. That is, I don't want to increase the view counter just because a user refreshes the browser.

So I decided to only increase the view counter if the combination of IP address and user agent (browser) is unique, which is working so far.

But then I thought: if YouTube is doing it this way, and they have several videos with thousands or even millions of views, their views table in the database would be overly populated with IPs and user agents.

This leads me to assume that their video table has a counter cache for views (i.e. views_count). That is, when a user clicks on a video, the IP and user agent are stored, and the counter cache column in the video table is incremented.

Every time a video is clicked, YouTube would need to query the views table and count the number of entries. Wouldn't this affect performance drastically?

Is this how they do it? Or is there a better way?

Christian Fazzini asked Sep 28 '11 19:09



3 Answers

I would use client-side browser fingerprinting to identify unique viewers. This library seems to be getting significant traction:

https://github.com/Valve/fingerprintJS

I would also recommend using Redis for anything to do with counts. Its atomic increment commands are easy to use and guarantee that your counts never get corrupted by race conditions.

This would be the command you would want to use for incrementing your counters:

http://redis.io/commands/incr

The key in this case would be the browser fingerprint hash sent to you from the client. You could then have a Redis "set" that would contain a list of all browser fingerprints known to be associated with a given user_id (the key for the set would be the user_id).
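A minimal sketch of this idea in Python, adapted to per-video keys (that adaptation is an assumption; the description above keys the set by user_id). It relies only on the semantics of the Redis `SADD` command, which returns the number of members newly added to the set, and of `INCR`. `FakeRedis`, `record_view`, and the key names are hypothetical. `FakeRedis` is an in-memory stand-in so the example runs without a server; a real client such as `redis.Redis` from redis-py exposes `sadd` and `incr` methods of the same shape.

```python
class FakeRedis:
    """In-memory stand-in that mimics the Redis SADD and INCR commands."""

    def __init__(self):
        self.sets = {}
        self.counters = {}

    def sadd(self, name, *values):
        # Like SADD: return how many of the values were not already members.
        members = self.sets.setdefault(name, set())
        added = 0
        for value in values:
            if value not in members:
                members.add(value)
                added += 1
        return added

    def incr(self, name, amount=1):
        # Like INCR: a missing counter starts at 0 before being incremented.
        self.counters[name] = self.counters.get(name, 0) + amount
        return self.counters[name]


def record_view(client, video_id, fingerprint):
    """Increment the view counter only if this fingerprint is new for the video.

    Returns True when the counter was incremented.
    """
    seen_key = f"video:{video_id}:fingerprints"  # hypothetical key naming
    count_key = f"video:{video_id}:views"        # hypothetical key naming
    if client.sadd(seen_key, fingerprint) == 1:
        client.incr(count_key)
        return True
    return False


client = FakeRedis()
first = record_view(client, 42, "fp-abc")
refresh = record_view(client, 42, "fp-abc")  # same browser refreshing the page
other = record_view(client, 42, "fp-xyz")    # a different browser
views = client.counters["video:42:views"]
```

Because `SADD` reports a new member only to the first caller that adds a given fingerprint, the increment runs once per fingerprint even when two requests arrive at the same time.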

Finally, if you really need to, you can run a cron job or other async process that dumps the view counts for each user into the counter cache field in your relational database.

You could also store user_id, browser fingerprint, and timestamps in a relational database (MySQL?) and counter cache them into your user table periodically (probably via cron).
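Here is a rough sketch of the relational variant, using Python's built-in `sqlite3` module in place of MySQL so that it runs anywhere. The schema is an assumption: it counts views per video (as in the question) rather than per user, and the table names, column names, and the functions `record_view` and `refresh_counter_cache` are all hypothetical. The refresh function is what a cron job would call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (
        id INTEGER PRIMARY KEY,
        views_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE views (
        video_id INTEGER NOT NULL,
        fingerprint TEXT NOT NULL,
        viewed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (video_id, fingerprint)
    );
    INSERT INTO videos (id) VALUES (1), (2);
""")


def record_view(conn, video_id, fingerprint):
    # INSERT OR IGNORE skips the row if this (video, fingerprint) pair exists,
    # so a page refresh does not add a second row.
    conn.execute(
        "INSERT OR IGNORE INTO views (video_id, fingerprint) VALUES (?, ?)",
        (video_id, fingerprint),
    )
    conn.commit()


def refresh_counter_cache(conn):
    # Periodic job: recompute the cached count for every video in one statement.
    conn.execute("""
        UPDATE videos
        SET views_count = (
            SELECT COUNT(*) FROM views WHERE views.video_id = videos.id
        )
    """)
    conn.commit()


record_view(conn, 1, "fp-a")
record_view(conn, 1, "fp-a")  # duplicate, ignored
record_view(conn, 1, "fp-b")
record_view(conn, 2, "fp-a")
refresh_counter_cache(conn)
counts = dict(conn.execute("SELECT id, views_count FROM videos ORDER BY id"))
```

Page loads then read only `videos.views_count`; the `COUNT(*)` over the views table runs on the schedule of the job, not on every request.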

Jarrod Spillers answered Oct 05 '22 23:10



First of all, AFAIK YouTube uses Bigtable, so do not worry about how they query the count; we don't know the exact structure of their database anyway.

Assuming that you are on a relational model, create a view_count column, but do not update it on every refresh. Record the visits and periodically update the cache.

Also, you can generate a hash from the IP, browser, date and any other information you use to detect whether this is a unique view, and store only the hash instead of the raw data.
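For illustration, such a hash could be built with Python's standard `hashlib` module. The function name `view_key`, the choice of SHA-256, the field separator, and the inclusion of the video id are assumptions made for this sketch.

```python
import hashlib
from datetime import date


def view_key(video_id, ip, user_agent, day):
    """Return a fixed-length hex digest that identifies one visitor's view."""
    # Join the fields with a separator so ("ab", "c") and ("a", "bc") differ.
    raw = "|".join([str(video_id), ip, user_agent, day.isoformat()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


today = date(2011, 9, 28)
key_a = view_key(7, "203.0.113.5", "Mozilla/5.0", today)
key_b = view_key(7, "203.0.113.5", "Mozilla/5.0", today)             # same visitor
key_c = view_key(7, "203.0.113.5", "Mozilla/5.0", date(2011, 9, 29))  # next day
```

Since the date is part of the input, the same visitor produces a new key each day and would be counted once per day; leave the date out if a visitor should only ever count once.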

Also, you can use the session or a cookie to record which videos have been viewed. Since it will expire, memory won't be much of a problem; I don't believe anyone views thousands of videos in one session.

Maxim Krizhanovsky answered Oct 06 '22 00:10



If you want to store all the IPs and browsers, then make sure you have enough DB storage space, add an index, and that's it. If not, you can use the Rails session to store the list of videos that a user has visited, and only increment the view_count attribute of a video when the user visits a video that is not yet in that list.
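The session approach can be sketched in a framework-neutral way. In the Python below, a plain dict stands in for the web framework's session object and another dict stands in for the view_count column; `register_view` and the session key `viewed_video_ids` are hypothetical names.

```python
def register_view(session, view_counts, video_id):
    """Count a view only the first time this session opens the video.

    Returns True when the counter was incremented.
    """
    # A list is used because session data is usually serialized (e.g. to a cookie).
    seen = session.setdefault("viewed_video_ids", [])
    if video_id in seen:
        return False  # refresh or repeat visit within the same session
    seen.append(video_id)
    view_counts[video_id] = view_counts.get(video_id, 0) + 1
    return True


session = {}      # stand-in for one visitor's session
view_counts = {}  # stand-in for the view_count column, keyed by video id
results = [
    register_view(session, view_counts, 10),
    register_view(session, view_counts, 10),  # refresh
    register_view(session, view_counts, 11),
]
```

Note that a visitor who clears cookies or starts a new session is counted again, which is the trade-off for not storing IPs and user agents.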

alf answered Oct 06 '22 01:10
