I want to start counting the numbers of times a webpage is viewed and hence need some kind of simple counter. What is the best scalable method of doing this?
Suppose I have a table Frobs where each row corresponds to a page - some obvious options are:
Have an unsigned int NumViews field
in the Frobs table which gets
updated upon each view using UPDATE
Frobs SET NumViews = NumViews + 1
. Simple but not so good at scaling as I understand it.
Have a separate table FrobViews
where a new row is inserted for each view. To display the
number of views, you then need to do a simple SELECT COUNT(*) AS NumViews FROM FrobViews WHERE FrobId = '%d' GROUP BY FrobId
. This doesn't involve any updates so can avoid table locking in MyISAM tables - however, the read performance will suffer if you want to display the number of views on each page.
How do you do it?
There's some good advice here: http://www.mysqlperformanceblog.com/2007/07/01/implementing-efficient-counters-with-mysql/ but I'd like to hear the views of the SO community.
I'm using InnoDb at the moment, but am interested in answers for both InnoDb and MyISAM.
If scalability is more important to you than absolute accuracy of the figures then you could cache the view count in your application for a short time rather than hitting the database on every page view - eg, only update the database once every 100 views.
If your application crashes between database updates then obviously you'll lose some of your data, but if you can tolerate a certain amount of inaccuracy then this might be a useful approach.
Inserting into a Database is not something you want to do on page views. You are likely to run into problems with updating your slave databases with all of the inserts since replication is single threaded on MySQL.
At my company we serve 25M page views a day and we have taken a tiered approach.
The view counter is stored in a separate table with 2 columns (profileId, viewCounter) both are unsigned integers.
For items that are infrequently viewed we update the table on page view.
For frequently viewed items we update MySQL about 1/10 of the time. For both types we update Memcache on every hit.int Memcache::increment ( string $key [, int $value = 1 ] )
if (pageViews < 10000) { UPDATE page_view SET viewCounter=viewCounter+1 WHERE profileId = :? }
else if ((int)rand(10) == 1) { //UPDATE page_view SET viewCounter= ?:cache_value WHERE profileId = :? }
doing count(*) is very inefficient in InnoDB (MyISAM keeps count stats in the index), but MyISAM will lock the table on reads reducing concurrency. doing a count() for 50,000 or 100,000 rows is going to take a long time. Doing a select on a PK will be very fast.
If you require more scalability, you might want to look at redis
I would take your second approach and aggregate the data into the table from your first solution on a regular base. On this way you get the advandages of both solutions. To be clearer: On every hit you insert a row into a table (lets name it hit_counters). This table got only one field (the pageid). Every x seconds you run a script (via a cronjob) which aggregates the data from the hit_counters table and put it into a second table (lets name it 'hits'. There you got two fields: the pageid and the total hits.
Im not sure but imho does innodb not help you very much for solution 1 if you get many hits on the same page: Innodb locks the row while updating so all other updates to this row will be delayed.
Depending on whats your program written in you could also batch the updates together by counting in your application and updating the database only every x seconds. This would only work if you use a programing language where you got persistent storage (like Java Servlets but not PHP)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With