 

How to track pageviews without thrashing the MySQL DB

Tags: php, mysql

I am trying to track pageviews in a MySQL DB using the following query:

"UPDATE $table SET pageviews = pageviews + 1 WHERE page_id = 1"

This is fine for low to moderate traffic. However, under high traffic, constant writes to the DB would result in high read/write contention and eventually bring down the DB.

I have read several Q&As here on Stack Overflow and elsewhere where MongoDB is suggested as an alternative. However, that choice isn't available and I must stick to MySQL. Furthermore, I do not have control over the storage engine — it could be MyISAM or InnoDB (InnoDB performs better here because it uses row-level locking instead of the table-level locking MyISAM uses).

Considering the above scenario, what's the best possible method to track pageviews without thrashing the DB (in the DB or elsewhere)? I would really appreciate an answer that provides code fragments as a starting point (if possible).

BTW, I am using PHP.

Update: @fire has a good solution below. However, it requires memcached. I am looking for something that can be implemented easily without requiring specific infrastructure, since this is for a module that could be used in virtually any hosting environment. On second thought, what comes to mind is some sort of cookie- or file-log-based implementation (a rough sketch of the file-based idea follows). I am not sure how such an implementation would behave in practice, so any further input is really welcome.
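To make the idea concrete, a file-log-based version might look roughly like this (the log path, flush probability, and table/column names are placeholders, and the code is untested):

$page_id = 1;
$log = __DIR__ . '/pageviews.log'; // hypothetical log file, must be writable

// append one line per hit; LOCK_EX keeps concurrent appends from interleaving
file_put_contents($log, $page_id . "\n", FILE_APPEND | LOCK_EX);

// on roughly 1 in 100 requests, flush the accumulated hits to MySQL
if (mt_rand(1, 100) === 1) {
    $fp = fopen($log, 'r+');
    if ($fp && flock($fp, LOCK_EX)) {
        // appenders block on LOCK_EX above, so the file is stable while we read it
        $counts = array_count_values(array_filter(array_map('trim', file($log))));
        foreach ($counts as $id => $n) {
            mysql_query("UPDATE pages SET pageviews = pageviews + " . (int) $n .
                        " WHERE page_id = " . (int) $id);
        }
        ftruncate($fp, 0); // start a fresh log once the counts are persisted
        flock($fp, LOCK_UN);
    }
    if ($fp) {
        fclose($fp);
    }
}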

asked Nov 29 '12 by John

2 Answers

I would use memcached to store the count, and then sync it with the database on a cron...

// Increment
$page_id = 1;
$memcache = new Memcache();
$memcache->connect('localhost', 11211);

// increment() fails if the key doesn't exist yet, and add() is atomic,
// so concurrent requests can't both initialise the counter at 1
if ($memcache->increment('page_' . $page_id, 1) === false) {
    if (!$memcache->add('page_' . $page_id, 1)) {
        // another request created the key first; count this hit on it
        $memcache->increment('page_' . $page_id, 1);
    }
}

// Cron
if ($pageviews = $memcache->get('page_' . $page_id)) {
    $sql = "UPDATE pages SET pageviews = pageviews + " . (int) $pageviews .
           " WHERE page_id = " . (int) $page_id;
    mysql_query($sql);
    // decrement by the flushed amount instead of deleting the key,
    // so hits counted since the get() above aren't lost
    $memcache->decrement('page_' . $page_id, (int) $pageviews);
}
answered by fire

I'd consider gathering raw hits with the fastest writing engine you have available:

INSERT INTO hits (page_id, hit_date) VALUES (:page_id, CURRENT_TIMESTAMP)

... and then running a periodic process, possibly a cron command-line script, that counts those raw hits and stores the pageview summary you need on an hourly or daily basis:

INSERT INTO daily_stats (page_id, num_hits, day)
SELECT page_id, COUNT(*), '2012-11-29'
FROM hits
WHERE hit_date >= '2012-11-29' AND hit_date < '2012-11-30'
GROUP BY page_id

(Queries are mere examples, tweak to your needs)
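To tie it together, a minimal cron script along those lines might look like this (the DSN, credentials, and the assumption that summarised rows can be deleted from hits are mine, not part of the queries above):

// cron.php — hypothetical aggregation job, run once a day
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$day  = date('Y-m-d', strtotime('yesterday'));
$next = date('Y-m-d', strtotime('today'));

// summarise yesterday's raw hits into daily_stats
$stmt = $pdo->prepare(
    "INSERT INTO daily_stats (page_id, num_hits, day)
     SELECT page_id, COUNT(*), :day
     FROM hits
     WHERE hit_date >= :start AND hit_date < :next
     GROUP BY page_id"
);
$stmt->execute(array('day' => $day, 'start' => $day, 'next' => $next));

// the raw rows are no longer needed once they've been summarised
$del = $pdo->prepare("DELETE FROM hits WHERE hit_date < :next");
$del->execute(array('next' => $next));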

Another typical solution is good old log parsing, feeding a script like AWStats with your web server's logs.

Clarification: My first suggestion is fairly similar to @fire's, but I didn't get into storage details. The key point is to defer the heavy processing and store just the minimum amount of raw information in the fastest way available.

answered by Álvaro González