
How to build a proper Database for a traffic analytics system?

Tags: php, mysql

How should I structure the database for an analytics service? Currently I have one table that stores a record for every user who visits a page carrying my client's ID, so that my clients can later see the statistics for a specific date.

I've been thinking about this today and I'm wondering: let's say I have 1,000 users and each of them gets around 1,000 impressions on their site daily. That means I get 1,000,000 (1M) new records every day in a single table. How will it perform after two months or so, when the table reaches 60 million records?

My concern is that after some time the table will hold so many records that the queries PHP runs to pull out the data will be heavy, slow, and resource-hungry. Is that true, and how can I prevent it?

A friend of mine is working on something similar, and he is going to create a new table for every client. Is that the right way to go?

Thanks!

Ricardo asked Dec 20 '11 12:12



2 Answers

The problem you are facing is an I/O-bound system. One million records a day averages out to roughly 12 write queries per second. That's achievable, but pulling the data out while writing at the same time will leave your system bottlenecked at the disk level.
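The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures from the question; the constant names are made up for illustration, and the result is a daily average, so peak-hour rates would likely be higher.

```python
# Back-of-the-envelope load estimate using the figures from the question.
CLIENTS = 1_000
IMPRESSIONS_PER_CLIENT_PER_DAY = 1_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

rows_per_day = CLIENTS * IMPRESSIONS_PER_CLIENT_PER_DAY
avg_writes_per_second = rows_per_day / SECONDS_PER_DAY
rows_after_60_days = rows_per_day * 60

print(f"rows per day: {rows_per_day:,}")                           # 1,000,000
print(f"average writes per second: {avg_writes_per_second:.1f}")   # 11.6
print(f"rows after 60 days: {rows_after_60_days:,}")               # 60,000,000
```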

What you need to do is configure your database to support the I/O volume you'll be handling: use an appropriate storage engine (InnoDB rather than MyISAM); make sure your disk subsystem is fast enough (RAID rather than single drives, since drives can and will fail at some point); design your schema well; and inspect your queries with EXPLAIN to see where they might be going wrong. You could even use a different storage engine altogether; personally, I'd use TokuDB if I were you.
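To illustrate the "design your schema and check it with EXPLAIN" advice, here is a minimal sketch. It assumes a single shared table with a composite index on the client and the timestamp, and it uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; the table, column, and index names are hypothetical. In MySQL you would create a similar index and prefix the query with `EXPLAIN` rather than SQLite's `EXPLAIN QUERY PLAN`.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one shared table for all clients.
conn.execute(
    """
    CREATE TABLE impressions (
        id         INTEGER PRIMARY KEY,
        client_id  INTEGER NOT NULL,
        visited_at TEXT    NOT NULL,
        url        TEXT    NOT NULL
    )
    """
)

# Composite index matching the typical report query:
# "all impressions for one client within a date range".
conn.execute(
    "CREATE INDEX idx_client_visited ON impressions (client_id, visited_at)"
)

query = """
    SELECT COUNT(*)
    FROM impressions
    WHERE client_id = ? AND visited_at >= ? AND visited_at < ?
"""

# Ask the database how it intends to run the query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN " + query, (42, "2011-12-20", "2011-12-21")
).fetchall()

# The last column of each plan row is a human-readable description.
plan = " ".join(row[3] for row in plan_rows)
print(plan)  # should mention idx_client_visited rather than a full table scan
```

If the plan shows an index search instead of a full scan, a per-client date-range report only has to touch that client's slice of the index, which is the same isolation a table-per-client layout is usually meant to provide.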

I also sincerely hope you're doing your querying, sorting, and filtering on the database side and not on the PHP side.
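As a rough sketch of what "on the database side" can look like, the query below lets the database group and count impressions per day, so the application only receives one small row per day instead of every raw record. It uses Python's sqlite3 module as a stand-in for MySQL, and the table, columns, and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute(
    "CREATE TABLE impressions (client_id INTEGER NOT NULL, visited_at TEXT NOT NULL)"
)

# A few made-up rows so the query has something to aggregate.
conn.executemany(
    "INSERT INTO impressions (client_id, visited_at) VALUES (?, ?)",
    [
        (1, "2011-12-19 09:15:00"),
        (1, "2011-12-19 17:40:00"),
        (1, "2011-12-20 08:05:00"),
        (2, "2011-12-20 11:30:00"),
    ],
)

# Filtering, grouping, counting, and sorting all happen inside the database.
daily_counts = conn.execute(
    """
    SELECT DATE(visited_at) AS day, COUNT(*) AS hits
    FROM impressions
    WHERE client_id = ?
    GROUP BY day
    ORDER BY day
    """,
    (1,),
).fetchall()

print(daily_counts)  # [('2011-12-19', 2), ('2011-12-20', 1)]
```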

N.B. answered Nov 02 '22 00:11



Consider the Google Analytics Platform Components Overview page, and pay special attention to the way the data is written to the database, given the architecture of the entire system.

Instead of writing everything to your database right away, you could write everything to a log file, then process the log later (perhaps at a time when the traffic isn't so high). At the end of the day, you'll still need to make all of those writes to your database, but if you batch them together and do them when that kind of load is more tolerable, your system will scale a lot better.
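Here is a minimal sketch of that log-then-batch idea, not a production design. It assumes a CSV-style log file and uses Python's sqlite3 module as a stand-in for MySQL; the function names `log_impression` and `load_log_into_db`, the file name, and the table layout are all hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def log_impression(log_path, client_id, visited_at, url):
    """Hot path: append one line to the log file, no database involved."""
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([client_id, visited_at, url])


def load_log_into_db(conn, log_path):
    """Off-peak job: insert every logged row in a single transaction."""
    with open(log_path, newline="") as f:
        rows = [(int(c), t, u) for c, t, u in csv.reader(f)]
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO impressions (client_id, visited_at, url) VALUES (?, ?, ?)",
            rows,
        )
    # Simplification: a real system would rotate the file before loading,
    # so that lines written during the load are not lost.
    os.remove(log_path)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute(
    "CREATE TABLE impressions ("
    "client_id INTEGER NOT NULL, visited_at TEXT NOT NULL, url TEXT NOT NULL)"
)

log_path = os.path.join(tempfile.mkdtemp(), "impressions.log")
log_impression(log_path, 1, "2011-12-20 12:12:00", "/home")
log_impression(log_path, 1, "2011-12-20 12:13:00", "/pricing")
log_impression(log_path, 2, "2011-12-20 12:14:00", "/home")

loaded = load_log_into_db(conn, log_path)
total = conn.execute("SELECT COUNT(*) FROM impressions").fetchone()[0]
print(loaded, total)  # 3 3
```

The trade-off is that statistics lag behind real traffic by however long you wait between batch runs, so this fits reports "for a specific date" better than live dashboards.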

linuxeasy answered Nov 01 '22 23:11
