
MySQL solution for 1 million clicks / day [closed]

We are running a custom OpenX ad server on a MySQL database which gets approx. 1 million clicks / day. We need to store all this click information and show statistics based on it.

Right now, all the click information is aggregated every 2 days and the specific click info is deleted. But we want to provide our affiliates with a new feature that will allow them to set a dynamic tracking ID (TID) and, basically, track their clicks and conversions based on it.

So, the problem is that our click table will grow by a minimum of 1 million entries a day, and we need to be able to search this table and show all the clicks for one user for a specific period of time, grouped by the TID I mentioned above, or search by the TID.

I had a look at MySQL partitioning and it seems like a good solution, but I'm not sure whether it will still perform well on a huge database (possibly billions of rows).

What do you think would be the correct approach for this issue?

EDIT:

Based on your answers, I'm now thinking of a mixed solution.

We already have a "LIVE" table, whose entries are deleted when the clicks are aggregated at maintenance time. It looks something like this:

Table: clicks

viewer_id | ... | date_time | affiliate_id | ... | tid

(I skipped the columns which are unimportant at this point)

At maintenance time, I can move everything to another monthly table which looks almost the same, say Table: clicks_2012_11, with indexes on date_time, affiliate_id and tid, partitioned by affiliate_id.
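
For illustration, here is a minimal sketch of what such a monthly table could look like; the column types, index names and the number of partitions are assumptions, and the columns I skipped above are left out:

    -- Hypothetical DDL for the monthly archive table; types, index names
    -- and the partition count are assumptions, not our real schema.
    CREATE TABLE clicks_2012_11 (
        viewer_id    INT UNSIGNED NOT NULL,
        date_time    DATETIME     NOT NULL,
        affiliate_id INT UNSIGNED NOT NULL,
        tid          VARCHAR(64)  NOT NULL,
        KEY idx_date_time    (date_time),
        KEY idx_affiliate_id (affiliate_id),
        KEY idx_tid          (tid)
    ) ENGINE=InnoDB
      PARTITION BY HASH (affiliate_id)
      PARTITIONS 16;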

So now, when an affiliate wants to see his statistics for the past 2 months, I know I have to look inside the clicks_2012_10 and clicks_2012_11 tables (the time range will be limited to a maximum of 2 months). Because the tables are partitioned by affiliate_id, only the relevant partitions of the two tables will be scanned, and I can list all the TIDs that had any activity in the past 2 months.
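
A report over those two months would then be roughly the sketch below; the affiliate id, the dates and the aggregate are placeholders for illustration:

    -- Hypothetical report: clicks per TID for one affiliate over two months.
    SELECT tid, COUNT(*) AS clicks
    FROM (
        SELECT tid FROM clicks_2012_10
        WHERE affiliate_id = 123 AND date_time >= '2012-10-01'
        UNION ALL
        SELECT tid FROM clicks_2012_11
        WHERE affiliate_id = 123 AND date_time < '2012-12-01'
    ) AS last_two_months
    GROUP BY tid;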

What do you think about this approach? Are there any obvious issues? Am I over complicating things without a solid reason?

asked Oct 29 '12 by user1782560

2 Answers

There is nothing inherent in big (even "huge") tables that makes MySQL fail. Big tables are mostly a problem in terms of:

  • disk space
  • cache usage (the working set is unlikely to fit in memory)
  • maintenance (schema changes, rebuilds, ...)

You need to address all of these.

Partitioning is mainly useful for bulk data maintenance such as dropping entire partitions. It is certainly not a best-practice to partition big tables by default on just some column. Partitioning is always introduced for a specific reason.
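
As an illustration of that kind of maintenance, a click table could be range-partitioned by month so that old data is expired with a cheap metadata operation instead of a massive DELETE. The table, column and partition names below are assumptions, and this only works if every unique key (including the primary key) contains the partitioning column:

    -- Illustrative only: range partitioning by month on a hypothetical clicks table.
    ALTER TABLE clicks
    PARTITION BY RANGE (TO_DAYS(date_time)) (
        PARTITION p2012_10 VALUES LESS THAN (TO_DAYS('2012-11-01')),
        PARTITION p2012_11 VALUES LESS THAN (TO_DAYS('2012-12-01')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- Expiring a whole month is then a quick partition drop, not a row-by-row delete.
    ALTER TABLE clicks DROP PARTITION p2012_10;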

answered by usr


Optimizing for insertion and optimizing for retrieval are usually mutually exclusive. You might be better off with two tables:

  • live data: no (or minimal) keys, MyISAM to remove transaction overhead, etc.
  • historical data: indexed up the wazoo, with data moved over from the live table on a periodic basis (a rough sketch of the move follows below).
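
A rough sketch of that periodic move, assuming a live clicks table and a historical clicks_history table with the same columns (the names and the cutoff date are made up):

    -- Hypothetical maintenance job: copy finished rows into the heavily
    -- indexed historical table, then remove them from the live table.
    -- With a MyISAM live table this is not transactional, so run it in a
    -- maintenance window or in small date-bounded batches.
    INSERT INTO clicks_history
    SELECT * FROM clicks
    WHERE date_time < '2012-11-01';

    DELETE FROM clicks
    WHERE date_time < '2012-11-01';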

answered by Marc B