
MySQL statistics tables optimization

Tags:

indexing

mysql

I need to create a table in MySQL 5.5.

This table will hold information such as:

  • the user's browser (Firefox or Chrome, for example)
  • the browser version (e.g. 8.0 or 10)
  • the user's IP address
  • the date and time the user accessed the site
  • the referrer (a URL, or empty)

Here's what I have in mind:

create table statistics (
 browser varchar(255) not null,
 version float not null,
 ip varchar(40) not null,
 dateandtime datetime,
 referrer varchar(255)
);

I read on mysql.com that I need indexes to make my queries fast. My problem is: which indexes should I create so that this table is fast to query?

I need to query on all the fields, e.g.:

  • which browsers visited our site in the last 7 days, and how many times each
  • how many users I have had today
  • which URLs (referrers) we got in the last hour

Thanks

apollo asked Nov 25 '11 14:11



2 Answers

I would recommend this:

Use integers instead of chars/varchars wherever you can (everything except the referrer); integer columns are smaller, so their indexes are smaller and faster. I also recommend summary tables. They are not really normalized, but queries against them return almost instantly, especially if you have a big organization with lots of traffic.

So here are the tables:

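create table statistics (
 browser tinyint(3) UNSIGNED not null default 0, -- numeric browser code, mapped in application code
 version float(4,2) not null default 0,
 ip INT(10) UNSIGNED not null default 0,         -- IPv4 only; an IPv6 address does not fit
 createdon datetime,
 referrer varchar(5000),
 key browserdate (browser, createdon),
 key ipdate (ip, createdon)
 -- etc.: add further keys for the other lookups you need
);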

browser 0 = unknown, 1 = Firefox, and so on. The mapping lives in your code, so the same code is used for inserting and selecting. I don't use an ENUM here because altering the table to add a value can be painful once you have millions of records; with integer codes, a new browser is just a new number in the code, which is much quicker to change.

This table can be used to rebuild all the summary tables if something goes wrong, so give it an index that matches each summary table (browser, for example).
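Since ip is stored as INT UNSIGNED in the table above, the address has to be converted on the way in and out. A minimal sketch using MySQL's built-in INET_ATON() and INET_NTOA(), which handle IPv4 only; the sample values (browser code 1, the 192.0.2.10 address, the example.com referrer) are made up for illustration:

```sql
-- Insert one hit; browser code 1 is assumed to mean Firefox in the application mapping.
INSERT INTO statistics (browser, version, ip, createdon, referrer)
VALUES (1, 8.00, INET_ATON('192.0.2.10'), NOW(), 'http://example.com/');

-- Read the address back in dotted form; the (ip, createdon) key serves this lookup.
SELECT INET_NTOA(ip) AS ip_address, createdon
FROM statistics
WHERE ip = INET_ATON('192.0.2.10')
ORDER BY createdon DESC;
```

If IPv6 visitors matter, the integer column does not apply and a string or binary column would be needed instead.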

Now the summary table:

create table statistics_browser_2011_11 (
 browser tinyint(3) UNSIGNED not null default 0,
 version float(4,2) not null default 0,
 number bigint(20) not null default 0,
 createdon datetime,
 unique key browserinfo (createdon, browser, version)
); -- browser stats for November 2011

When you insert into this table (you take the date on which the user accessed the site and build a $string that matches the table name), you only need INSERT ... ON DUPLICATE KEY UPDATE number = number + 1. That way, retrieving the browser statistics is very fast.
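A sketch of that counter update, under one assumption the answer does not spell out: createdon is truncated to the granularity you want to count at (here, midnight of the day), otherwise every hit gets its own timestamp and the unique key never collides. The values are illustrative only.

```sql
-- First hit of the day for (browser 1, version 8.00) inserts a row with number = 1;
-- every later hit with the same key only increments the counter.
INSERT INTO statistics_browser_2011_11 (browser, version, number, createdon)
VALUES (1, 8.00, 1, '2011-11-25 00:00:00')
ON DUPLICATE KEY UPDATE number = number + 1;

-- Reading the statistics back is a small aggregate over pre-counted rows.
SELECT browser, SUM(number) AS hits
FROM statistics_browser_2011_11
WHERE createdon >= '2011-11-19 00:00:00'
GROUP BY browser;
```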

You will also have to create a MERGE table, because on the second of the month a query for the last 7 days needs both the current month's table and the previous month's. More info: http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html
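A minimal sketch of such a MERGE table. The name statistics_browser_all is hypothetical, and the sketch assumes both monthly tables exist, use the MyISAM engine (MERGE only works over MyISAM tables, and MySQL 5.5 defaults to InnoDB), and have identical column definitions. A MERGE table cannot enforce uniqueness across its members, so the key is declared as a plain KEY here.

```sql
CREATE TABLE statistics_browser_all (
  browser TINYINT(3) UNSIGNED NOT NULL DEFAULT 0,
  version FLOAT(4,2) NOT NULL DEFAULT 0,
  number BIGINT(20) NOT NULL DEFAULT 0,
  createdon DATETIME,
  KEY browserinfo (createdon, browser, version)
) ENGINE=MERGE
  UNION=(statistics_browser_2011_10, statistics_browser_2011_11)
  INSERT_METHOD=LAST;

-- Last 7 days, regardless of which monthly table the rows live in.
SELECT browser, SUM(number) AS hits
FROM statistics_browser_all
WHERE createdon >= NOW() - INTERVAL 7 DAY
GROUP BY browser;
```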

You then repeat the process for the other information: IP, referrer, etc.

To maintain these tables, you will need a cron job that creates the table for the next month: a simple PHP script that gets the current year/month, creates next month's table if it does not exist, and then adds it to the merge table.
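The statements such a script might issue could look like the following sketch. It assumes a MERGE table called statistics_browser_all (a hypothetical name) already exists over the earlier monthly tables; the script would build the table names from the current date.

```sql
-- Create next month's table with the same columns, indexes and engine as the current one.
CREATE TABLE IF NOT EXISTS statistics_browser_2011_12 LIKE statistics_browser_2011_11;

-- Redefine the MERGE table so it also covers the new month.
ALTER TABLE statistics_browser_all
  UNION=(statistics_browser_2011_10,
         statistics_browser_2011_11,
         statistics_browser_2011_12);
```

Older months can be dropped from the UNION list in the same statement once they are no longer queried through the merge table.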

This might be a bit of work, but it is how I do it at work (with similar data), with 12 terabytes of data and 5,000 employees querying the databases. My average load time is approximately 0.60 seconds per query.

Gabriel answered Nov 12 '22 22:11


I think your schema can be improved to

create table statistics
(
  browser enum('Firefox','IE','Opera','Chrome','Safari','Others') not null
    default 'Others',
   -- major browser family only,
   -- instead of a free-form varchar

  user_agent text,
   -- stores the complete user agent string,
   -- mainly for reference

  version float not null,
  ip varchar(40) not null,

  dateandtime datetime not null,

  referer varchar(2000)
  -- 255 is not sufficient for a referer
);

Index key

  1. build an index on (browser, dateandtime)
  2. using an enum will make GROUP BY browser faster
  3. if you need version information, the index becomes (browser, version, dateandtime)
  4. a composite key on (dateandtime, browser)
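As a sketch, the indexes from the list could be added like this. The index names are made up, and in practice you would probably keep only the ones your queries actually use, since each extra index slows down inserts.

```sql
ALTER TABLE statistics
  ADD INDEX idx_browser_date (browser, dateandtime),
  ADD INDEX idx_date_browser (dateandtime, browser);

-- Only if per-version statistics are needed:
ALTER TABLE statistics
  ADD INDEX idx_browser_version_date (browser, version, dateandtime);
```

Running EXPLAIN on each of the real queries shows which index MySQL picks.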

query 1

select browser, count(*) from statistics
where dateandtime between ? and ?
group by browser;

query 2

 select count(*) from statistics
 where dateandtime between ? and ?;

query 3

 select referer from statistics
 where dateandtime between ? and ?;
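For the three time windows in the question, the placeholders could be filled in with date arithmetic instead of bound values. This is only a sketch of the same three queries; COUNT(*) counts hits, so counting distinct users would need something like COUNT(DISTINCT ip), on the assumption that one IP is one user.

```sql
-- Query 1: browsers seen in the last 7 days, with hit counts.
SELECT browser, COUNT(*) AS hits
FROM statistics
WHERE dateandtime >= NOW() - INTERVAL 7 DAY
GROUP BY browser;

-- Query 2: hits since midnight today.
SELECT COUNT(*) AS hits_today
FROM statistics
WHERE dateandtime >= CURDATE();

-- Query 3: referers from the last hour.
SELECT referer
FROM statistics
WHERE dateandtime >= NOW() - INTERVAL 1 HOUR;
```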
ajreal answered Nov 13 '22 00:11