I'm working on a site which stores individual page views in a 'views' table:
CREATE TABLE `views` (
  `view_id` bigint(16) NOT NULL auto_increment,
  `user_id` int(10) NOT NULL,
  `user_ip` varchar(15) NOT NULL,
  `view_url` varchar(255) NOT NULL,
  `view_referrer` varchar(255) NOT NULL,
  `view_date` date NOT NULL,
  `view_created` int(10) NOT NULL,
  PRIMARY KEY (`view_id`),
  KEY `view_url` (`view_url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
It's pretty basic: it stores the user_id (the user's id on the site), their IP address, the URL (without the domain, to keep the table a little smaller), the referral URL (not really used right now; I might get rid of it), the date (in YYYY-MM-DD format, of course), and the Unix timestamp of when the view occurred.
The table, of course, is getting rather big (4 million rows at the moment, and it's a rather young site), and queries on it are slow.
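One common reason such queries are slow is that a per-URL, per-day count has to read many table rows even with the existing single-column `view_url` key. A minimal sketch, assuming the typical query is "views per day for one page": a composite index on (`view_url`, `view_date`) covers that query entirely. The index name `url_date` and the URL '/example/page' are hypothetical placeholders.

```sql
-- Hypothetical composite index. Both columns the query touches are in the
-- index, so MySQL can answer it from the index alone ("Using index").
-- On MyISAM this rebuilds the table, so run it off-peak.
ALTER TABLE `views` ADD INDEX `url_date` (`view_url`, `view_date`);

-- Daily view counts for one page (the URL is a placeholder).
SELECT `view_date`, COUNT(*) AS `views`
FROM `views`
WHERE `view_url` = '/example/page'
GROUP BY `view_date`
ORDER BY `view_date`;
```

Once `url_date` exists, the old `view_url` key is a prefix of it and could be dropped to save space.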
For some basic optimization I've now created a 'views_archive' table:
CREATE TABLE `views_archive` (
  `archive_id` bigint(16) NOT NULL auto_increment,
  `view_url` varchar(255) NOT NULL,
  `view_count` smallint(5) NOT NULL,
  `view_date` date NOT NULL,
  PRIMARY KEY (`archive_id`),
  KEY `view_url` (`view_url`),
  KEY `view_date` (`view_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
This ignores the user info (and referral URL) and stores how many times each URL was viewed per day. That is probably how we'll generally want to use the data (how many times a page was viewed per day), so it should make querying pretty quick. Right now I imagine showing page views by hour for the last week or month and daily views beyond that, so the 'views' table would only need to hold the last week or month of data and 'views_archive' would mostly replace it. Even so, the archive is still a large table.
Anyway, long story short: can you give me any tips on how best to handle storing stats/page views in MySQL? The goal is to keep the table(s) as small as possible while still being able to query the data easily (and at least relatively quickly). I've looked at partitioned tables a little, but the site doesn't have MySQL 5.1 installed. Any other tips or thoughts would be much appreciated.
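The plan above (hourly detail for recent days, daily totals beyond that) can be sketched as two statements: an hourly query over `views`, and a periodic rollup that moves older rows into `views_archive`. This is a minimal sketch, assuming a 7-day detail window and a job run from cron; the window, the `@cutoff` variable, and '/example/page' are hypothetical choices.

```sql
-- Hourly counts for one page over the last 7 days, bucketed from the Unix
-- timestamp column (buckets align to whole hours of Unix time).
SELECT FROM_UNIXTIME(`view_created` - `view_created` % 3600) AS `hour_start`,
       COUNT(*) AS `views`
FROM `views`
WHERE `view_url` = '/example/page'
  AND `view_created` >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 7 DAY)
GROUP BY `hour_start`
ORDER BY `hour_start`;

-- Rollup job. Fix the cutoff once so both statements agree even if the
-- job runs across midnight.
SET @cutoff = CURDATE() - INTERVAL 7 DAY;

-- 1. Summarize old detail rows into one row per URL per day.
INSERT INTO `views_archive` (`view_url`, `view_count`, `view_date`)
SELECT `view_url`, COUNT(*), `view_date`
FROM `views`
WHERE `view_date` < @cutoff
GROUP BY `view_url`, `view_date`;

-- 2. Delete the detail rows that were just summarized.
DELETE FROM `views` WHERE `view_date` < @cutoff;
```

MyISAM has no transactions, so if the job dies between the two steps, check the archive before rerunning step 1 to avoid double counting. Also note that `view_count` is a signed `smallint`, which tops out at 32,767 views per URL per day; an `int unsigned` column would be safer for busy pages.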
Views in MySQL are generally a bad idea. At Grooveshark we consider them harmful and always avoid them. If you are careful you can make them work, but at best they are a way to remember how to select data or to save you from retyping complicated joins.
Yes, views update automatically in MySQL, including (but not limited to) when table structures change, when rows are inserted, updated, or deleted in the underlying tables, and when the view itself is redefined with CREATE OR REPLACE VIEW.
Views should be used when:
- Simplifying complex queries (like IF/ELSE logic and JOINs, or working with triggers and such).
- Adding an extra layer of security to limit or restrict data access: since views are merely virtual tables, they can be made read-only for a specific set of DB users, restricting INSERT.
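Both uses can be sketched against the `views_archive` table from the question: a view that hides a GROUP BY, and a read-only grant on just that view. This is a minimal sketch; the view name `daily_page_views`, the database `stats_db`, and the `reporting` account are hypothetical.

```sql
-- Hypothetical view: callers get daily totals per URL without writing the
-- GROUP BY themselves. SUM keeps totals right even if a URL/date pair
-- ends up in more than one archive row.
CREATE OR REPLACE VIEW `daily_page_views` AS
SELECT `view_url`, `view_date`, SUM(`view_count`) AS `total_views`
FROM `views_archive`
GROUP BY `view_url`, `view_date`;

-- Hypothetical reporting account with SELECT on the view only. With the
-- default SQL SECURITY DEFINER, it needs no privileges on views_archive.
CREATE USER 'reporting'@'localhost' IDENTIFIED BY 'change-me';
GRANT SELECT ON `stats_db`.`daily_page_views` TO 'reporting'@'localhost';
```

A view with GROUP BY or aggregates cannot use MySQL's MERGE algorithm, so it is materialized into a temporary table before an outer WHERE is applied; that cost is one reason for the caution about views above.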
You probably want a table just for pages, with each view row referencing that table instead of storing the URL itself. Another possible optimization would be to store the user IP in a different table, perhaps in a session table. That should reduce your query times somewhat. You're on the right track with the archive table; the same optimizations should help there as well.
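A minimal sketch of the pages table suggested above, assuming each distinct URL is stored once and detail rows keep a 4-byte id instead of a 255-character string. The table names `pages` and `page_views`, their columns, and the sample values are hypothetical; `view_referrer` is left out since the question says it may be dropped.

```sql
-- Hypothetical lookup table: one row per distinct URL.
CREATE TABLE `pages` (
  `page_id` int(10) unsigned NOT NULL auto_increment,
  `page_url` varchar(255) NOT NULL,
  PRIMARY KEY (`page_id`),
  UNIQUE KEY `page_url` (`page_url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- Hypothetical detail table referencing pages by id.
CREATE TABLE `page_views` (
  `view_id` bigint(20) unsigned NOT NULL auto_increment,
  `page_id` int(10) unsigned NOT NULL,
  `user_id` int(10) NOT NULL,
  `user_ip` varchar(15) NOT NULL,
  `view_date` date NOT NULL,
  `view_created` int(10) NOT NULL,
  PRIMARY KEY (`view_id`),
  KEY `page_date` (`page_id`, `view_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- Get-or-create the page id. If the URL already exists,
-- LAST_INSERT_ID(`page_id`) makes its existing id the value returned
-- by LAST_INSERT_ID() for the next statement.
INSERT INTO `pages` (`page_url`) VALUES ('/example/page')
  ON DUPLICATE KEY UPDATE `page_id` = LAST_INSERT_ID(`page_id`);

INSERT INTO `page_views` (`page_id`, `user_id`, `user_ip`, `view_date`, `view_created`)
VALUES (LAST_INSERT_ID(), 42, '203.0.113.7', CURDATE(), UNIX_TIMESTAMP());
```

The archive table can use the same `page_id` in place of `view_url`, which shrinks both its rows and its indexes.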
MySQL's Archive Storage Engine
http://dev.mysql.com/tech-resources/articles/storage-engine.html
It is great for logs because it is quick to write. The one downside is that reading is slower, since ARCHIVE tables are compressed and have little or no index support, but it is a good fit for log tables.
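A minimal sketch of a raw page-view log on the ARCHIVE engine, assuming the server was built with it (not every MySQL 5.0 build includes it). The table name `views_log` and the sample row are hypothetical. Because ARCHIVE in 5.0 does not support indexes, the table has no auto_increment id and no KEY.

```sql
-- Confirm ARCHIVE appears with Support = YES in this build.
SHOW ENGINES;

-- Hypothetical append-only log table (no indexes, so no auto_increment id).
CREATE TABLE `views_log` (
  `user_id` int(10) NOT NULL,
  `user_ip` varchar(15) NOT NULL,
  `view_url` varchar(255) NOT NULL,
  `view_date` date NOT NULL,
  `view_created` int(10) NOT NULL
) ENGINE=ARCHIVE DEFAULT CHARSET=utf8;

-- INSERT and SELECT work; UPDATE and DELETE are not supported.
INSERT INTO `views_log`
VALUES (42, '203.0.113.7', '/example/page', CURDATE(), UNIX_TIMESTAMP());
```

Since every read scans the whole compressed table, it fits best as a write-mostly log that a periodic batch job summarizes into an indexed table such as `views_archive`, rather than something queried on every page load.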