What is a good way to denormalize a mysql database?

I have a large database of normalized order data that is becoming very slow to query for reporting. Many of the queries that I use in reports join five or six tables and have to examine tens or hundreds of thousands of rows.

There are lots of queries and most have been optimized as much as possible to reduce server load and increase speed. I think it's time to start keeping a copy of the data in a denormalized format.

Any ideas on an approach? Should I start with a couple of my worst queries and go from there?

Eric Goodwin asked Aug 15 '08 23:08


2 Answers

I know more about MSSQL than MySQL, but I don't think the number of joins or the number of rows you are talking about should cause you too many problems with the correct indexes in place. Have you analyzed the query plan to see if you are missing any?

http://dev.mysql.com/doc/refman/5.0/en/explain.html
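To make the "check the query plan" step concrete, here is a minimal sketch, not a definitive recipe. It assumes the two-table schema used later in this answer (column types are my own guess), uses Python's built-in sqlite3 module so it runs without a server, and the index name `idx_tblb_fk_a_id` is hypothetical. SQLite spells the command `EXPLAIN QUERY PLAN`; in MySQL you would prefix the query with plain `EXPLAIN` and read its output columns instead.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are assumptions made for this example.
    CREATE TABLE tbla (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblb (id INTEGER PRIMARY KEY, fk_a_id INTEGER, address TEXT);
""")

QUERY = """
    SELECT a.name, b.address
    FROM tbla a
    JOIN tblb b ON b.fk_a_id = a.id
    WHERE a.id = 1
"""

def plan(connection, sql):
    """Return the human-readable plan steps for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

plan_before = plan(conn, QUERY)

# Hypothetical index name; it covers the join column on the child table.
conn.execute("CREATE INDEX idx_tblb_fk_a_id ON tblb (fk_a_id)")
plan_after = plan(conn, QUERY)

for step in plan_before:
    print("before:", step)
for step in plan_after:
    print("after: ", step)
```

If the plan still shows a full scan of a large table after you add the index you expected to help, that is the query to look at more closely.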

That being said, once you are satisfied with your indexes and have exhausted all other avenues, de-normalization might be the right answer. If you just have one or two queries that are problems, a manual approach is probably appropriate, whereas some sort of data warehousing tool might be better for creating a platform to develop data cubes.

Here's a site I found that touches on the subject:

http://www.meansandends.com/mysql-data-warehouse/?link_body%2Fbody=%7Bincl%3AAggregation%7D

Here's a simple technique that keeps denormalizing queries simple if you're just doing a few at a time (I'm not suggesting you replace your OLTP tables, just that you create a new table for reporting purposes). Let's say you have this query in your application:

select a.name, b.address from tbla a
join tblb b on b.fk_a_id = a.id where a.id = 1;

You could create a denormalized table and populate with almost the same query:

create table tbl_ab (a_id, a_name, b_address); 
-- (types elided)

Notice that the prefix before each underscore matches the table alias used in the original query (a.name becomes a_name, b.address becomes b_address).

insert into tbl_ab (a_id, a_name, b_address)
select a.id, a.name, b.address from tbla a
join tblb b on b.fk_a_id = a.id;
-- no where clause because you want everything

Then to fix your app to use the new denormalized table, switch the dots for underscores.

select a_name as name, b_address as address 
from tbl_ab where a_id = 1;

For huge queries this can save a lot of time and makes it clear where the data came from, and you can re-use the queries you already have.
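The whole technique can be tried end to end with a rough sketch like the one below. It is only an illustration: the column types and sample rows are invented (the types are elided above), and Python's built-in sqlite3 module stands in for MySQL so that it runs without a server. It builds `tbl_ab` from the join and checks that the rewritten query returns the same rows as the original one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types and sample rows are assumptions made for this example.
    CREATE TABLE tbla (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblb (id INTEGER PRIMARY KEY, fk_a_id INTEGER, address TEXT);
    INSERT INTO tbla (id, name) VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO tblb (fk_a_id, address) VALUES
        (1, '1 Main St'), (1, '9 Side Rd'), (2, '5 Oak Ave');

    -- The denormalized reporting table, populated by the join without a WHERE.
    CREATE TABLE tbl_ab (a_id INTEGER, a_name TEXT, b_address TEXT);
    INSERT INTO tbl_ab (a_id, a_name, b_address)
        SELECT a.id, a.name, b.address
        FROM tbla a JOIN tblb b ON b.fk_a_id = a.id;
""")

# Original query against the normalized tables.
normalized = conn.execute("""
    SELECT a.name, b.address
    FROM tbla a JOIN tblb b ON b.fk_a_id = a.id
    WHERE a.id = 1
    ORDER BY b.address
""").fetchall()

# Same query with the dots switched for underscores.
denormalized = conn.execute("""
    SELECT a_name AS name, b_address AS address
    FROM tbl_ab
    WHERE a_id = 1
    ORDER BY b_address
""").fetchall()

print("same rows:", normalized == denormalized)
```

Comparing the two result sets like this is a cheap sanity check to run for each report you move over to the denormalized table.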

Remember, I'm only advocating this as a last resort. I bet there are a few indexes that would help you. And when you de-normalize, don't forget to account for the extra space on your disks, and figure out when you will run the query to populate the new tables. This should probably be at night, or whenever activity is low. And the data in that table, of course, will never be exactly up to date.
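Here is one possible shape for that periodic repopulation, as a sketch only. `refresh_tbl_ab` is a hypothetical helper name, the column types are assumptions, and sqlite3 again stands in for MySQL. It empties and refills the reporting table inside a single transaction, and the counts show the staleness described above: new source rows are invisible to reports until the next refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types and sample rows are assumptions made for this example.
    CREATE TABLE tbla (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblb (id INTEGER PRIMARY KEY, fk_a_id INTEGER, address TEXT);
    CREATE TABLE tbl_ab (a_id INTEGER, a_name TEXT, b_address TEXT);
    INSERT INTO tbla (id, name) VALUES (1, 'Alice');
    INSERT INTO tblb (fk_a_id, address) VALUES (1, '1 Main St');
""")

def refresh_tbl_ab(connection):
    """Rebuild the reporting table from the normalized tables (hypothetical helper)."""
    with connection:  # one transaction: the delete and re-insert commit together
        connection.execute("DELETE FROM tbl_ab")
        connection.execute("""
            INSERT INTO tbl_ab (a_id, a_name, b_address)
            SELECT a.id, a.name, b.address
            FROM tbla a JOIN tblb b ON b.fk_a_id = a.id
        """)

def report_rows(connection):
    """Count the rows currently visible to reports."""
    return connection.execute("SELECT COUNT(*) FROM tbl_ab").fetchone()[0]

refresh_tbl_ab(conn)
rows_after_first_refresh = report_rows(conn)

# New data arrives in the normalized tables during the day.
with conn:
    conn.execute("INSERT INTO tblb (fk_a_id, address) VALUES (1, '9 Side Rd')")
rows_while_stale = report_rows(conn)  # the reporting copy has not changed yet

refresh_tbl_ab(conn)  # e.g. triggered by a nightly scheduled job
rows_after_second_refresh = report_rows(conn)

print(rows_after_first_refresh, rows_while_stale, rows_after_second_refresh)
```

On a large MySQL table you might prefer to build a fresh copy and swap it in rather than delete in place; which is better depends on your table sizes and how long reports can tolerate the rebuild.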

[Yet another edit] Don't forget that the new tables you create need to be indexed too! The good part is that you can index to your heart's content and not worry about update lock contention, since aside from your bulk insert the table will only see selects.
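As a small illustration of indexing the reporting table, here is a sketch under the same assumptions as before (guessed column types, sqlite3 standing in for MySQL). The index names are hypothetical; in practice you would choose the columns from the WHERE and ORDER BY clauses of your own reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are assumptions made for this example.
    CREATE TABLE tbl_ab (a_id INTEGER, a_name TEXT, b_address TEXT);
    -- Hypothetical index names, one per column the reports filter on.
    CREATE INDEX idx_tbl_ab_a_id ON tbl_ab (a_id);
    CREATE INDEX idx_tbl_ab_a_name ON tbl_ab (a_name);
""")

# List the indexes that now exist on the reporting table.
index_names = sorted(
    row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'tbl_ab'"
    )
)

# Check that the report query can use one of them.
steps = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT a_name, b_address FROM tbl_ab WHERE a_id = 1"
)]
uses_index = any("idx_tbl_ab_a_id" in step for step in steps)

print(index_names, uses_index)
```

Extra indexes do slow down the bulk load, so it may be worth timing the refresh after you add each one.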

Eric Z Beard answered Sep 19 '22 07:09


MySQL 5 does support views, which may be helpful in this scenario. It sounds like you've already done a lot of optimizing, but if not you can use MySQL's EXPLAIN syntax to see what indexes are actually being used and what is slowing down your queries.
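To show what the view option looks like, here is a minimal sketch. The view name `v_ab`, the column types, and the sample rows are all assumptions, and sqlite3 stands in for MySQL (the `CREATE VIEW` statement itself has the same form in both). A plain view stores the query rather than its results, so it keeps reports tidy and always current, but the joins still run on every read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types and sample rows are assumptions made for this example.
    CREATE TABLE tbla (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblb (id INTEGER PRIMARY KEY, fk_a_id INTEGER, address TEXT);
    INSERT INTO tbla (id, name) VALUES (1, 'Alice');
    INSERT INTO tblb (fk_a_id, address) VALUES (1, '1 Main St');

    -- Hypothetical view name; columns follow the alias_column naming.
    CREATE VIEW v_ab AS
        SELECT a.id AS a_id, a.name AS a_name, b.address AS b_address
        FROM tbla a JOIN tblb b ON b.fk_a_id = a.id;
""")

def view_rows(connection):
    """Read one customer's rows through the view."""
    return connection.execute(
        "SELECT a_name, b_address FROM v_ab WHERE a_id = 1 ORDER BY b_address"
    ).fetchall()

before = view_rows(conn)
conn.execute("INSERT INTO tblb (fk_a_id, address) VALUES (1, '9 Side Rd')")
after = view_rows(conn)  # the view reflects the new row immediately

print(before)
print(after)
```

So a view is a reasonable first step for simplifying report SQL, but if the underlying joins are the bottleneck, a populated table is what actually removes that work.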

As far as going about denormalizing the data (whether you're using views or just duplicating data in a more efficient manner), I think starting with the slowest queries and working your way through is a good approach to take.

pix0r answered Sep 20 '22 07:09