
In what way does denormalization improve database performance?

I've heard a lot about denormalization being used to improve the performance of certain applications, but I've never tried it myself.

So I'm just curious: which parts of a normalized DB make performance worse? In other words, what are the principles of denormalization?

How can I use this technique if I need to improve performance?

Roman asked Feb 27 '10 22:02





1 Answer

Denormalization is generally used to either:

  • Avoid a certain number of queries
  • Remove some joins

The basic idea of denormalization is that you add redundant data, or group some data together, so that it can be fetched more easily and at a lower cost, which is better for performance.


A quick example:

  • Consider a "Posts" and a "Comments" table, for a blog
    • For each post, you'll have several rows in the "Comments" table
    • This means that to display a list of posts with the associated number of comments, you'll have to:
      • Do one query to list the posts
      • Do one query per post to count how many comments it has (yes, those can be merged into a single query that gets the counts for all posts at once, though it still has to read the Comments table)
      • Which means several queries.
  • Now, if you add a "number of comments" field to the Posts table:
    • You only need one query to list the posts
    • And there is no need to query the Comments table: the number of comments is already denormalized into the Posts table.
    • And a single query that returns one more field is cheaper than several queries.
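To make the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (posts, comments, comment_count) and the sample rows are assumptions made up for illustration; they are not taken from any real schema. It runs the "merged" counting query next to the denormalized one and shows that both return the same rows.

```python
import sqlite3


def build_blog(conn):
    """Create a tiny, hypothetical blog schema with a denormalized counter."""
    conn.executescript("""
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            comment_count INTEGER NOT NULL DEFAULT 0  -- redundant copy of the count
        );
        CREATE TABLE comments (
            id INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL REFERENCES posts(id),
            body TEXT NOT NULL
        );
        INSERT INTO posts (id, title, comment_count) VALUES
            (1, 'First post', 2), (2, 'Second post', 0);
        INSERT INTO comments (post_id, body) VALUES
            (1, 'Nice'), (1, 'Thanks');
    """)


def list_posts_normalized(conn):
    # The count is derived at read time: a join plus an aggregate.
    return conn.execute("""
        SELECT p.id, p.title, COUNT(c.id)
        FROM posts p
        LEFT JOIN comments c ON c.post_id = p.id
        GROUP BY p.id, p.title
        ORDER BY p.id
    """).fetchall()


def list_posts_denormalized(conn):
    # The count is read straight from posts: the comments table is not touched.
    return conn.execute(
        "SELECT id, title, comment_count FROM posts ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_blog(conn)
normalized = list_posts_normalized(conn)
denormalized = list_posts_denormalized(conn)
print(normalized == denormalized)
```

On a two-row table the difference is invisible; whether the denormalized read is meaningfully faster on real data depends on table sizes and indexes, so it is worth measuring before committing to the extra column.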

Now, there are some costs, yes:

  • First, this costs some space, both on disk and in memory, as you store some redundant information:
    • The number of comments is stored in the Posts table
    • And you can also get that number by counting rows in the Comments table
  • Second, each time someone adds or removes a comment, you have to:
    • Save or delete the comment, of course
    • But also update the corresponding number in the Posts table.
    • But if your blog has a lot more people reading than writing comments, this is probably not so bad.
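The write-side cost can be sketched the same way. The example below is only one possible approach, again with made-up names (add_comment, remove_comment, comment_count): each write to the comments table is paired with an update of the counter inside a single transaction, so the two cannot drift apart if one statement fails. A database trigger would be another way to do this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        comment_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        body TEXT NOT NULL
    );
    INSERT INTO posts (id, title) VALUES (1, 'First post');
""")


def add_comment(conn, post_id, body):
    # "with conn" commits both statements together, or rolls both back on error.
    with conn:
        cur = conn.execute(
            "INSERT INTO comments (post_id, body) VALUES (?, ?)", (post_id, body)
        )
        conn.execute(
            "UPDATE posts SET comment_count = comment_count + 1 WHERE id = ?",
            (post_id,),
        )
        return cur.lastrowid


def remove_comment(conn, comment_id):
    with conn:
        row = conn.execute(
            "SELECT post_id FROM comments WHERE id = ?", (comment_id,)
        ).fetchone()
        if row is None:
            return False  # nothing to delete, counter left untouched
        conn.execute("DELETE FROM comments WHERE id = ?", (comment_id,))
        conn.execute(
            "UPDATE posts SET comment_count = comment_count - 1 WHERE id = ?",
            (row[0],),
        )
        return True


def counts(conn, post_id):
    # Returns (stored counter, actual number of comment rows) for comparison.
    stored = conn.execute(
        "SELECT comment_count FROM posts WHERE id = ?", (post_id,)
    ).fetchone()[0]
    actual = conn.execute(
        "SELECT COUNT(*) FROM comments WHERE post_id = ?", (post_id,)
    ).fetchone()[0]
    return stored, actual


first_id = add_comment(conn, 1, "Nice")
add_comment(conn, 1, "Thanks")
after_adds = counts(conn, 1)
remove_comment(conn, first_id)
after_remove = counts(conn, 1)
```

Every comment write now costs one extra UPDATE on the posts table, which is the trade-off described above: cheaper reads in exchange for slightly more expensive writes.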
Pascal MARTIN answered Sep 23 '22 00:09
