Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalize or Denormalize in high traffic websites

What are the best practices for database design and normalization for high traffic websites like stackoverflow?

Should one use a normalized database for record keeping or a normalized technique or a combination of both?

Is it sensible to design a normalized database as the main database for record keeping to reduce redundancy and at the same time maintain another denormalized form of the database for fast searching?

or

Should the main database be denormalized but with normalized views at the application level for fast database operations?

or some other approach?

like image 453
Inam Jameel Avatar asked Aug 01 '09 19:08

Inam Jameel


2 Answers

The performance hit of joining is frequently overestimated. Database products like Oracle are built to join very efficiently. Joins are often regarded as performing badly when the real culprit is a poor data model or a poor indexing strategy. People also forget that denormalised databases perform very badly when it comes to inserting or updating data.

The key thing to bear in mind is the type of application you're building. Most of the famous websites are not like regular enterprise applications. That's why Google, Facebook, etc don't use relational databases. There's been a lot of discussion of this topic recently, which I have blogged about.

So if you're building a website which is primarily about delivering shedloads of semi-structured content you probably don't want to be using a relational database, denormalised or otherwise. But if you're building a highly transactional website (such as an online bank) you need a design which guarantees data security and integrity, and does so well. That means a relational database in at least third normal form.

like image 103
APC Avatar answered Sep 23 '22 19:09

APC


Denormalizing the db to reduce the number of joins needed for intense queries is one of many different ways of scaling. Having to do fewer joins means less heavy lifting by the db, and disk is cheap.

That said, for ridiculous amounts of traffic good relational db performance can be hard to achieve. That is why many bigger sites use key value stores(e.g. memcached) and other caching mechanisms.

The Art of Capacity Planning is pretty good.

like image 30
BaroqueBobcat Avatar answered Sep 21 '22 19:09

BaroqueBobcat