
Does denormalizing rows to columns enhance performance in SQL Server?

I have a matrix of integer values that describes a banded distribution curve. I'm optimizing for SELECT performance over INSERT performance. There are at most 100 bands. I'll primarily be querying this data by summing or averaging bands across a period of time.

My question is: can I achieve better performance by flattening this data across a table with one column for each band, or by using a single column representing the band value?

Flattened data

UserId ActivityId DateValue Band1 Band2 Band3....Band100
10001  10002      1/1/2013  1     5     100      200

OR Normalized

UserId ActivityId DateValue Band BandValue
10001  10002      1/1/2013  1    1
10001  10002      1/1/2013  2    5
10001  10002      1/1/2013  3    100

Sample query

SELECT AVG(Band1), AVG(Band2), AVG(Band3)...AVG(Band100)
FROM ActivityBands
WHERE DateValue > '1/1/2012' AND DateValue < '1/1/2013'
GROUP BY UserId
Chris Lukic asked May 14 '13 02:05



3 Answers

If you are accessing all (or most) of the bands in each row, then the denormalized form is better. Much better in my experience.

The reason is simple. The denormalized data takes up much less space on the pages, so far fewer pages need to be read to satisfy the query. Storing one band per row costs about 32 bytes per row, because the key columns are repeated every time, so 100 bands take about 3,200 bytes. Stored as a single record, the same 100 bands take 100*4+8, or about 408 bytes. If your query is reading a significant number of records, this reduces the I/O requirements significantly.

There is a caveat. If you are only reading one record's worth of data, then the 100 normalized records fit on a single page in SQL Server, and so does the one denormalized record. The I/O for a single page read could be identical in the two cases. The benefit arises as you read more and more data.

Your sample query is reading hundreds or thousands of rows, so denormalization should benefit such a query.
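
As a rough check of the arithmetic above, the sketch below estimates how many user-days fit on one data page under each layout, then shows one way to measure the real figures. It assumes the estimates of 32 and 408 bytes per row given above and the 8,096 bytes that SQL Server leaves for rows on an 8 KB page. The name dbo.ActivityBands comes from the question's sample query; actual row sizes depend on the column types chosen, and the estimate ignores per-row slot overhead.

```sql
-- Estimate: user-days per 8 KB data page (8,096 bytes available for rows).
-- 32 and 408 are approximate row sizes from the answer, not measured values.
SELECT (8096 / 32) / 100 AS NormalizedUserDaysPerPage,  -- 100 rows per user-day
       8096 / 408        AS FlattenedUserDaysPerPage;   -- 1 row per user-day

-- Measure: actual page count and average record size for an existing table.
SELECT ps.index_id,
       ps.page_count,
       ps.record_count,
       ps.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.ActivityBands'), NULL, NULL, 'SAMPLED') AS ps
WHERE ps.index_level = 0;  -- leaf level only
```

With integer division the estimate comes out at about 2 versus 19 user-days per page, which is why the gap only matters once a query reads many pages.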

Gordon Linoff answered Sep 21 '22 00:09


Store the data in the normalized format.

If you are not getting acceptable performance from this scheme, instead of denormalizing, first consider what indexes you have on the table. You're likely missing an index that would make this perform similarly to the denormalized table. Next, try writing a query to retrieve data from the normalized table so that the result set looks like the denormalized table, and use that query to create an indexed view. This will give you select performance identical to the denormalized table, but retains the nice data organization benefits of proper normalization.
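
A minimal sketch of that indexed view might look like the following. The table name dbo.ActivityBandValues, the view and index names, and all column types are assumptions for illustration; only three of the bands are shown, and the remaining columns would follow the same pattern. Indexed views cannot use AVG or PIVOT, so the sketch pivots with SUM over a CASE expression and includes the COUNT_BIG(*) that SQL Server requires when a view with GROUP BY is indexed.

```sql
-- Hypothetical normalized table; names follow the question, types are assumed.
CREATE TABLE dbo.ActivityBandValues (
    UserId     int     NOT NULL,
    ActivityId int     NOT NULL,
    DateValue  date    NOT NULL,
    Band       tinyint NOT NULL,
    BandValue  int     NOT NULL,
    CONSTRAINT PK_ActivityBandValues
        PRIMARY KEY CLUSTERED (UserId, ActivityId, DateValue, Band)
);
GO  -- batch separator (SSMS/sqlcmd); CREATE VIEW must start its own batch

CREATE VIEW dbo.ActivityBandsFlat
WITH SCHEMABINDING
AS
SELECT UserId, ActivityId, DateValue,
       -- ISNULL keeps each SUM argument non-nullable, as indexed views require.
       SUM(ISNULL(CASE WHEN Band = 1 THEN BandValue END, 0)) AS Band1,
       SUM(ISNULL(CASE WHEN Band = 2 THEN BandValue END, 0)) AS Band2,
       SUM(ISNULL(CASE WHEN Band = 3 THEN BandValue END, 0)) AS Band3,
       COUNT_BIG(*) AS BandRowCount
FROM dbo.ActivityBandValues
GROUP BY UserId, ActivityId, DateValue;
GO

-- The unique clustered index is what materializes the view on disk.
CREATE UNIQUE CLUSTERED INDEX IX_ActivityBandsFlat
    ON dbo.ActivityBandsFlat (UserId, ActivityId, DateValue);
GO

-- The question's query, run against the view instead of a flattened table.
SELECT UserId, AVG(Band1) AS AvgBand1, AVG(Band2) AS AvgBand2, AVG(Band3) AS AvgBand3
FROM dbo.ActivityBandsFlat WITH (NOEXPAND)
WHERE DateValue > '20120101' AND DateValue < '20130101'
GROUP BY UserId;
```

The NOEXPAND hint asks the optimizer to read the view's index directly, which some editions of SQL Server will not do on their own. The view also has to be maintained on every insert into the base table, so this trades some write cost for read speed.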

Joel Coehoorn answered Sep 21 '22 00:09


Denormalization optimizes exactly one means of accessing the data, at the expense of (almost all) others.

If you have only one access method that is performance critical, denormalization is likely to help, though proper index selection is of greater benefit. However, if you have multiple performance-critical access paths to the data, you are better off seeking other optimizations.

Creating an appropriate clustered index, putting your non-clustered indexes on SSDs, and increasing memory on your server are all techniques that will improve performance for all accesses, rather than trading off between various accesses.
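
As one illustration of the clustered index suggestion, the statement below is a sketch that assumes the flattened dbo.ActivityBands table from the question currently has no clustered index. The index name and the key order are assumptions chosen to suit the question's date-range query, not a general recommendation.

```sql
-- Hypothetical index: leading on DateValue lets the date-range filter in the
-- sample query become a range scan instead of a full table scan.
CREATE CLUSTERED INDEX CIX_ActivityBands_DateValue
    ON dbo.ActivityBands (DateValue, UserId, ActivityId);
```

A workload that mostly filters by UserId first would probably want UserId as the leading key instead, so the key order should follow the most important access path.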

Pieter Geerkens answered Sep 22 '22 00:09