Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should totals be denormalized?

I am working on a website with a simple normalized database.

There is a table called Pages and a table called Views. Each time a Page is viewed, a unique record of that View is recorded in the Views table.

When displaying a Page on the site, I use a simple MySQL COUNT() to total up the number of Views for display.

Database design seems fine, except for this problem: I am at a loss for how to retrieve the top 10 most viewed pages among thousands.

Should I denormalize the Pages table by adding a Pages.views column to hold the total number of views for each page? Or is there an efficient way to query for the top 10 most viewed pages?

like image 460
ryonlife Avatar asked Mar 04 '09 04:03

ryonlife


People also ask

What is better normalized or denormalized?

Normalization is used when the faster insertion, deletion and update anomalies, and data consistency are necessarily required. On the other hand, Denormalization is used when the faster search is more important and to optimize the read performance.

Why would you Denormalize a database?

Denormalization is the process of adding precomputed redundant data to an otherwise normalized relational database to improve read performance of the database.

What is the drawback of denormalization?

Cons of Denormalization:Updates and inserts are more expensive. Denormalization can make update and insert code harder to write. Data may be inconsistent. Data redundancy necessitates more storage.

Should a data warehouse be normalized?

Normalization is critical for several reasons, but primarily because it enables data warehouses to occupy as minimal disk space as possible. This results in improved performance.

Why is normalized data better than Unnormalized?

Normalization increases the number of tables and joins. In contrast, denormalization reduces the number of tables and joins. Disk space is wasted in denormalization because the same data is stored in different places. On the contrary, disk space is optimized in a normalized table.


2 Answers

   SELECT p.pageid, count(*) as viewcount FROM 
   pages p
   inner join views v on p.pageid = v.pageid
   group by p.pageid
   order by count(*) desc   
   LIMIT 10 OFFSET 0;

I can't test this, but something along those lines. I would not store the value unless I have to due to performance constraints (I just learned the term "premature optimization", and it seems to apply if you do).

like image 154
cdonner Avatar answered Nov 03 '22 00:11

cdonner


It depends on the level of information you are trying to maintain. If you want to record who viewed when? Then the separate table is fine. Otherwise, a column for Views is the way to go. Also If you keep a separate column, you'll find that the table will be locked more often since each page view will try to update the column for its corresponding row.

Select pageid, Count(*) as countCol from Views
group by pageid order by countCol DESC
LIMIT 10 OFFSET 0;
like image 36
fasih.rana Avatar answered Nov 02 '22 22:11

fasih.rana