Updating redundant/denormalized data automatically in SQL Server

Tags: sql-server, sql-server-2005, denormalization

I use a high level of redundant, denormalized data in my DB designs to improve performance. I'll often store data that would normally need to be joined or calculated. For example, if I have a User table and a Task table, I would store the Username and UserDisplayName redundantly in every Task record. Another example of this is storing aggregates, such as storing the TaskCount in the User table.

User
- UserID
- Username
- UserDisplayName
- TaskCount
Task
- TaskID
- TaskName
- UserID
- UserName
- UserDisplayName

This is great for performance since the app has many more reads than insert, update or delete operations, and since some values like Username change rarely. However, the big drawback is that integrity has to be enforced via application code or triggers. This can be very cumbersome with updates.

My question is: can this be done automatically in SQL Server 2005/2010, maybe via a persisted/permanent View? Would anyone recommend another possible solution or technology? I've heard document-based DBs such as CouchDB and MongoDB can handle denormalized data more effectively.

asked Jan 25 '11 01:01

Sterling Nichols

1 Answer

You might want to first try an Indexed View before moving to a NoSQL solution:

http://msdn.microsoft.com/en-us/library/ms187864.aspx

and:

http://msdn.microsoft.com/en-us/library/ms191432.aspx

Using an Indexed View would allow you to keep your base data in properly normalized tables and maintain data integrity while giving you the denormalized "view" of that data. I would not recommend this for highly transactional tables, but you said the app is heavier on reads than writes, so it is worth seeing whether this works for you.

Based on your two example tables, one option is:

1) Add a column to the User table defined as:

TaskCount INT NOT NULL DEFAULT (0)

2) Add a Trigger on the Task table defined as:

CREATE TRIGGER UpdateUserTaskCount
ON dbo.Task
AFTER INSERT, UPDATE, DELETE  -- UPDATE covers a Task being reassigned to another User
AS
SET NOCOUNT ON;

-- Add the inserted (or reassigned-to) Tasks to each affected User's count.
;WITH added AS
(
    SELECT  ins.UserID, COUNT(*) AS [NumTasks]
    FROM    INSERTED ins
    GROUP BY    ins.UserID
)
UPDATE  usr
SET     usr.TaskCount = (usr.TaskCount + added.NumTasks)
FROM    dbo.[User] usr
INNER JOIN  added
        ON  added.UserID = usr.UserID;

-- Subtract the deleted (or reassigned-from) Tasks from each affected User's count.
;WITH removed AS
(
    SELECT  del.UserID, COUNT(*) AS [NumTasks]
    FROM    DELETED del
    GROUP BY    del.UserID
)
UPDATE  usr
SET     usr.TaskCount = (usr.TaskCount - removed.NumTasks)
FROM    dbo.[User] usr
INNER JOIN  removed
        ON  removed.UserID = usr.UserID;
GO
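
The trigger only keeps TaskCount in step from the moment it is created, so rows that already exist in Task would need a one-time backfill. A minimal sketch of that, assuming the dbo.[User] and dbo.Task tables from the question; the drift check at the end is an illustrative extra, not part of the original answer:

```sql
-- One-time backfill: set TaskCount from the Tasks that already exist.
-- Users with no Tasks keep the DEFAULT (0) from step 1.
UPDATE  usr
SET     usr.TaskCount = cnt.NumTasks
FROM    dbo.[User] usr
INNER JOIN
(
    SELECT   t.UserID, COUNT(*) AS NumTasks
    FROM     dbo.Task t
    GROUP BY t.UserID
) cnt
        ON  cnt.UserID = usr.UserID;

-- Drift check: lists any User whose stored count disagrees with the real count.
SELECT   usr.UserID, usr.TaskCount, COUNT(t.TaskID) AS ActualCount
FROM     dbo.[User] usr
LEFT JOIN dbo.Task t
        ON  t.UserID = usr.UserID
GROUP BY usr.UserID, usr.TaskCount
HAVING   usr.TaskCount <> COUNT(t.TaskID);
```

Running the drift check periodically is one way to confirm that no code path bypasses the trigger.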

3) Then do a View that has:

SELECT   u.UserID,
         u.Username,
         u.UserDisplayName,
         u.TaskCount,
         t.TaskID,
         t.TaskName
FROM     dbo.[User] u
INNER JOIN   dbo.Task t
        ON   t.UserID = u.UserID

And then follow the recommendations from the links above (WITH SCHEMABINDING, Unique Clustered Index, etc.) to make it "persisted". Note that the view reads the trigger-maintained TaskCount column rather than computing the count with a subquery in the SELECT: indexed views do not allow subqueries, and this specific case is intended to be denormalized because it has far more reads than writes. The Indexed View keeps the entire structure, including the stored aggregate, physically materialized, so each read does not recalculate it.
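
The SCHEMABINDING and unique clustered index steps could look roughly like the sketch below. The view name (dbo.vUserTask) and index name are hypothetical, and the sketch assumes TaskID is unique in dbo.Task and that UserID is unique in dbo.[User], so each view row is identified by its TaskID:

```sql
-- Indexed views require certain session settings when they are created.
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- SCHEMABINDING requires two-part names and an explicit column list.
CREATE VIEW dbo.vUserTask
WITH SCHEMABINDING
AS
SELECT   u.UserID,
         u.Username,
         u.UserDisplayName,
         u.TaskCount,
         t.TaskID,
         t.TaskName
FROM     dbo.[User] u
INNER JOIN   dbo.Task t
        ON   t.UserID = u.UserID;
GO

-- The unique clustered index is what physically stores the view's rows.
CREATE UNIQUE CLUSTERED INDEX IX_vUserTask_TaskID
ON dbo.vUserTask (TaskID);
GO

-- Optional nonclustered index for lookups by user.
CREATE NONCLUSTERED INDEX IX_vUserTask_UserID
ON dbo.vUserTask (UserID);
GO

-- On editions that do not match indexed views automatically,
-- the NOEXPAND hint makes the query read the stored rows directly.
SELECT  UserID, Username, TaskCount, TaskID, TaskName
FROM    dbo.vUserTask WITH (NOEXPAND)
WHERE   UserID = 42;
```

The documentation linked above lists the full set of required SET options and restrictions, so treat this as a starting point and check it against your edition.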

Now, if a LEFT JOIN is needed because some Users do not have any Tasks, then the Indexed View will not work: outer joins are not allowed, which is one of the many restrictions on creating them. In that case, you can create a real table (UserTask) that holds your denormalized structure and populate it in one of two ways: either via a Trigger on just the User table (assuming you use the Trigger shown above, which updates the User table based on changes in the Task table), or by skipping the TaskCount field in the User table and having Triggers on both tables populate the UserTask table. In the end, this is basically what an Indexed View does, just without you having to write the synchronization Trigger(s).
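
A rough sketch of the first variant (a trigger on just the User table) follows. The column types, the trigger name, and the delete-and-rebuild strategy are assumptions for illustration; it also assumes the server's nested triggers option is on (the default), so that the User update made by the Task trigger fires this trigger in turn:

```sql
-- Hypothetical denormalized table: one row per Task,
-- plus one row with NULL Task columns for each User without Tasks.
CREATE TABLE dbo.UserTask
(
    UserID          INT           NOT NULL,
    Username        NVARCHAR(50)  NOT NULL,
    UserDisplayName NVARCHAR(100) NOT NULL,
    TaskCount       INT           NOT NULL,
    TaskID          INT           NULL,
    TaskName        NVARCHAR(100) NULL
);
GO

CREATE TRIGGER dbo.SyncUserTask
ON dbo.[User]
AFTER INSERT, UPDATE, DELETE
AS
SET NOCOUNT ON;

-- Drop the stored rows for every User touched by this statement.
DELETE  ut
FROM    dbo.UserTask ut
WHERE   ut.UserID IN (SELECT UserID FROM INSERTED
                      UNION
                      SELECT UserID FROM DELETED);

-- Rebuild them from the current data; the LEFT JOIN keeps Users with no Tasks.
-- Deleted Users are absent from INSERTED, so they are simply not re-added.
INSERT INTO dbo.UserTask
        (UserID, Username, UserDisplayName, TaskCount, TaskID, TaskName)
SELECT  ins.UserID, ins.Username, ins.UserDisplayName, ins.TaskCount,
        t.TaskID, t.TaskName
FROM    INSERTED ins
LEFT JOIN   dbo.Task t
        ON  t.UserID = ins.UserID;
GO
```

One limitation of this variant: a change to the Task table reaches UserTask only when the Task trigger also updates the matching User row, so which Task operations that trigger covers determines what gets synchronized.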

answered Oct 14 '22 21:10

Solomon Rutzky
