 

Denormalized access: views or materialized tables?

I'm looking to create denormalized access to data essentially for reporting purposes (and thus to avoid joins and gain performance). I have two solutions in mind, but I'm looking for (a) other potential solutions, and (b) what tradeoffs I should be considering. I'm using SQL Server 2008 R2.

In one solution, I could create an indexed view over the query that does the joins I care about. My understanding is that this does materialize under the hood, but it's tricky and might not guarantee good performance (and there's a vociferous debate about the performance of views).

In another solution, I could build the machinery to create a table, populate it with the data I care about, and in a transaction swap it out for the existing table.
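Roughly, the swap I have in mind would look like this (table names hypothetical):

--rebuild dbo.report_data_staging offline, then swap under a transaction;
--sp_rename takes schema modification locks, so readers block briefly during the swap
BEGIN TRAN;
  EXEC sp_rename 'dbo.report_data', 'report_data_old';
  EXEC sp_rename 'dbo.report_data_staging', 'report_data';
COMMIT;
DROP TABLE dbo.report_data_old;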

The former seems risky/magical to me; the latter seems janky, error prone, and likely to impact things like query plans. Can someone help shed light on this please?

asked Aug 08 '14 by Richard Pianka


2 Answers

Views are inlined into the query plan at a very early stage of the optimization pipeline. They neither hurt nor improve performance.

Indexed views are also inlined; it doesn't matter whether your query references the view or pastes in the view's definition. On Enterprise Edition, the optimizer then tries to match parts of the query back against indexed views later in the pipeline. This matching is unreliable: for simple cases it works fine, but I have seen it fail randomly. There is no guarantee.

There is a solution for that: WITH (NOEXPAND) forces the optimizer not to inline the view but to use its index. This is 100% reliable.

If you can fit your query pattern into an indexed view and are able to use that hint (you can create a wrapper view that always contains it), then indexed views are a nice, automatic, and consistent solution.
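For example, a minimal wrapper might look like this (view and column names hypothetical, assuming an indexed view dbo.sales_summary_ix already exists):

--every caller that queries dbo.sales_summary gets the NOEXPAND plan automatically
CREATE VIEW dbo.sales_summary AS
SELECT customer_id, total_sales, order_count
FROM dbo.sales_summary_ix WITH (NOEXPAND);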

The former seems risky/magical to me

With that hint, there is little surprise left in indexed views. If it helps: I have production experience with indexed views. They work fine.

the latter seems janky, error prone, and likely to impact things like query plans. Can someone help shed light on this please?

That is true. Every bug has the potential to cause data corruption. You'd better not forget any place where data is written, or else your denormalized data becomes inconsistent.


For these reasons, I recommend indexed views unless there is a good reason against using them.

answered Sep 18 '22 by usr


Firstly, indexed views have two gotchas:

1) An updated row in the base table must be able to propagate through to the indexed view without having to reference any other rows in the base table. This is why you can use SUM() and COUNT_BIG(), since those aggregates can be updated just by knowing what is in the changed rows, but you can't use MIN() or MAX(), since you'd have to look back through the base table to make sure (see the sketch after this list).

All the seemingly arbitrary restrictions on indexed views (no TVFs, no subqueries, no unions, etc.) boil down to the above reason. The engine has to be able to maintain the index without constantly looking at the whole base table.

2) AFTER triggers on the base table still get processed before the indexed views are updated.

These limit the complexity of what you can accomplish with indexed views alone.
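To make gotcha 1) concrete, here is a sketch (table and column names hypothetical): the SUM/COUNT_BIG view can be indexed, while the MAX variant is rejected.

--maintainable: a change to dbo.orders adjusts the sums from the changed rows alone
--(assumes dbo.orders.amount is NOT NULL; indexed views disallow SUM over nullable expressions)
CREATE VIEW dbo.order_totals WITH SCHEMABINDING AS
SELECT customer_id, SUM(amount) AS total_amount, COUNT_BIG(*) AS row_count
FROM dbo.orders
GROUP BY customer_id
GO
CREATE UNIQUE CLUSTERED INDEX UC_order_totals ON dbo.order_totals (customer_id)
GO

--not maintainable: creating a unique clustered index on this view fails, because
--deleting the current maximum would force a rescan of dbo.orders to find the new MAX
CREATE VIEW dbo.order_peaks WITH SCHEMABINDING AS
SELECT customer_id, MAX(amount) AS max_amount
FROM dbo.orders
GROUP BY customer_id
GO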


Secondly, test that denormalization will actually help. If you are just instantiating simple joins and you are already I/O bound, that will make the bottleneck worse. On the other hand, if you are precomputing large aggregates or taking vertical slices of extremely wide joins, it's more likely to be an improvement.


Thirdly, to use indexed views, use a pattern like this:

CREATE TABLE dbo.huge_data_table ( ... )
GO

--SCHEMABINDING (and two-part names) are required before the view can be indexed
CREATE VIEW dbo.huge_data_table_monthly_summary_index
WITH SCHEMABINDING AS
SELECT
  YEAR(...) AS [year_...]
 ,MONTH(...) AS [month_...]
 ,SUM(...) AS [sum_...]
 ,COUNT_BIG(*) AS [count_...] --COUNT_BIG(*) is mandatory in a grouped indexed view
FROM dbo.huge_data_table
GROUP BY YEAR(...),MONTH(...)
GO

--the unique clustered index is what actually materializes the view
CREATE UNIQUE CLUSTERED INDEX UC__xyz
ON dbo.huge_data_table_monthly_summary_index
  ([year_...],[month_...])
GO

CREATE VIEW dbo.monthly_summary AS
SELECT
  [year_...]
 ,[month_...]
 ,[sum_...]
 ,[count_...]
FROM dbo.huge_data_table_monthly_summary_index WITH (NOEXPAND) --force use of the view's index
GO

Then you can query monthly_summary and get the aggregates without having to recompute them every time. The engine updates huge_data_table_monthly_summary_index automatically whenever huge_data_table changes.
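A report query then reads straight off the wrapper (placeholders kept from the pattern above):

--the aggregates come off the view's clustered index; huge_data_table is not scanned
SELECT [year_...],[month_...],[sum_...],[count_...]
FROM monthly_summary
ORDER BY [year_...],[month_...]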


It may seem like magic, but it works. Use it as far as you can in lieu of triggers/jobs/etc to synchronize tables. I have found that I can usually break down a complex reporting job into smaller, simpler pieces that can use indexed views, and joining the intermediate results on the fly at reporting time turns out to be fast enough to get the job done.

If that's not enough for your needs, you are probably in the realm of log shipping or mirroring to a separate reporting server. There's not much middle ground where syncing manually makes sense.

answered Sep 18 '22 by Anon