Slow indexed view in SQL Server 2005

Say I have a very long table (~35 million rows) called TimeCard with only 6 columns (tableID, CompanyID, UserID, ProjectID, DailyHoursWorked, entryDate). This is a pretty straightforward table that records employees' hours worked per day, per project, per company.

I now need to generate a report of employees' total hours worked per month, per project, for any given company. Instead of aggregating when the report runs, I want to build a table-like data structure that already has all the Company/Project/User data aggregated by month. The report can then query that structure directly with no run-time aggregation, which matters because aggregating ~35 million records can take a few minutes.

I see two options. One is to create an extra physical table with columns (CompanyID, UserID, ProjectID, MonthlyHoursWorked, Month) and use a trigger on the TimeCard table to keep it up to date. The other is to create an indexed view. I tried both, starting with the indexed view, using the following code:

CREATE VIEW [dbo].[vw_myView] WITH SCHEMABINDING AS
SELECT
    UserID,  -- was JobID; changed to match the TimeCard columns and the index below
    ProjectID,
    SUM(DailyHoursWorked) AS MonthTotal,
    DATEADD(MONTH, DATEDIFF(MONTH, 0, entryDate), 0) AS entryMonth,  -- first day of the month
    CompanyID,
    COUNT_BIG(*) AS Counter  -- required in an indexed view that uses GROUP BY
FROM
    dbo.TimeCard
GROUP BY
    DATEADD(MONTH, DATEDIFF(MONTH, 0, entryDate), 0), UserID, ProjectID, CompanyID

GO
CREATE UNIQUE CLUSTERED INDEX [IX_someIndex] ON [dbo].[vw_myView]
(
    [CompanyID] ASC,
    [entryMonth] ASC,
    [UserID] ASC,
    [ProjectID] ASC
)

The indexed view was created successfully and contains ~5 million rows.

However, every time I clear the SQL Server cache and run the query select * from vw_myView where companyID = 1, it takes almost 3 minutes. With the extra-table route described above, the equivalent query takes around 4 seconds, also with the cache cleared.

My question is: is an indexed view a bad choice for this particular scenario? In particular, I would like to know whether the entire indexed view gets recalculated/re-aggregated every time the underlying table (TimeCard) changes, or every time a query is run against it.

Thanks!

TheYouth asked Mar 10 '10 19:03



2 Answers

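If you are not using the Enterprise or Developer edition, you need to use the WITH (NOEXPAND) hint. On other editions the optimizer expands the view into its underlying query against TimeCard instead of reading the view's index, so the aggregation is redone on every run: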

select * 
from vw_myView with (noexpand)
where companyID = 1
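
One way to check whether this applies to your server is to look at the edition and compare the two forms of the query from a cold cache. This is only a sketch: it assumes a test server (DBCC DROPCLEANBUFFERS flushes the buffer pool for everyone) and uses the view name from the question.

```sql
-- Which edition is this instance running?
SELECT SERVERPROPERTY('Edition') AS Edition;

-- Report logical/physical reads and elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Cold cache, no hint. On editions without automatic indexed-view
-- matching, the reads should be reported against TimeCard.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;   -- test servers only
SELECT * FROM dbo.vw_myView WHERE CompanyID = 1;

-- Cold cache, with the hint. The reads should now be reported
-- against vw_myView, i.e. the view's clustered index.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;   -- test servers only
SELECT * FROM dbo.vw_myView WITH (NOEXPAND) WHERE CompanyID = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the first query reports reads on TimeCard and the second on vw_myView, the 3 minutes is most likely the cost of re-aggregating the base table rather than of reading the view.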

When the underlying data changes, only the view rows affected by the changed data are updated, not the entire view. This maintenance can hurt an OLTP database with a high volume of inserts, but with only moderate usage it should not pose a performance problem.

A tip from Microsoft:

As a general recommendation, any modifications or updates to the view or the base tables underlying the view should be performed in batches if possible, rather than singleton operations. This may reduce some overhead in the view maintenance.
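
As a rough illustration of that tip, new time entries could be collected in a staging table and moved into TimeCard in one set-based statement, so the view is maintained once per batch instead of once per row. The table dbo.TimeCardStaging is hypothetical, and the sketch assumes tableID is an identity column that does not need to be supplied.

```sql
-- dbo.TimeCardStaging is a hypothetical table with the same data columns
-- as TimeCard; it is not part of the original question.
-- Assumption: TimeCard.tableID is an IDENTITY column, so it is omitted here.
INSERT INTO dbo.TimeCard (CompanyID, UserID, ProjectID, DailyHoursWorked, entryDate)
SELECT CompanyID, UserID, ProjectID, DailyHoursWorked, entryDate
FROM dbo.TimeCardStaging;
```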

D'Arcy Rittich answered Sep 28 '22 00:09



I think you are on the right path with an indexed view. However, have you put an index on the table you are querying, TimeCard, covering the columns you group by? You need a single composite index on JobID, ProjectID, entryDate, CompanyID. Four separate single-column indexes will NOT solve your problem, because the query would have to combine all four.
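
A sketch of what such a composite index might look like. The index name, the column order, and the INCLUDE clause are my assumptions, and UserID is used on the assumption that the JobID in the view is the same column as the UserID in the table description.

```sql
-- One composite index instead of four single-column ones.
-- CompanyID leads because the report filters on it; this order is a guess
-- and should be checked against the actual query plan.
CREATE NONCLUSTERED INDEX IX_TimeCard_Company_Project_User_Date
ON dbo.TimeCard (CompanyID, ProjectID, UserID, entryDate)
INCLUDE (DailyHoursWorked);  -- lets the aggregate be computed from the index alone
```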

I do think the trigger will be slow, but in a different way: it will make your query faster, but it will slow down every insert into TimeCard. If you do go with the trigger, make sure you index the summary table as well, or it might also be slow; not 3 minutes slow, but still slow to sort and return data.
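
For reference, a minimal sketch of the trigger route with the summary table indexed through its primary key. The table, trigger, and column type choices are all hypothetical. It handles inserts only (updates and deletes on TimeCard would need similar triggers), assumes DailyHoursWorked is not nullable, and does nothing special about concurrent inserts for a brand-new month/user/project combination. It avoids MERGE, which SQL Server 2005 does not have.

```sql
-- Hypothetical summary table; the clustered primary key doubles as the
-- index the report query needs.
CREATE TABLE dbo.TimeCardMonthly
(
    CompanyID          INT           NOT NULL,
    EntryMonth         DATETIME      NOT NULL,  -- first day of the month
    UserID             INT           NOT NULL,
    ProjectID          INT           NOT NULL,
    MonthlyHoursWorked DECIMAL(9, 2) NOT NULL,
    CONSTRAINT PK_TimeCardMonthly
        PRIMARY KEY CLUSTERED (CompanyID, EntryMonth, UserID, ProjectID)
);
GO

CREATE TRIGGER dbo.trg_TimeCard_AfterInsert
ON dbo.TimeCard
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Aggregate the inserted rows once, so multi-row inserts are handled.
    DECLARE @delta TABLE
    (
        CompanyID  INT,
        EntryMonth DATETIME,
        UserID     INT,
        ProjectID  INT,
        HoursDelta DECIMAL(9, 2)
    );

    INSERT INTO @delta (CompanyID, EntryMonth, UserID, ProjectID, HoursDelta)
    SELECT CompanyID,
           DATEADD(MONTH, DATEDIFF(MONTH, 0, entryDate), 0),
           UserID,
           ProjectID,
           SUM(DailyHoursWorked)
    FROM inserted
    GROUP BY CompanyID,
             DATEADD(MONTH, DATEDIFF(MONTH, 0, entryDate), 0),
             UserID,
             ProjectID;

    -- Step 1: add the new hours to monthly rows that already exist.
    UPDATE m
    SET MonthlyHoursWorked = m.MonthlyHoursWorked + d.HoursDelta
    FROM dbo.TimeCardMonthly AS m
    INNER JOIN @delta AS d
        ON  d.CompanyID  = m.CompanyID
        AND d.EntryMonth = m.EntryMonth
        AND d.UserID     = m.UserID
        AND d.ProjectID  = m.ProjectID;

    -- Step 2: create monthly rows that do not exist yet. This must run
    -- after the UPDATE, otherwise new rows would be counted twice.
    INSERT INTO dbo.TimeCardMonthly
        (CompanyID, EntryMonth, UserID, ProjectID, MonthlyHoursWorked)
    SELECT d.CompanyID, d.EntryMonth, d.UserID, d.ProjectID, d.HoursDelta
    FROM @delta AS d
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM dbo.TimeCardMonthly AS m
        WHERE m.CompanyID  = d.CompanyID
          AND m.EntryMonth = d.EntryMonth
          AND m.UserID     = d.UserID
          AND m.ProjectID  = d.ProjectID
    );
END;
GO
```

The extra work in the trigger is the insert-time cost mentioned above; the report itself becomes a plain seek on the primary key.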

Ben Hoffman answered Sep 28 '22 01:09
