Here's a basic query that relies on two non-clustered indexes:
SELECT cc.categoryid, count(*) from company c
INNER JOIN companycategory cc on cc.companyid = c.id
WHERE c.placeid like 'ca_%'
GROUP BY cc.categoryid order by count(*) desc
When the exact same database is hosted on SQL Server 2008, on virtually any hardware, this returns < 500 ms. Even with the cache buffers cleared:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
... this still returns in ~1 second on traditional SQL.
On Azure, it takes approximately 3.5 seconds to return each time.
Some articles out there seem to suggest that people are generally happy with query performance in SQL Azure. And yet here's a basic scenario where 'obvious' tuning has been exhausted and there's no network latency issues to speak of. It's just really slow when working w/ large tables (companycategroy has 1.2M records, places has 7.5K).
The total database size is no more than 4GB. Selecting 'Web' edition vs. 'Enterprise' edition doesn't seem to make much of a difference either.
What am I missing?
This is only a basic example, it only gets worse with more sophisticated queries, all of have been reviewed, tuned, and perform well on-premise.
Here's the execution plan:
|--Sort(ORDER BY:([Expr1004] DESC))
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
|--Hash Match(Aggregate, HASH:([cc].[CategoryId]), RESIDUAL:([XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId] = [XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId]) DEFINE:([Expr1007]=COUNT(*)))
|--Hash Match(Inner Join, HASH:([c].[Id])=([cc].[CompanyId]))
|--Index Scan(OBJECT:([XX].[dbo].[Company].[IX_Company_PlaceId] AS [c]), WHERE:([XX].[dbo].[Company].[PlaceId] as [c].[PlaceId] like N'ca_%'))
|--Index Scan(OBJECT:([XX].[dbo].[CompanyCategory].[IX_CompanyCategory_CompanyId] AS [cc]))
And here are the stats:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 14 ms, elapsed time = 14 ms.
(789 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CompanyCategory'. Scan count 1, logical reads 5183, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Company'. Scan count 1, logical reads 8710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 3328 ms, elapsed time = 3299 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Index definitions are as follows:
CREATE NONCLUSTERED INDEX [IX_Company_PlaceId] ON [dbo].[Company]
(
[PlaceId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
CREATE NONCLUSTERED INDEX [IX_CompanyCategory_CompanyId] ON [dbo].[CompanyCategory]
(
[CompanyId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
ALTER TABLE [dbo].[Company] ADD CONSTRAINT [PK_Company_Id] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
They seem to use one CPU
core for your query while on your machine the query probably parallelizes (all operations used by the query do parallelize).
However, an index scan is used for the LIKE
predicate for some reason while an index seek could suffice.
Please try using this explicit condition instead of LIKE
:
c.placeid >= 'ca'
AND c.placeid < 'cb'
and see if it changes the plan to an Index Seek
on IX_CompanyPlaceId
.
Just a few things:
c.placeid
for some statistics on itc.placeid
a string? Does this follow through to companyid
and c.id
? I think this is why you have the Hash Match - try joining on integer surrogate keys instead.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With