Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Azure Query Performance - Terribly Slow Even With Tuned Queries

Here's a basic query that relies on two non-clustered indexes:

SELECT cc.categoryid, count(*) from company c
INNER JOIN companycategory cc on cc.companyid = c.id
WHERE c.placeid like 'ca_%'
GROUP BY cc.categoryid order by count(*) desc

When the exact same database is hosted on SQL Server 2008, on virtually any hardware, this returns < 500 ms. Even with the cache buffers cleared:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

... this still returns in ~1 second on traditional SQL.

On Azure, it takes approximately 3.5 seconds to return each time.

Some articles out there seem to suggest that people are generally happy with query performance in SQL Azure. And yet here's a basic scenario where 'obvious' tuning has been exhausted and there's no network latency issues to speak of. It's just really slow when working w/ large tables (companycategroy has 1.2M records, places has 7.5K).

The total database size is no more than 4GB. Selecting 'Web' edition vs. 'Enterprise' edition doesn't seem to make much of a difference either.

What am I missing?

This is only a basic example, it only gets worse with more sophisticated queries, all of have been reviewed, tuned, and perform well on-premise.

Here's the execution plan:

  |--Sort(ORDER BY:([Expr1004] DESC))
       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
            |--Hash Match(Aggregate, HASH:([cc].[CategoryId]), RESIDUAL:([XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId] = [XX].[dbo].[CompanyCategory].[CategoryId] as [cc].[CategoryId]) DEFINE:([Expr1007]=COUNT(*)))
                 |--Hash Match(Inner Join, HASH:([c].[Id])=([cc].[CompanyId]))
                      |--Index Scan(OBJECT:([XX].[dbo].[Company].[IX_Company_PlaceId] AS [c]),  WHERE:([XX].[dbo].[Company].[PlaceId] as [c].[PlaceId] like N'ca_%'))
                      |--Index Scan(OBJECT:([XX].[dbo].[CompanyCategory].[IX_CompanyCategory_CompanyId] AS [cc]))

And here are the stats:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 14 ms, elapsed time = 14 ms.

(789 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CompanyCategory'. Scan count 1, logical reads 5183, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Company'. Scan count 1, logical reads 8710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 3328 ms,  elapsed time = 3299 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Index definitions are as follows:

CREATE NONCLUSTERED INDEX [IX_Company_PlaceId] ON [dbo].[Company] 
(
    [PlaceId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

CREATE NONCLUSTERED INDEX [IX_CompanyCategory_CompanyId] ON [dbo].[CompanyCategory] 
(
    [CompanyId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO

ALTER TABLE [dbo].[Company] ADD  CONSTRAINT [PK_Company_Id] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
GO
like image 229
Nariman Avatar asked Jul 27 '11 21:07

Nariman


2 Answers

They seem to use one CPU core for your query while on your machine the query probably parallelizes (all operations used by the query do parallelize).

However, an index scan is used for the LIKE predicate for some reason while an index seek could suffice.

Please try using this explicit condition instead of LIKE:

c.placeid >= 'ca'
AND c.placeid < 'cb'

and see if it changes the plan to an Index Seek on IX_CompanyPlaceId.

like image 169
Quassnoi Avatar answered Oct 21 '22 07:10

Quassnoi


Just a few things:

  • Are stats up to date on Azure? I'm a bit wary of that Hash Match for a 1.2M row table
  • Does Azure have auto stats? If not, your local database might have a lot more information that SQL Azure can't use to pick an optimal query plan
  • Index c.placeid for some statistics on it
  • Why is c.placeid a string? Does this follow through to companyid and c.id? I think this is why you have the Hash Match - try joining on integer surrogate keys instead.
like image 39
ta.speot.is Avatar answered Oct 21 '22 07:10

ta.speot.is