Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min

It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?

SELECT 
    [auth_user].[id], 
    [auth_user].[password], 
    [auth_user].[last_login], 
    [auth_user].[is_superuser], 
    [auth_user].[username], 
    [auth_user].[first_name], 
    [auth_user].[last_name], 
    [auth_user].[email], 
    [auth_user].[is_staff], 
    [auth_user].[is_active], 
    [auth_user].[date_joined], 
    COUNT([tickets_ticket].[id]) AS [tickets_captured__count], 
    COUNT(T3.[id]) AS [assigned_tickets__count], 
    COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count] 
FROM 
    [auth_user] 
    LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id]) 
    LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id]) 
    LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id]) 
GROUP BY 
    [auth_user].[id], 
    [auth_user].[password], 
    [auth_user].[last_login], 
    [auth_user].[is_superuser], 
    [auth_user].[username], 
    [auth_user].[first_name], 
    [auth_user].[last_name], 
    [auth_user].[email], 
    [auth_user].[is_staff], 
    [auth_user].[is_active], 
    [auth_user].[date_joined] 
HAVING 
    (COUNT([tickets_ticket].[id]) > 0  OR COUNT(T3.[id]) > 0 )

EDIT:

Here are the relevant indexes (excluding those not used in the query):

auth_user.id                       (PK)
auth_user.username                 (Unique)
tickets_ticket.id                  (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id         (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id

EDIT 2:

After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:

SELECT 
    COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
    COUNT(T3.[id]) AS [assigned_tickets__count],
    COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM 
    [auth_user] 
    LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id]) 
    LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id]) 
    LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id]) 
GROUP BY 
    [auth_user].[id]

The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).

EDIT 3:

The python code which generated this is:

User.objects.annotate(
    Count('tickets_captured'), 
    Count('assigned_tickets'), 
    Count('tickets_watched')
)

A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?

like image 759
aquavitae Avatar asked Dec 07 '25 06:12

aquavitae


1 Answers

SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers

As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.

like image 88
John Tseng Avatar answered Dec 08 '25 21:12

John Tseng