
How to make group by query faster?

I have the following table with nearly 2 million records, and it grows every day. Some sample records are shown below (the columns ending in Id are foreign keys to parent tables, e.g. DirectionId --> Direction table, ...):

Id    TypeId  DirectionId  UserId  IndicatorId  Date                         Size   ExternalId
2003  100     1            1       1            2015-06-01 00:02:23.0000000  11931  28657340
2004  2       1            2       1            2015-06-01 00:03:21.0000000  10358  28657341
2005  2       2            2       1            2015-06-01 00:03:31.0000000  10848  28657342
2006  100     1            2       1            2015-06-01 00:03:52.0000000  7860   28657343
2007  100     1            3       1            2015-06-01 00:03:59.0000000  13353  28657344

I need to get the datetime of the last message for each combination of TypeId and DirectionId. The query below returns what I need:

select TypeId, DirectionID, max(date) as Date
from message
group by TypeId, DirectionID;

DirectionId  TypeId  Date
2            1       2015-06-05 15:12:37.0000000
1            1       2015-06-05 15:12:39.0000000

The problem is that this query takes 2500 ms to 3000 ms to execute. I added the following index:

CREATE NONCLUSTERED INDEX [date_index] ON [mqview].[Message] ([Date] ASC)
INCLUDE ([Id],  [TypeId],[DirectionId], [UserId], [Size], [ExternalId]) WITH (PAD_INDEX = OFF, 
STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, 
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

What can be done to get results faster?

Update

With the proposed index I get results faster, but now I would like the query with two inner joins, shown below, to be faster as well. Alternatively, I could run two additional queries against the MessageDirection and MessageType tables if nothing can be done to improve the performance of the query below.

SET STATISTICS TIME ON
select  mt.Code, md.Code, max(m.date) as Date
from
  mqview.Message m
  inner join mqview.MessageDirection md on (md.Id = m.DirectionId)
  inner join mqview.MessageType mt on (mt.Id = m.TypeId)
group by mt.Code, md.Code
SET STATISTICS TIME OFF

Message:

 SQL Server Execution Times:
 CPU time = 3343 ms,  elapsed time = 2817 ms.

Execution plan: (image not included)

broadband asked Dec 03 '15 10:12




2 Answers

Your index isn't helpful for this query. You are grouping by TypeId first, whereas your index has the rows ordered by Date. So to group by TypeId and then DirectionId, the query still has to scan every single row in the table. Then, once it has grouped by these values, it has to look at every row in each group to find the maximum date.

If the rows were indexed by TypeId and then by DirectionId, the grouping would be quicker, because the rows would already be in grouping order within the index. If you then add Date to the index, the query knows that the last row in each group has the highest date, which speeds it up a little. If you make Date sort descending in the index, the first row in each group has the highest date, so only the first row in each group needs to be looked at. This should give a great speed boost: you will probably find that with this index your query becomes near-instant.

Because the index now contains all the values in your query, the actual rows of the table will not even need to be visited. The database engine can return the values straight from the index. This removes another step from the query processing and makes it faster again.

Your CREATE INDEX statement would look like this:

CREATE INDEX ix_myNewIndex ON [mqview].[Message] (TypeId, DirectionId, [Date] DESC) 
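
To check whether the new index actually helps, one option is to compare the I/O and timing figures before and after creating it. The following is only a sketch: it reuses the table and column names from the question and relies on the standard SET STATISTICS IO and SET STATISTICS TIME session options. The numbers to compare are whatever your own server reports in the Messages output.

```sql
-- Report logical reads and CPU/elapsed time for the statements that follow
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The query from the question; run it once before and once after creating the index
SELECT TypeId, DirectionId, MAX([Date]) AS [Date]
FROM mqview.[Message]
GROUP BY TypeId, DirectionId;

-- Switch the reporting off again for this session
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

A lower logical read count for the Message table suggests the engine is reading the narrower index rather than the full rows.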
Richard Irons answered Sep 22 '22 15:09


-- Drop the temp table if it is left over from a previous run
IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL
    DROP TABLE #temp
GO

-- Cut-down copy of the Message table for testing
CREATE TABLE #temp
(
    Id INT PRIMARY KEY,
    TypeId TINYINT,
    DirectionId TINYINT,
    UserId TINYINT,
    IndicatorId TINYINT,
    [Date] DATETIME2
)

-- Key order matches the GROUP BY; Date is descending so the latest row comes first in each group
CREATE /*UNIQUE*/ NONCLUSTERED INDEX ix ON #temp (TypeId, DirectionId, [Date] DESC)
GO

-- Sample rows from the question
INSERT INTO #temp (Id, TypeId, DirectionId, UserId, IndicatorId, [Date])
VALUES
    (2003, 100, 1, 1, 1, '20150601 00:02:23.0000000'),
    (2004, 2  , 1, 2, 1, '20150601 00:03:21.0000000'),
    (2005, 2  , 2, 2, 1, '20150601 00:03:31.0000000'),
    (2006, 100, 1, 2, 1, '20150601 00:03:52.0000000'),
    (2007, 100, 1, 3, 1, '20150601 00:03:59.0000000')

-- Latest date per (TypeId, DirectionId)
SELECT TypeId, DirectionId, MAX([Date]) AS [Date]
FROM #temp
GROUP BY TypeId, DirectionId

Update: aggregate first, then join the small grouped result to the lookup tables:

SELECT mt.Code, md.Code, t.[Date]
FROM (
    SELECT TypeId, DirectionID, [Date] = MAX([Date])
    FROM mqview.[Message]
    GROUP BY TypeId, DirectionId
) t
JOIN mqview.MessageDirection md on md.Id = t.DirectionId
JOIN mqview.MessageType mt on mt.Id = t.TypeId
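
If the plan still reads every row of the index, another pattern that may be worth trying is to drive the query from the two small lookup tables and fetch only the newest message for each combination. This is a sketch under assumptions, not a guaranteed improvement: it assumes an index on (TypeId, DirectionId, [Date] DESC) exists on mqview.Message, that MessageType and MessageDirection are small, and that Code is unique in each of them (otherwise grouping by Code, as in the question, could give different results). The aliases TypeCode, DirectionCode and last_msg are illustrative names.

```sql
SELECT mt.Code AS TypeCode,
       md.Code AS DirectionCode,
       last_msg.[Date]
FROM mqview.MessageType AS mt
CROSS JOIN mqview.MessageDirection AS md      -- every type/direction combination
CROSS APPLY (
    -- newest message for this combination; ideally a single seek on the index
    SELECT TOP (1) m.[Date]
    FROM mqview.[Message] AS m
    WHERE m.TypeId = mt.Id
      AND m.DirectionId = md.Id
    ORDER BY m.[Date] DESC
) AS last_msg;                                -- combinations without messages are dropped
```

CROSS APPLY keeps only combinations that have at least one message, which matches the behaviour of the inner joins in the original query. Comparing the execution plan and timings against the grouped version is the only way to tell which one is faster on your data.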
Devart answered Sep 21 '22 15:09