I have the following table with nearly 2 million records, and of course it is growing every day. Some columns (the `...Id` columns below) are foreign keys to parent tables, e.g. DirectionId --> Direction table, etc.:
Id    TypeId  DirectionId  UserId  IndicatorId  Date                         Size   ExternalId
2003  100     1            1       1            2015-06-01 00:02:23.0000000  11931  28657340
2004  2       1            2       1            2015-06-01 00:03:21.0000000  10358  28657341
2005  2       2            2       1            2015-06-01 00:03:31.0000000  10848  28657342
2006  100     1            2       1            2015-06-01 00:03:52.0000000  7860   28657343
2007  100     1            3       1            2015-06-01 00:03:59.0000000  13353  28657344
I need to get the datetime of the last message per TypeId and DirectionId. The query below returns what I need:
select TypeId, DirectionID, max(date) as Date
from message
group by TypeId, DirectionID;
TypeId  DirectionId  Date
2       1            2015-06-05 15:12:37.0000000
1       1            2015-06-05 15:12:39.0000000
The problem is that this query takes from 2500 ms to 3000 ms to execute. I added this index:
CREATE NONCLUSTERED INDEX [date_index] ON [mqview].[Message] ([Date] ASC)
INCLUDE ([Id], [TypeId],[DirectionId], [UserId], [Size], [ExternalId]) WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
What can be done to get results faster?
Update
With the proposed index, I get results faster, but now I would like to speed up the same aggregate when it includes the two inner joins shown below. Alternatively, I could run two extra lookup queries against the MessageDirection and MessageType tables if nothing can be done to improve the performance of this query.
SET STATISTICS TIME ON
select mt.Code, md.Code, max(m.date) as Date
from
mqview.Message m
inner join mqview.MessageDirection md on (md.Id = m.DirectionId)
inner join mqview.MessageType mt on (mt.Id = m.TypeId)
group by mt.Code, md.Code
SET STATISTICS TIME OFF
Message:
SQL Server Execution Times:
CPU time = 3343 ms, elapsed time = 2817 ms.
Execution plan: (screenshot omitted)
The way to make a query run faster is to reduce the number of calculations that the software (and therefore hardware) must perform. To do this, you'll need some understanding of how SQL actually makes calculations.
Aggregation takes time, so some slowness is to be expected with a table this large, unless an index supports the grouping.
The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any).
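That fallback strategy can be sketched in T-SQL roughly like this (illustrative only; SQL Server does this internally with a sort or hash aggregate operator, not a visible temp table, and `#work` is a hypothetical name):

```sql
-- Illustrative sketch of the "copy everything, then aggregate" fallback.
-- Every row must be read once to build the work table...
SELECT TypeId, DirectionId, [Date]
INTO #work
FROM mqview.[Message];

-- ...and every row must be read again to find each group's maximum.
SELECT TypeId, DirectionId, MAX([Date]) AS [Date]
FROM #work
GROUP BY TypeId, DirectionId;

DROP TABLE #work;
```

The point of the indexing advice below is to avoid both full passes.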
Your index isn't helpful for this query. You are grouping by TypeId first, whereas your index has the rows ordered by Date first. So to group by TypeId and then DirectionId, the query still has to scan every single row in the table. Then, once it has grouped by these values, it has to look at every row in each group to find the maximum date.
If you had the rows indexed by TypeId, and then by DirectionId, the grouping would be quicker because the rows would naturally be in order of their grouping within the index. If you then add Date to the index, the query knows that the last row in each group has the highest date, which speeds it up a little; but if you make the Date sort in the index descending, then the first row in each group has the highest date. This means that only the first row in each group needs to be looked at, which gives a great speed boost - you will probably find that with this index your query becomes near-instant.
Because the index now contains all the values in your query, the actual rows of the table will not even need to be visited. The database engine can return the values straight from the index. This removes another step from the query processing and makes it faster again.
Your CREATE INDEX statement would look like this:
CREATE INDEX ix_myNewIndex ON [mqview].[Message] (TypeId, DirectionId, [Date] DESC)
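If you want to verify that the optimizer actually picks the new index up, one option (a sketch, assuming the index name `ix_myNewIndex` from the statement above) is to force it with a table hint and compare the `STATISTICS TIME` output against the original run:

```sql
-- Force the new index to confirm it covers the query,
-- then compare CPU/elapsed time with the earlier measurement.
SET STATISTICS TIME ON;

SELECT TypeId, DirectionId, MAX([Date]) AS [Date]
FROM mqview.[Message] WITH (INDEX (ix_myNewIndex))
GROUP BY TypeId, DirectionId;

SET STATISTICS TIME OFF;
```

Normally you would let the optimizer choose on its own; the hint is only a diagnostic to rule out the old Date-first index being picked instead.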
IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL
DROP TABLE #temp
GO
CREATE TABLE #temp
(
Id INT PRIMARY KEY,
TypeId TINYINT,
DirectionId TINYINT,
UserId TINYINT,
IndicatorId TINYINT,
[Date] DATETIME2
)
CREATE /*UNIQUE*/ NONCLUSTERED INDEX ix ON #temp (TypeId, DirectionId, [Date] DESC) -- DESC
GO
INSERT INTO #temp (Id, TypeId, DirectionId, UserId, IndicatorId, [Date])
VALUES
(2003, 100, 1, 1, 1, '20150601 00:02:23.0000000'),
(2004, 2 , 1, 2, 1, '20150601 00:03:21.0000000'),
(2005, 2 , 2, 2, 1, '20150601 00:03:31.0000000'),
(2006, 100, 1, 2, 1, '20150601 00:03:52.0000000'),
(2007, 100, 1, 3, 1, '20150601 00:03:59.0000000')
SELECT TypeId, DirectionId, MAX([Date]) AS [Date]
FROM #temp
GROUP BY TypeId, DirectionId
Update:
SELECT mt.Code, md.Code, t.[Date]
FROM (
SELECT TypeId, DirectionID, [Date] = MAX([Date])
FROM mqview.[Message]
GROUP BY TypeId, DirectionId
) t
JOIN mqview.MessageDirection md on md.Id = t.DirectionId
JOIN mqview.MessageType mt on mt.Id = t.TypeId