I have a table MYTABLE that has approximately 25 columns, with two of them being USERID (integer) and USERDATETIME (dateTime).
I have an index over this table on these two columns, with USERID being the first column followed by USERDATETIME.
I would like to get the maximum USERDATETIME for each USERID. So:
SELECT USERID, MAX(USERDATETIME)
FROM MYTABLE
WHERE USERDATETIME < '2015-10-11'
GROUP BY USERID
I would have expected the optimizer to be able to find each unique USERID and its maximum USERDATETIME with the number of seeks equal to the number of unique USERIDs, and I would expect this to be reasonably fast. I have 2000 userids and 6 million rows in MYTABLE. However, the actual plan shows 6 million rows from an index scan. If I use an index on USERDATETIME/USERID instead, the plan changes to use an index seek, but it still processes 6 million rows.
Why does SQL not use the index in a way that would reduce the number of rows processed?
If you are using SQL Server, this is not an optimisation generally carried out by the product (except in limited cases where the table is partitioned by that value).
However, you can do it manually with a recursive CTE that seeks once per distinct USERID:
CREATE TABLE YourTable
  (
     USERID       INT,
     USERDATETIME DATETIME,
     OtherColumns CHAR(10)
  );

CREATE CLUSTERED INDEX IX
  ON YourTable(USERID ASC, USERDATETIME ASC);

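WITH R
     AS (-- Anchor: the highest USERID and its latest USERDATETIME
         SELECT TOP 1 USERID,
                      USERDATETIME
         FROM   YourTable
         ORDER  BY USERID DESC,
                   USERDATETIME DESC
         UNION ALL
         -- Recursive step: the next lower USERID and its latest USERDATETIME.
         -- TOP is not allowed in the recursive member, so ROW_NUMBER() picks the first row.
         SELECT SubQuery.USERID,
                SubQuery.USERDATETIME
         FROM   (SELECT T.USERID,
                        T.USERDATETIME,
                        rn = ROW_NUMBER()
                               OVER (
                                 ORDER BY T.USERID DESC, T.USERDATETIME DESC)
                 FROM   R
                        JOIN YourTable T
                          ON T.USERID < R.USERID) AS SubQuery
         WHERE  SubQuery.rn = 1)
SELECT *
FROM   R
-- The default limit is 100 recursion levels; 0 removes it (needed for ~2000 USERIDs)
OPTION (MAXRECURSION 0);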
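
The recursive CTE above leaves out the date filter from the question. The following is a minimal sketch of how the filter might be added, assuming the same YourTable and clustered index as above; @Cutoff is a hypothetical variable name introduced only for illustration. The filter has to appear in both the anchor and the recursive member, and OPTION (MAXRECURSION 0) lifts the default limit of 100 recursion levels, which roughly 2000 distinct USERIDs would exceed.

```sql
-- Sketch only: @Cutoff is an illustrative name, not part of the original answer.
DECLARE @Cutoff DATETIME = '20151011';

WITH R
     AS (-- Anchor: highest USERID that has a row before the cutoff
         SELECT TOP 1 USERID,
                      USERDATETIME
         FROM   YourTable
         WHERE  USERDATETIME < @Cutoff
         ORDER  BY USERID DESC,
                   USERDATETIME DESC
         UNION ALL
         -- Recursive step: next lower USERID that has a row before the cutoff
         SELECT SubQuery.USERID,
                SubQuery.USERDATETIME
         FROM   (SELECT T.USERID,
                        T.USERDATETIME,
                        rn = ROW_NUMBER()
                               OVER (
                                 ORDER BY T.USERID DESC, T.USERDATETIME DESC)
                 FROM   R
                        JOIN YourTable T
                          ON T.USERID < R.USERID
                             AND T.USERDATETIME < @Cutoff) AS SubQuery
         WHERE  SubQuery.rn = 1)
SELECT USERID,
       USERDATETIME
FROM   R
OPTION (MAXRECURSION 0);
```

Users with no rows before the cutoff are simply skipped, but their rows may still be read on the way past, so this variant is likely to be most effective when most users have at least one qualifying row.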
If you have another table with the UserIds, it is possible to get an efficient plan more easily with:
SELECT U.USERID,
       CA.USERDATETIME
FROM   Users U
       CROSS APPLY (SELECT TOP 1 USERDATETIME
                    FROM   YourTable Y
                    WHERE  Y.USERID = U.USERID
                    ORDER  BY USERDATETIME DESC) CA
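
The CROSS APPLY query above also omits the question's date filter. As a hedged sketch, assuming the same Users and YourTable tables, the filter can go inside the applied subquery so that each user needs a single seek on (USERID, USERDATETIME):

```sql
-- Sketch: one backward seek per user, bounded by the question's cutoff date.
SELECT U.USERID,
       CA.USERDATETIME
FROM   Users U
       CROSS APPLY (SELECT TOP 1 Y.USERDATETIME
                    FROM   YourTable Y
                    WHERE  Y.USERID = U.USERID
                           AND Y.USERDATETIME < '20151011'
                    ORDER  BY Y.USERDATETIME DESC) CA;
```

CROSS APPLY drops users that have no qualifying row. Switching to OUTER APPLY would keep them, with a NULL USERDATETIME.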