
SQL max() function with a where clause and group by does not use the index efficiently

I have a table MYTABLE with approximately 25 columns, two of which are USERID (integer) and USERDATETIME (datetime).

I have an index over this table on these two columns, with USERID being the first column followed by USERDATETIME.

I would like to get the maximum USERDATETIME for each USERID. So:

select USERID,MAX(USERDATETIME) 
from MYTABLE WHERE USERDATETIME < '2015-10-11'
GROUP BY USERID

I would have expected the optimizer to find each unique USERID and its maximum USERDATETIME with a number of seeks equal to the number of unique USERIDs, and I would expect this to be reasonably fast. I have 2000 USERIDs and 6 million rows in MYTABLE. However, the actual plan shows 6 million rows coming out of an index scan. If I use an index on (USERDATETIME, USERID) instead, the plan changes to an index seek, but it still reads 6 million rows.

Why does SQL not use the index in a way that would reduce the number of rows processed?

Mike asked Dec 14 '15 18:12



1 Answer

If you are using SQL Server, seeking once per distinct USERID is not an optimisation the product generally carries out (except in limited cases where the table is partitioned by that value).

However, you can do it manually using the technique from here (a recursive CTE that seeks to the next lower USERID on each step):

CREATE TABLE YourTable
  (
     USERID       INT,
     USERDATETIME DATETIME,
     OtherColumns CHAR(10)
  )

CREATE CLUSTERED INDEX IX
  ON YourTable(USERID ASC, USERDATETIME ASC);

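WITH R
     AS (-- Anchor: the row with the highest USERID and its latest USERDATETIME.
         -- Wrapped in a derived table because T-SQL does not accept
         -- ORDER BY directly in front of UNION ALL.
         SELECT FirstRow.USERID,
                FirstRow.USERDATETIME
         FROM   (SELECT TOP 1 USERID,
                              USERDATETIME
                 FROM   YourTable
                 ORDER  BY USERID DESC,
                           USERDATETIME DESC) AS FirstRow
         UNION ALL
         -- Recursive step: seek to the next lower USERID and keep its latest row.
         -- TOP is not allowed in the recursive member, so ROW_NUMBER() = 1 stands in for it.
         SELECT SubQuery.USERID,
                SubQuery.USERDATETIME
         FROM   (SELECT T.USERID,
                        T.USERDATETIME,
                        rn = ROW_NUMBER()
                               OVER (
                                 ORDER BY T.USERID DESC, T.USERDATETIME DESC)
                 FROM   R
                        JOIN YourTable T
                          ON T.USERID < R.USERID) AS SubQuery
         WHERE  SubQuery.rn = 1)
SELECT USERID,
       USERDATETIME
FROM   R
-- The default recursion limit is 100; lift it so every distinct USERID is visited.
OPTION (MAXRECURSION 0);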


If you have another table that lists the USERIDs, you can get an efficient plan more easily with:

SELECT U.USERID,
       CA.USERDATETIME
FROM   Users U
       CROSS APPLY (SELECT TOP 1 USERDATETIME
                    FROM   YourTable Y
                    WHERE  Y.USERID = U.USERID
                    ORDER  BY USERDATETIME DESC) CA 
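
The query in the question also filters on USERDATETIME, which the version above leaves out. As a rough sketch only, the same CROSS APPLY shape could carry that filter inside the applied subquery, so each user should still be resolved by a single backward seek on the (USERID, USERDATETIME) index. The @Cutoff variable is a name introduced here for illustration, and the Users table is the same assumed lookup table as above.

```sql
-- Hypothetical variable holding the cutoff from the question's WHERE clause.
DECLARE @Cutoff DATETIME = '20151011';

SELECT U.USERID,
       CA.USERDATETIME
FROM   Users U
       -- For each user, take the latest row strictly before the cutoff.
       CROSS APPLY (SELECT TOP 1 Y.USERDATETIME
                    FROM   YourTable Y
                    WHERE  Y.USERID = U.USERID
                           AND Y.USERDATETIME < @Cutoff
                    ORDER  BY Y.USERDATETIME DESC) CA;
```

Because CROSS APPLY drops users for whom the subquery returns no row, users with nothing before the cutoff are omitted, which matches what the original GROUP BY query returns.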


Martin Smith answered Nov 15 '22 01:11