Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to add conditions to a MAX() call in an aggregated query?

Background

My typical use case:

# Table

id     category    dataUID
---------------------------
0         A        (NULL)
1         B        (NULL)
2         C        text1
3         C        text1
4         D        text2
5         D        text3

# Query

SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`

This is fine; it will strip out any "duplicate categories" in the recordset that's being worked on, giving me the "highest" ID for each category.

I can then go on use this ID to pull out all the data again:

# Query

SELECT * FROM `table` JOIN (
   SELECT MAX(`id`) AS `id` FROM `table`
   GROUP BY `category`
) _ USING(`id`)

# Result

id     category    dataUID
---------------------------
0         A        (NULL)
1         B        (NULL)
3         C        text1
5         D        text3

Note that this is not the same as:

SELECT MAX(`id`) AS `id`, `category`, `dataUID` FROM `table`
GROUP BY `category`

Per the documentation:

In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:

SELECT o.custid, c.name, MAX(o.payment)   FROM orders AS o, customers
AS c   WHERE o.custid = c.custid   GROUP BY o.custid;

For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.

[..]

This extension assumes that the nongrouped columns will have the same group-wise values. Otherwise, the result is indeterminate.

So I'd get an unspecified value for dataUID — as an example, either text2 or text3 for result with id 5.

This is actually a problem for other fields in my real case; as it happens, for the dataUID column specifically, generally I don't really care which value I get.


Problem

However!

If any of the rows for a given category has a NULL dataUID, and at least one other row has a non-NULL dataUID, I'd like MAX to ignore the NULL ones.

So:

id     category    dataUID
---------------------------
4         D        text2
5         D        (NULL)

At present, since I pick out the row with the maximum ID, I get:

5         D        (NULL)

But, because the dataUID is NULL, instead I want:

4         D        text2

How can I get this? How can I add conditional logic to the use of aggregate MAX?


I thought of maybe handing MAX a tuple and pulling the id out from it afterwards:

GET_SECOND_PART_SOMEHOW(MAX((IF(`dataUID` NOT NULL, 1, 0), `id`))) AS `id`

But I don't think MAX will accept arbitrary expressions like that, let alone tuples, and I don't know how I'd retrieve the second part of the tuple after-the-fact.

like image 646
Lightness Races in Orbit Avatar asked Dec 07 '25 03:12

Lightness Races in Orbit


2 Answers

slight tweak to @ypercube's answer. To get the ids you can use

SELECT COALESCE(MAX(CASE
                      WHEN dataUID IS NOT NULL THEN id
                    END), MAX(id)) AS id
FROM   table
GROUP  BY category  

And then plug that into a join

like image 55
Martin Smith Avatar answered Dec 08 '25 18:12

Martin Smith


This was easier than I thought, in the end, because it turns out MySQL will accept an arbitrary expression inside MAX.

I can get the ordering I want by injecting a leading character into id to serve as an ordering hint:

SUBSTRING(MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`))) FROM 2)

Walk-through:

id     category    dataUID    IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`)
--------------------------------------------------------------------------------------
0         A        (NULL)                             a0
1         B        (NULL)                             a1
2         C        text1                              b2
3         C        text1                              b3
4         D        text2                              b4
5         D        (NULL)                             a5

So:

SELECT
   `category`, MAX(IF (`dataUID` IS NULL, CONCAT('a',`id`), CONCAT('b',`id`)) AS `max_id_with_hint`
FROM `table`
GROUP BY `category`

category   max_id_with_hint
------------------------------
A          a0
B          a1
C          b3
D          b4

It's then a simple matter to chop the ordering hint off again.

Thanks in particular to @JlStone for setting me, via COALESCE, on the path to embedding expressions inside the call to MAX and directly manipulating the values supplied to MAX.

like image 36
Lightness Races in Orbit Avatar answered Dec 08 '25 16:12

Lightness Races in Orbit