Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select one value from a group based on order from other columns

Tags:

sql

mysql

mariadb

Problem

Suppose I have this table tab (fiddle available).

| g | a | b |     v |
---------------------
| 1 | 3 | 5 |   foo |
| 1 | 4 | 7 |   bar |
| 1 | 2 | 9 |   baz |
| 2 | 1 | 1 |   dog |
| 2 | 5 | 2 |   cat |
| 2 | 5 | 3 | horse |
| 2 | 3 | 8 |   pig |

I'm grouping rows by g, and for each group I want one value from column v. However, I don't want any value, but I want the value from the row with maximal a, and from all of those, the one with maximal b. In other words, my result should be

| 1 |   bar |
| 2 | horse |

Current solution

I know of a query to achieve this:

SELECT grps.g,
(SELECT v FROM tab
 WHERE g = grps.g
 ORDER BY a DESC, b DESC
 LIMIT 1) AS r
FROM (SELECT DISTINCT g FROM tab) grps

Question

But I consider this query rather ugly. Mostly because it uses a dependant subquery, which feels like a real performance killer. So I wonder whether there is an easier solution to this problem.

Expected answers

The most likely answer I expect to this question would be some kind of add-on or patch for MySQL (or MariaDB) which does provide a feature for this. But I'll welcome other useful inspirations as well. Anything which works without a dependent subquery would qualify as an answer.

If your solution only works for a single ordering column, i.e. couldn't distinguish between cat and horse, feel free to suggest that answer as well as I expect it to be still useful to the majority of use cases. For example, 100*a+b would be a likely way to order the above data by both columns while still using only a single expression.

I have a few pretty hackish solutions in mind, and might add them after a while, but I'll first look and see whether some nice new ones pour in first.


Benchmark results

As it is pretty hard to compare the various answers just by looking at them, I've run some benchmarks on them. This was run on my own desktop, using MySQL 5.1. The numbers won't compare to any other system, only to one another. You probably should be doing your own tests with your real-life data if performance is crucial to your application. When new answers come in, I might add them to my script, and re-run all the tests.

  • 100,000 items, 1,000 groups to choose from, InnoDb:
    1. 0.166s for MvG (from question)
    2. 0.520s for RichardTheKiwi
    3. 2.199s for xdazz
    4. 19.24s for Dems (sequential sub-queries)
    5. 48.72s for acatt
  • 100,000 items, 50,000 groups to choose from, InnoDb:
    1. 0.356s for xdazz
    2. 0.640s for RichardTheKiwi
    3. 0.764s for MvG (from question)
    4. 51.50s for acatt
    5. too long for Dems (sequential sub-queries)
  • 100,000 items, 100 groups to choose from, InnoDb:
    1. 0.163s for MvG (from question)
    2. 0.523s for RichardTheKiwi
    3. 2.072s for Dems (sequential sub-queries)
    4. 17.78s for xdazz
    5. 49.85s for acatt

So it seems that my own solution so far isn't all that bad, even with the dependent subquery. Surprisingly, the solution by acatt, which uses a dependent subquery as well and which I therefore would have considered about the same, performs much worse. Probably something the MySQL optimizer can't cope with. The solution RichardTheKiwi proposed seems to have good overall performance as well. The other two solutions heavily depend on the structure of the data. With many groups small groups, xdazz' approach outperforms all other, whereas the solution by Dems performs best (though still not exceptionally good) for few large groups.

like image 299
MvG Avatar asked Oct 04 '12 11:10

MvG


People also ask

How do you use ORDER BY along with GROUP BY?

Using Group By and Order By Together When combining the Group By and Order By clauses, it is important to bear in mind that, in terms of placement within a SELECT statement: The GROUP BY clause is placed after the WHERE clause. The GROUP BY clause is placed before the ORDER BY clause.

Can we use SELECT * with GROUP BY?

Cannot use an aggregate or a subquery in an expression used for the group by list of a GROUP BY clause. The original idea was to create the table in beginning of the query, so the (SELECT * FROM #TBL) could be used on the query itself, instead of defining the names on each GROUP BY.

Can we use ORDER BY and GROUP BY in same query?

In the query, GROUP BY clause is placed after the WHERE clause. In the query, GROUP BY clause is placed before ORDER BY clause if used any.

Can you GROUP BY and ORDER BY in SQL?

Order By and Group By Clause in SQLGroup By in SQL is used to arrange similar data into groups and Order By in SQL is used to sort the data in ascending or descending order.


2 Answers

SELECT g, a, b, v
  FROM (
            SELECT *, 
                   @rn := IF(g = @g, @rn + 1, 1) rn, 
                   @g := g
              FROM (select @g := null, @rn := 0) x, 
                   tab
          ORDER BY g, a desc, b desc, v
       ) X
 WHERE rn = 1;

Single pass. All the other solutions look O(n^2) to me.

like image 140
RichardTheKiwi Avatar answered Oct 14 '22 17:10

RichardTheKiwi


This way doesn't use sub-query.

SELECT t1.g, t1.v
FROM tab t1
LEFT JOIN tab t2 ON t1.g = t2.g AND (t1.a < t2.a OR (t1.a = t2.a AND t1.b < t2.b))
WHERE t2.g IS NULL

Explanation:

The LEFT JOIN works on the basis that when t1.a is at its maximum value, there is no s2.a with a greater value and the s2 rows values will be NULL.

like image 31
xdazz Avatar answered Oct 14 '22 16:10

xdazz