Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: Most common value per id

Tags:

sql

mysql

My Table named GameEvent displays like this:

+---------------+-----------+
| sessionNumber | colorName |
+---------------+-----------+
|             1 | Green     |
|             1 | Red       |
|             1 | Green     |
|             1 | Blue      |
|             2 | Blue      |
+---------------+-----------+

I want to make a Query which will return the sessionNumber and most common color for example sessionNumer 1 -> Green, sessionNumber 2 -> Blue. How do I do this?

like image 564
Terry Avatar asked Jan 08 '23 07:01

Terry


2 Answers

You start with the basic aggregation query to get the counts for sessions and colors:

select sessionNumber, colorName, count(*) as cnt
from GameEvent ge
group by sessionNumber, colorName;

Then you really wish that MySQL supported ANSI window functionality -- the only major database that doesn't support row_number(). But, alas, this functionality is not available. So, you have three choices:

  • Tricks with string aggregation.
  • Variables.
  • Complex SQL.

The following shows the first method:

select sessionNumber,
       substring_index(group_concat(colorName order by cnt desc), ',', 1) as MostCommonColor
from (select sessionNumber, colorName, count(*) as cnt
      from GameEvent ge
      group by sessionNumber, colorName
     ) sc
group by sessionNumber;

Note that there are limitations on this query. The intermediate results from group_concat() have a maximum length (controlled by a parameter that can be changed). Also, if the color names contain commas, then you would want to use a different character.

like image 157
Gordon Linoff Avatar answered Jan 10 '23 19:01

Gordon Linoff


What you can do is select the number of each pair by using a GROUP BY clause and the COUNT() function like this:

SELECT sessionNumber, colorName, COUNT(*) AS numOccurrences
FROM myTable
GROUP BY sessionNumber, colorName;

Once you have that, you'll need to find the largest number of occurrences for each sessionNumber. That can be accomplished by using aggregation again:

SELECT sessionNumber, MAX(numOccurrences) AS largestCount
FROM(
  SELECT sessionNumber, colorName, COUNT(*) AS numOccurrences
  FROM myTable
  GROUP BY sessionNumber, colorName) tmp
GROUP BY sessionNumber;

The last thing you have to do is join that back to the first subquery, on the condition that the session number and count match, and you will get the sessionNumber and color you need:

SELECT t1.sessionNumber, t1.colorName
FROM(
  SELECT sessionNumber, colorName, COUNT(*) AS numOccurrences
  FROM myTable
  GROUP BY sessionNumber, colorName) t1
JOIN(
  SELECT sessionNumber, MAX(numOccurrences) AS largestCount
  FROM(
    SELECT sessionNumber, colorName, COUNT(*) AS numOccurrences
    FROM myTable
    GROUP BY sessionNumber, colorName) tmp
  GROUP BY sessionNumber) t2
ON t2.sessionNumber = t1.sessionNumber AND t2.largestCount = t1.numOccurrences;

Here is an SQL Fiddle example that shows all three steps. It is important to note that you can't use LIMIT here because that clause will limit your entire result set, not each group individually.

like image 32
AdamMc331 Avatar answered Jan 10 '23 19:01

AdamMc331