EDIT: I'm using MySQL, I found another post with the same question, but it's in Postgres; I require MySQL.
Get most common value for each value of another column in SQL
I ask this question after extensive searching of this site and others but have not found a result that works as I intend it to.
I have a table of people (recordid, personid, transactionid) and a transaction table (transactionid, rating). I require a single SQL statement that can return the most common rating each person has.
I currently have this SQL statement that returns the most common rating for a specified person id. It works and perhaps it may help others.
SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
However I require a statement that does what the above statement does for each personid in personTable.
My attempt is below; however, it times out my MySQL server.
SELECT personid AS pid,
(SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = pid
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1)
FROM persontable
GROUP BY personid
Any help you can give me would be much obliged. Thanks.
PERSONTABLE
:
RecordID, PersonID, TransactionID
1, Adam, 1
2, Adam, 2
3, Adam, 3
4, Ben, 1
5, Ben, 3
6, Ben, 4
7, Caitlin, 4
8, Caitlin, 5
9, Caitlin, 1
TRANSACTIONTABLE
:
TransactionID, Rating
1 Good
2 Bad
3 Good
4 Average
5 Average
The output of the SQL statement I am searching for would be:
OUTPUT:
PersonID, MostCommonRating
Adam Good
Ben Good
Caitlin Average
Please learn to use the explicit JOIN notation, not the old (pre-1992) implicit join notation.
Old style:
SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
Preferred style:
SELECT transactionTable.rating AS MostCommonRating
FROM personTable
JOIN transactionTable
ON personTable.transactionid = transactionTable.transactionid
WHERE personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
You need an ON condition for each JOIN.
Also, the personID
values in the data are strings, not numbers, so you'd need to write
WHERE personTable.personid = "Ben"
for example, to get the query to work on the tables shown.
You're seeking to find an aggregate of an aggregate: in this case, the maximum of a count. So, any general solution is going to involve both MAX and COUNT. You can't apply MAX directly to COUNT, but you can apply MAX to a column from a sub-query where the column happens to be a COUNT.
Build the query up using Test-Driven Query Design — TDQD.
SELECT p.PersonID, t.Rating, t.TransactionID
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
This result will become a sub-query.
SELECT s.PersonID, MAX(s.RatingCount)
FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
) AS s
GROUP BY s.PersonID
Now we know which is the maximum count for each person.
To get the result, we need to select the rows from the sub-query which have the maximum count. Note that if someone has 2 Good and 2 Bad ratings (and 2 is the maximum number of ratings of the same type for that person), then two records will be shown for that person.
SELECT s.PersonID, s.Rating
FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
) AS s
JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
) AS s
GROUP BY s.PersonID
) AS m
ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
If you want the actual rating count too, that's easily selected.
That's a fairly complex piece of SQL. I would hate to try writing that from scratch. Indeed, I probably wouldn't bother; I'd develop it step-by-step, more or less as shown. But because we've debugged the sub-queries before we use them in bigger expressions, we can be confident of the answer.
Note that Standard SQL provides a WITH clause that prefixes a SELECT statement, naming a sub-query. (It can also be used for recursive queries, but we aren't needing that here.)
WITH RatingList AS
(SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
)
SELECT s.PersonID, s.Rating
FROM RatingList AS s
JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
FROM RatingList AS s
GROUP BY s.PersonID
) AS m
ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
This is simpler to write. Unfortunately, MySQL does not yet support the WITH clause.
The SQL above has now been tested against IBM Informix Dynamic Server 11.70.FC2 running on Mac OS X 10.7.4. That test exposed the problem diagnosed in the preliminary comment. The SQL for the main answer worked correctly without needing to be changed.
Here's a somewhat hacky abuse of the fact that the max
aggregate function in MySQL does lexical sorting on varchars (as well as the expected numerical sorting on integers/floats):
SELECT
PersonID,
substring(max(concat(lpad(c, 20, '0'), Rating)), 21) AS MostFrequentRating
FROM (
SELECT PersonID, Rating, count(*) c
FROM PERSONTABLE INNER JOIN TRANSACTIONTABLE USING(TransactionID)
GROUP BY PersonID, Rating
) AS grouped_ratings
GROUP BY PersonID;
Which gives the desired:
+----------+--------------------+
| PersonID | MostFrequentRating |
+----------+--------------------+
| Adam | Good |
| Ben | Good |
| Caitlin | Average |
+----------+--------------------+
(note, if there are multiple modes per person, it will pick the one with the highest alphabetic entry, so — pretty much randomly — Good over Bad and Bad over Average)
You should be able to see what the max
is operating over by examining the following:
SELECT PersonID, Rating, count(*) c, concat(lpad(count(*), 20, '0'), Rating) as LexicalMaxMe
FROM PERSONTABLE INNER JOIN TRANSACTIONTABLE USING(TransactionID)
GROUP BY PersonID, Rating
ORDER BY PersonID, c DESC;
Which outputs:
+----------+---------+---+-----------------------------+
| PersonID | Rating | c | LexicalMaxMe |
+----------+---------+---+-----------------------------+
| Adam | Good | 2 | 00000000000000000002Good |
| Adam | Bad | 1 | 00000000000000000001Bad |
| Ben | Good | 2 | 00000000000000000002Good |
| Ben | Average | 1 | 00000000000000000001Average |
| Caitlin | Average | 2 | 00000000000000000002Average |
| Caitlin | Good | 1 | 00000000000000000001Good |
+----------+---------+---+-----------------------------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With