I have a table named t1 with the following fields: ROWID, CID, PID, Score, SortKey.
It has the following data:
1, C1, P1, 10, 1
2, C1, P2, 20, 2
3, C1, P3, 30, 3
4, C2, P4, 20, 3
5, C2, P5, 30, 2
6, C3, P6, 10, 1
7, C3, P7, 20, 2
What query do I write so that it applies GROUP BY on CID but, instead of returning a single result per group, returns at most 2 results per group? Also, the WHERE condition is Score >= 20, and I want the results ordered by CID and SortKey.
If I ran my query on the above data, I would expect the following result:
RESULTS FOR C1 - note: ROWID 1 is not considered as its score < 20
C1, P2, 20, 2
C1, P3, 30, 3
RESULTS FOR C2 - note: ROWID 5 appears before ROWID 4 because ROWID 5 has a lower SortKey
C2, P5, 30, 2
C2, P4, 20, 3
RESULTS FOR C3 - note: ROWID 6 does not appear because its score is less than 20, so only 1 record is returned here
C3, P7, 20, 2
In short, I want a LIMIT within a GROUP BY. I want the simplest solution and want to avoid temp tables; subqueries are fine. Also note that I am using SQLite.
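A minimal sketch for reproducing this setup, assuming Python's built-in sqlite3 module and an in-memory database. The schema (declaring "ROWID" as INTEGER PRIMARY KEY) is an assumption, and make_db and EXPECTED are hypothetical names that simply restate the sample data and the expected output above.

```python
import sqlite3

# Sample data from the question: (ROWID, CID, PID, Score, SortKey)
ROWS = [
    (1, "C1", "P1", 10, 1),
    (2, "C1", "P2", 20, 2),
    (3, "C1", "P3", 30, 3),
    (4, "C2", "P4", 20, 3),
    (5, "C2", "P5", 30, 2),
    (6, "C3", "P6", 10, 1),
    (7, "C3", "P7", 20, 2),
]

# Expected result from the question: (CID, PID, Score, SortKey)
EXPECTED = [
    ("C1", "P2", 20, 2),
    ("C1", "P3", 30, 3),
    ("C2", "P5", 30, 2),
    ("C2", "P4", 20, 3),
    ("C3", "P7", 20, 2),
]


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: build table t1 in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema; "ROWID" is declared as an alias of SQLite's rowid.
    conn.execute(
        'CREATE TABLE t1 ("ROWID" INTEGER PRIMARY KEY, "CID" TEXT, '
        '"PID" TEXT, "Score" INTEGER, "SortKey" INTEGER)'
    )
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    conn = make_db()
    # Number of rows passing the WHERE condition Score >= 20
    print(conn.execute('SELECT COUNT(*) FROM t1 WHERE "Score" >= 20').fetchone()[0])
```

Any of the queries in the answers can then be run against the connection returned by make_db().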
In MySQL, no, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQL versions, but not for 5 results per group). This is a groupwise-maximum type of query, which is not trivial to do in SQL.
Here's a fairly portable query to do what you want:
SELECT *
FROM table1 a
WHERE a."ROWID" IN (
SELECT b."ROWID"
FROM table1 b
WHERE b."Score" >= 20
AND b."ROWID" IS NOT NULL
AND a."CID" = b."CID"
ORDER BY b."CID", b."SortKey"
LIMIT 2
)
ORDER BY a."CID", a."SortKey";
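To check the query against the question's data, here is a small sketch that runs it unchanged through Python's sqlite3 module. The in-memory table1 and its schema are assumptions; only the SQL text comes from the answer.

```python
import sqlite3

# Sample data from the question: (ROWID, CID, PID, Score, SortKey)
ROWS = [
    (1, "C1", "P1", 10, 1),
    (2, "C1", "P2", 20, 2),
    (3, "C1", "P3", 30, 3),
    (4, "C2", "P4", 20, 3),
    (5, "C2", "P5", 30, 2),
    (6, "C3", "P6", 10, 1),
    (7, "C3", "P7", 20, 2),
]

conn = sqlite3.connect(":memory:")
# Assumed schema, named table1 as in the answer.
conn.execute(
    'CREATE TABLE table1 ("ROWID" INTEGER PRIMARY KEY, "CID" TEXT, '
    '"PID" TEXT, "Score" INTEGER, "SortKey" INTEGER)'
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", ROWS)

# The correlated-subquery query from the answer, unchanged.
QUERY = """
SELECT *
FROM table1 a
WHERE a."ROWID" IN (
    SELECT b."ROWID"
    FROM table1 b
    WHERE b."Score" >= 20
      AND b."ROWID" IS NOT NULL
      AND a."CID" = b."CID"
    ORDER BY b."CID", b."SortKey"
    LIMIT 2
)
ORDER BY a."CID", a."SortKey";
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The five printed rows correspond to the expected result in the question, with ROWID as the first column.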
The query uses a correlated subquery with a sort and limit to produce a list of ROWIDs that should appear in the final result. Because the correlated subquery is executed for every row, whether or not it's included in the result, it may not be as efficient as the window function version given below, but unlike that version it also works on SQLite versions before 3.25.0, which don't support window functions.
This query requires that ROWID is unique (can be used as a primary key).
I tested the above in PostgreSQL 9.2 and in SQLite3 3.7.11; it works fine in both. It won't work on MySQL 5.5 or the latest 5.6 milestone because MySQL doesn't support LIMIT in a subquery used with IN.
SQLFiddle demos:
PostgreSQL (works fine): http://sqlfiddle.com/#!12/22829/3
SQLite3 (works fine, same query text, but needed single-valued inserts due to an apparent JDBC driver limitation): http://sqlfiddle.com/#!7/9ecd8/1
MySQL 5.5 (fails two ways; MySQL doesn't like a."ROWID" quoting even in ANSI mode, so I had to un-quote; then it fails with "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"): http://sqlfiddle.com/#!2/e1f31/2
SQLite demo showing it works just fine on the SQLite3 command line: http://pastebin.com/26n4NiUC
Output (PostgreSQL):
ROWID | CID | PID | Score | SortKey
-------+-----+-----+-------+---------
2 | C1 | P2 | 20 | 2
3 | C1 | P3 | 30 | 3
5 | C2 | P5 | 30 | 2
4 | C2 | P4 | 20 | 3
7 | C3 | P7 | 20 | 2
(5 rows)
If you want to filter for a particular CID, just add AND "CID" = 'C1' or whatever to the outer WHERE clause.
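A sketch of that filter, assuming an in-memory table1 built from the question's sample data with Python's sqlite3 module. The extra condition is qualified as a."CID" and bound as a parameter rather than written as the literal 'C1'; both are choices made here, not part of the original answer.

```python
import sqlite3

# Sample data from the question: (ROWID, CID, PID, Score, SortKey)
ROWS = [
    (1, "C1", "P1", 10, 1),
    (2, "C1", "P2", 20, 2),
    (3, "C1", "P3", 30, 3),
    (4, "C2", "P4", 20, 3),
    (5, "C2", "P5", 30, 2),
    (6, "C3", "P6", 10, 1),
    (7, "C3", "P7", 20, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table1 ("ROWID" INTEGER PRIMARY KEY, "CID" TEXT, '
    '"PID" TEXT, "Score" INTEGER, "SortKey" INTEGER)'
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", ROWS)

# The answer's query with the CID filter added to the OUTER WHERE clause.
QUERY = """
SELECT *
FROM table1 a
WHERE a."ROWID" IN (
    SELECT b."ROWID"
    FROM table1 b
    WHERE b."Score" >= 20
      AND b."ROWID" IS NOT NULL
      AND a."CID" = b."CID"
    ORDER BY b."CID", b."SortKey"
    LIMIT 2
)
  AND a."CID" = ?
ORDER BY a."CID", a."SortKey";
"""

c1_rows = conn.execute(QUERY, ("C1",)).fetchall()
c2_rows = conn.execute(QUERY, ("C2",)).fetchall()
print(c1_rows)  # [(2, 'C1', 'P2', 20, 2), (3, 'C1', 'P3', 30, 3)]
print(c2_rows)  # [(5, 'C2', 'P5', 30, 2), (4, 'C2', 'P4', 20, 3)]
```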
Here's a closely related answer with more detailed examples: https://stackoverflow.com/a/13411138/398670
Since this was originally tagged just SQL (no SQLite)... just for completeness, in PostgreSQL or other DBs with SQL-standard window function support I'd probably do this:
SELECT "ROWID", "CID", "PID", "Score", "SortKey"
FROM (
SELECT *, row_number() OVER (PARTITION BY "CID" ORDER BY "SortKey") AS n
FROM table1
WHERE "Score" >= 20
) x
WHERE n < 3
ORDER BY "CID", "SortKey";
which produces the same result. SQLFiddle, including an extra C1 row to demonstrate that the limiting filter actually works: http://sqlfiddle.com/#!12/22829/1
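SQLite gained window functions in version 3.25.0, so on a current SQLite this version runs as-is. The sketch below assumes Python's sqlite3 module is linked against 3.25.0 or later (it exits otherwise) and adds a hypothetical third qualifying C1 row (ROWID 8, SortKey 4) so that row_number() actually has something to cut.

```python
import sqlite3

# Window functions such as row_number() need SQLite 3.25.0 or later.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise SystemExit("row_number() needs SQLite 3.25.0 or later")

ROWS = [
    (1, "C1", "P1", 10, 1),
    (2, "C1", "P2", 20, 2),
    (3, "C1", "P3", 30, 3),
    (4, "C2", "P4", 20, 3),
    (5, "C2", "P5", 30, 2),
    (6, "C3", "P6", 10, 1),
    (7, "C3", "P7", 20, 2),
    (8, "C1", "P8", 25, 4),  # hypothetical extra row: third qualifying C1 row
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table1 ("ROWID" INTEGER PRIMARY KEY, "CID" TEXT, '
    '"PID" TEXT, "Score" INTEGER, "SortKey" INTEGER)'
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", ROWS)

# The window-function query from the answer, unchanged.
QUERY = """
SELECT "ROWID", "CID", "PID", "Score", "SortKey"
FROM (
    SELECT *, row_number() OVER (PARTITION BY "CID" ORDER BY "SortKey") AS n
    FROM table1
    WHERE "Score" >= 20
) x
WHERE n < 3
ORDER BY "CID", "SortKey";
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

ROWID 8 gets row number 3 within C1 and is filtered out by n < 3, so the output matches the original five rows.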
If you want to filter for a particular CID, just add AND "CID" = 'C1' or whatever to the inner WHERE clause.
BTW, your test data is insufficient to test the limit, since it never has more than two rows with score >= 20 for any CID anyway.
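One quick way to confirm this observation, sketched with Python's sqlite3 module and an assumed in-memory copy of t1, is to count the qualifying rows per CID. If no count exceeds 2, a per-group limit of 2 cannot be told apart from no limit at all.

```python
import sqlite3

# Sample data from the question: (ROWID, CID, PID, Score, SortKey)
ROWS = [
    (1, "C1", "P1", 10, 1),
    (2, "C1", "P2", 20, 2),
    (3, "C1", "P3", 30, 3),
    (4, "C2", "P4", 20, 3),
    (5, "C2", "P5", 30, 2),
    (6, "C3", "P6", 10, 1),
    (7, "C3", "P7", 20, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t1 ("ROWID" INTEGER PRIMARY KEY, "CID" TEXT, '
    '"PID" TEXT, "Score" INTEGER, "SortKey" INTEGER)'
)
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", ROWS)

# Count the rows per CID that pass the Score >= 20 condition.
per_cid = conn.execute(
    'SELECT "CID", COUNT(*) FROM t1 WHERE "Score" >= 20 '
    'GROUP BY "CID" ORDER BY "CID"'
).fetchall()
print(per_cid)  # [('C1', 2), ('C2', 2), ('C3', 1)]

# No group exceeds 2, so a LIMIT of 2 per group never drops a row here.
print(max(count for _, count in per_cid) <= 2)  # True
```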
This is not actually a GROUP BY problem (you're not aggregating values). This is a greatest-n-per-group problem (I think there's actually a greatest-n-per-group tag here at Stack Overflow).
The exact details of a solution will depend on issues such as whether you ever have the same sort key twice per group. You can start with something like this:
SELECT * FROM table1 T1 WHERE Score >= 20 AND
(SELECT COUNT(*) FROM table1 T2
WHERE T2.CID = T1.CID AND T2.SortKey <= T1.SortKey AND T2.RowID <> T1.RowID
AND T2.Score >= 20) < 2
ORDER BY CID, SortKey;
What this does is consider only those rows with a score of 20 or more. Then, for each candidate row, it counts the other rows with the same CID that also score 20 or more and have a SortKey less than or equal to this row's SortKey. If that count is 0 or 1, the row qualifies for inclusion in the results.
Finally, ORDER BY performs your sort.
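A sketch of the counting approach that can be run end to end, assuming an in-memory table1 loaded with the question's data via Python's sqlite3 module. It applies the Score >= 20 filter to both the outer row and the counted rows, and, as an added assumption not in the answer, breaks SortKey ties by ROWID so each group gets a strict order. The run helper and the extra tied row (ROWID 8) are hypothetical.

```python
import sqlite3

# Sample data from the question: (ROWID, CID, PID, Score, SortKey)
ROWS = [
    (1, "C1", "P1", 10, 1),
    (2, "C1", "P2", 20, 2),
    (3, "C1", "P3", 30, 3),
    (4, "C2", "P4", 20, 3),
    (5, "C2", "P5", 30, 2),
    (6, "C3", "P6", 10, 1),
    (7, "C3", "P7", 20, 2),
]

# Keep a row if fewer than 2 qualifying rows in its CID rank ahead of it.
# Ranking: lower SortKey first; equal SortKeys ordered by ROWID (assumption).
QUERY = """
SELECT T1."ROWID", T1."CID", T1."PID", T1."Score", T1."SortKey"
FROM table1 T1
WHERE T1."Score" >= 20
  AND (SELECT COUNT(*)
       FROM table1 T2
       WHERE T2."CID" = T1."CID"
         AND T2."Score" >= 20
         AND (T2."SortKey" < T1."SortKey"
              OR (T2."SortKey" = T1."SortKey" AND T2."ROWID" < T1."ROWID"))
      ) < 2
ORDER BY T1."CID", T1."SortKey", T1."ROWID";
"""


def run(rows):
    """Hypothetical helper: load rows into a fresh table1 and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE table1 ("ROWID" INTEGER PRIMARY KEY, "CID" TEXT, '
        '"PID" TEXT, "Score" INTEGER, "SortKey" INTEGER)'
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


plain = run(ROWS)
# Hypothetical extra C2 row that ties ROWID 5 on SortKey 2.
tied = run(ROWS + [(8, "C2", "P8", 25, 2)])
print(plain)
print(tied)
```

The tie-breaker matters when several qualifying rows share a SortKey. With three tied rows and nothing lower in the group, a <= comparison that only excludes the row itself counts two peers for each of them, so all three fail the < 2 test; comparing ROWIDs on ties keeps exactly the first two.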