
How to generate "empty" aggregate results in SQL

I'm trying to refine a SQL query to make my reports look better. My query reads data from one table, groups by a few columns and calculates some aggregate fields (counts and sums).

SELECT A, B, C, COUNT(*), SUM(D) FROM T
GROUP BY A, B, C
ORDER BY A, B, C

Now, let's assume the B and C columns each hold one of a fixed set of constant strings: for example, B can be 'B1' or 'B2', and C can be 'C1' or 'C2'. An example result set is:

A  | B  | C  | COUNT(*) | SUM(D)
--------------------------------
A1 | B1 | C1 |       34 |   1752
A1 | B1 | C2 |        4 |    183
A1 | B2 | C1 |      199 |   8926
A1 | B2 | C2 |       56 |   2511
A2 | B1 | C2 |        6 |     89
A2 | B2 | C2 |       12 |    231
A3 | B1 | C1 |       89 |    552
...

As you can see, for 'A1' I have all four possible (B, C) combinations, but that's not true for 'A2'. My question is: how can I also generate summary rows for (B, C) combinations that are not actually present in the table? That is, how can I also print, for example, these rows:

A  | B  | C  | COUNT(*) | SUM(D)
--------------------------------
A2 | B1 | C1 |        0 |      0
A2 | B2 | C1 |        0 |      0

The only solution I can see is to create an auxiliary table with all (B, C) values and then do a RIGHT OUTER JOIN with that aux table. But I'm searching for a cleaner way...

Thank you all.

lorenzo-s asked May 10 '12 07:05


2 Answers

The auxiliary table doesn't have to be a real table; it can be a common table expression, at least if you can get all possible values (or all the ones you're interested in) from the table itself. Using @Bob Jarvis' query to generate all possible combinations, you can do something like:

WITH CTE AS (
    SELECT * FROM (SELECT DISTINCT a FROM T)
    JOIN (SELECT DISTINCT b, c FROM T) ON (1 = 1)
)
SELECT CTE.A, CTE.B, CTE.C,
    SUM(CASE WHEN T.A IS NULL THEN 0 ELSE 1 END), NVL(SUM(T.D),0)
FROM CTE
LEFT JOIN T ON T.A = CTE.A AND T.B = CTE.B AND T.C = CTE.C
GROUP BY CTE.A, CTE.B, CTE.C
ORDER BY CTE.A, CTE.B, CTE.C;
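
To try the idea without an Oracle instance, here is a minimal sketch that runs an equivalent query against SQLite through Python's standard sqlite3 module. The rows are made up for illustration, `COALESCE` stands in for `NVL`, and `COUNT(t.a)` stands in for the `SUM(CASE ...)` expression; the CTE name `combos` and the subquery aliases `x` and `y` are my own.

```python
import sqlite3

# Made-up sample rows (a, b, c, d); A2 has no rows with c = 'C1'.
rows = [
    ("A1", "B1", "C1", 10),
    ("A1", "B1", "C2", 20),
    ("A1", "B2", "C1", 30),
    ("A1", "B2", "C2", 40),
    ("A2", "B1", "C2", 5),
    ("A2", "B2", "C2", 7),
    ("A2", "B2", "C2", 8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT, d INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

query = """
WITH combos AS (
    -- every distinct a paired with every (b, c) pair seen anywhere in t
    SELECT x.a, y.b, y.c
    FROM (SELECT DISTINCT a FROM t) AS x
    CROSS JOIN (SELECT DISTINCT b, c FROM t) AS y
)
SELECT combos.a, combos.b, combos.c,
       COUNT(t.a) AS cnt,               -- counts matched rows only
       COALESCE(SUM(t.d), 0) AS total   -- NULL sum becomes 0
FROM combos
LEFT JOIN t ON t.a = combos.a AND t.b = combos.b AND t.c = combos.c
GROUP BY combos.a, combos.b, combos.c
ORDER BY combos.a, combos.b, combos.c
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With this data the result has eight rows, including ('A2', 'B1', 'C1', 0, 0) and ('A2', 'B2', 'C1', 0, 0), which have no matching rows in the table.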

If you have fixed values that may not be in the table then it's a little more complicated (or uglier anyway, and gets worse with more possible values):

WITH CTE AS (
    SELECT * FROM (SELECT DISTINCT a FROM T)
    JOIN (SELECT 'B1' AS B FROM DUAL
        UNION ALL SELECT 'B2' FROM DUAL) ON (1 = 1)
    JOIN (SELECT 'C1' AS C FROM DUAL
        UNION ALL SELECT 'C2' FROM DUAL) ON (1 = 1)
)
SELECT CTE.A, CTE.B, CTE.C,
    SUM(CASE WHEN T.A IS NULL THEN 0 ELSE 1 END), NVL(SUM(T.D),0)
FROM CTE
LEFT JOIN T ON T.A = CTE.A AND T.B = CTE.B AND T.C = CTE.C
GROUP BY CTE.A, CTE.B, CTE.C
ORDER BY CTE.A, CTE.B, CTE.C;
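
The fixed-values variant can be sketched the same way. This again uses SQLite through Python's sqlite3 module rather than Oracle, so the `SELECT ... FROM DUAL UNION ALL` lists become `VALUES` lists; the CTE names `bs`, `cs` and `combos` and the rows are assumptions made for illustration. Note that 'B2' never appears in the made-up table, which is the case the distinct-values approach cannot cover.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT, d INTEGER)")
# Made-up rows: 'B2' does not occur anywhere in the table.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("A1", "B1", "C1", 10), ("A1", "B1", "C2", 20), ("A2", "B1", "C2", 5)],
)

query = """
WITH bs(b) AS (VALUES ('B1'), ('B2')),      -- fixed list of B values
     cs(c) AS (VALUES ('C1'), ('C2')),      -- fixed list of C values
     combos AS (
         SELECT x.a, bs.b, cs.c
         FROM (SELECT DISTINCT a FROM t) AS x
         CROSS JOIN bs
         CROSS JOIN cs
     )
SELECT combos.a, combos.b, combos.c,
       COUNT(t.a), COALESCE(SUM(t.d), 0)
FROM combos
LEFT JOIN t ON t.a = combos.a AND t.b = combos.b AND t.c = combos.c
GROUP BY combos.a, combos.b, combos.c
ORDER BY combos.a, combos.b, combos.c
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Every 'B2' combination shows up with 0 and 0 even though the table holds no 'B2' rows at all.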

Either way, you have to join to something that knows about the 'missing' values. If the same logic is needed elsewhere, and you have fixed values, then a permanent table might be cleaner, although maintenance may be needed either way. You could also consider a pipelined function to act as a surrogate table, though whether that is worthwhile probably depends on the data volumes.

Alex Poole answered Nov 19 '22 11:11


The thing is, if you don't have a particular combination in your database, how would the engine know to include that combination in the results? In order to have all combinations in the results, you need to have all combinations available, whether in the main table or in some other table used for reference. For example, you can create another table R with data like so:

A  | B  | C  
------------
A1 | B1 | C1
A1 | B1 | C2
A1 | B2 | C1
A1 | B2 | C2
A2 | B1 | C1
A2 | B1 | C2
A2 | B2 | C1
A2 | B2 | C2
A3 | B1 | C1
A3 | B1 | C2
A3 | B2 | C1
A3 | B2 | C2
...

And then your query would look like this:

-- COUNT(t.a) counts only matched rows, so missing combinations yield 0
SELECT r.a, r.b, r.c, COUNT(t.a), COALESCE(SUM(t.d), 0)
FROM r LEFT OUTER JOIN t ON (r.a = t.a AND r.b = t.b AND r.c = t.c)
GROUP BY r.a, r.b, r.c
ORDER BY r.a, r.b, r.c

This will return the set you want, with 0 | 0 for combinations that don't exist in the main table. Note that this is only possible if you know every combination you want to include, which may not always be the case.
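
One detail worth checking in a query like this is which COUNT is used, because after an outer join they behave differently. The sketch below (SQLite via Python's sqlite3 module, with made-up rows and a cut-down R) puts three variants side by side: `COUNT(*)` counts the NULL-extended row of a missing combination as 1, `COUNT(t.d)` skips rows where D itself is NULL, and `COUNT(t.a)` counts exactly the matched rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE r (a TEXT, b TEXT, c TEXT);
CREATE TABLE t (a TEXT, b TEXT, c TEXT, d INTEGER);
-- r lists the wanted combinations; t has data for only one of them,
-- and one of its rows has a NULL d (made-up data).
INSERT INTO r VALUES ('A2', 'B1', 'C1'), ('A2', 'B1', 'C2');
INSERT INTO t VALUES ('A2', 'B1', 'C2', 6), ('A2', 'B1', 'C2', NULL);
""")

query = """
SELECT r.a, r.b, r.c,
       COUNT(*)   AS count_star,  -- also counts the NULL-extended row
       COUNT(t.d) AS count_d,     -- ignores rows where d is NULL
       COUNT(t.a) AS count_key,   -- counts matched rows only
       COALESCE(SUM(t.d), 0) AS total
FROM r
LEFT OUTER JOIN t ON r.a = t.a AND r.b = t.b AND r.c = t.c
GROUP BY r.a, r.b, r.c
ORDER BY r.a, r.b, r.c
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For the missing combination ('A2', 'B1', 'C1') the three counts come out as 1, 0 and 0; for ('A2', 'B1', 'C2') they are 2, 1 and 2. If D can be NULL and the report should match the original `COUNT(*)`, counting a join key column is the safer choice.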

If on the other hand your A, B, C are numerical values and you just want to include all numbers in a range, then there may be another way of dealing with this, something like this:

-- start_x / end_x are placeholders for the inclusive bounds of each range
SELECT a.n, b.n, c.n, COUNT(t.a), COALESCE(SUM(t.d), 0)
FROM (SELECT LEVEL AS n FROM DUAL WHERE LEVEL >= start_a CONNECT BY LEVEL <= end_a) a
CROSS JOIN (SELECT LEVEL AS n FROM DUAL WHERE LEVEL >= start_b CONNECT BY LEVEL <= end_b) b
CROSS JOIN (SELECT LEVEL AS n FROM DUAL WHERE LEVEL >= start_c CONNECT BY LEVEL <= end_c) c
LEFT OUTER JOIN t ON (t.a = a.n AND t.b = b.n AND t.c = c.n)
GROUP BY a.n, b.n, c.n
ORDER BY a.n, b.n, c.n

(I don't have an Oracle instance handy to test this, so this is more of a somewhat educated guess rather than anything else.)
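
For what it's worth, the range idea can be tried outside Oracle too. The sketch below uses SQLite through Python's sqlite3 module and swaps `CONNECT BY LEVEL` for a recursive CTE, which is a different technique that produces the same kind of number list. The bounds, the CTE names `ra` and `rb`, and the rows are all hypothetical, and only two key columns are used to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, d INTEGER)")
# Made-up rows with numeric keys.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(2, 1, 10), (2, 1, 5), (3, 2, 7)])

# Hypothetical inclusive bounds; each start is assumed to be <= its end.
bounds = {"start_a": 2, "end_a": 3, "start_b": 1, "end_b": 2}

query = """
WITH RECURSIVE
    ra(n) AS (SELECT :start_a UNION ALL SELECT n + 1 FROM ra WHERE n < :end_a),
    rb(n) AS (SELECT :start_b UNION ALL SELECT n + 1 FROM rb WHERE n < :end_b)
SELECT ra.n, rb.n, COUNT(t.a), COALESCE(SUM(t.d), 0)
FROM ra
CROSS JOIN rb
LEFT JOIN t ON t.a = ra.n AND t.b = rb.n
GROUP BY ra.n, rb.n
ORDER BY ra.n, rb.n
"""
result = conn.execute(query, bounds).fetchall()
for row in result:
    print(row)
```

Each (a, b) pair in the two ranges appears once, with 0 and 0 where the table has no matching rows.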

The bottom line is that the engine needs to know what to include in the final results, one way or another.

Aleks G answered Nov 19 '22 13:11