Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL aggregate query question

Tags:

sql

derby

Can anyone help me with a SQL query in Apache Derby SQL to get a "simple" count.

Given a table ABC that looks like this...

    
    id  a   b   c
    1   1   1   1
    2   1   1   2
    3   2   1   3
    4   2   1   1
**  5   2   1   2 **
**  6   2   2   1 **
    7   3   1   2
    8   3   1   3
    9   3   1   1

How can I write a query to get a count of how may distinct values of 'a' have both (b=1 and c=2) AND (b=2 and c=1) to get the correct result of 1. (the two rows marked match the criteria and both have a value of a=2, there is only 1 distinct value of a in this table that match the criteria)

The tricky bit is that (b=1 and c=2) AND (b=2 and c=1) are obviously mutually exclusive when applied to a single row. .. so how do I apply that expression across multiple rows of distinct values for a?

These queries are wrong but to illustrate what I'm trying to do...
SELECT DISTINCT COUNT(a) WHERE b=1 AND c=2 AND b=2 AND c=1 ...
.. (0) no go as mutually exclusive
SELECT DISTINCT COUNT(a) WHERE b=1 AND c=2 OR b=2 AND c=1 ...
.. (3) gets me the wrong result.
SELECT COUNT(a) (CASE WHEN b=1 AND c=10 THEN 1 END) FROM ABC WHERE b=2 AND c=1
.. (0) no go as mutually exclusive

Cheers, Phil.

like image 459
Phil Avatar asked May 16 '10 22:05

Phil


2 Answers

I'm assuming that (a,b,c) is unique. One way to do this is to use a self join:

SELECT COUNT(*)
FROM   ABC T1
JOIN   ABC T2
ON     T1.a = T2.a
WHERE  T1.b = 1 AND T1.c = 2
AND    T2.b = 2 AND T2.c = 1

This works conceptually as follows:

  • Find all the rows that satisfy (b,c) = (1,2)
  • Find all the rows that satisfy (b,c) = (2,1)
  • Join the above two sets on when a is the same.
  • Count the number of rows in the joined result.

An alternative way which might be easier to understand is to use a subselect:

SELECT COUNT(*)
FROM   ABC
WHERE  a IN (SELECT a FROM ABC
             WHERE  b = 2
             AND    c = 1)
AND    b = 1
AND    c = 2

Note: If there can be duplicated values of (a,b,c) then instead of SELECT COUNT(*) use SELECT COUNT(DISTINCT T1.a) in the first query, and SELECT COUNT(DISTINCT a) in the second.

These queries are tested in MySQL, not Apache Derby, but I hope they will work there too.

like image 178
Mark Byers Avatar answered Nov 15 '22 09:11

Mark Byers


Mark's second query should indeed be supported by Apache Derby, according to Apache Derby's SQL support page.

SELECT COUNT(DISTINCT a) FROM ABC
 WHERE b = 1 AND c = 2
   AND a IN (SELECT a FROM ABC WHERE b = 2 AND c = 1);

In addition to being easier to read than the self-join version, it should also be faster, since you avoid the overhead of having to do a JOIN.

like image 28
Steve Simms Avatar answered Nov 15 '22 11:11

Steve Simms