Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count SQL records based on sibling property

Tags:

sql

mysql

I have a table of people who have a name, location (where they live), and a parent_id (parents are stored on another table). So for example:

name    | location  | parent_id
--------+-----------+-----------
Joe     | Chicago   | 12
Sammy   | Chicago   | 13
Bob     | SF        | 13
Jim     | New York  | 13
Jane    | Chicago   | 14
Dave    | Portland  | 14
Al      | Chicago   | 15
Monica  | Boston    | 15
Debbie  | New York  | 15
Bill    | Chicago   | 16
Bruce   | New York  | 16

I need to count of how many people live in Chicago and have siblings (share a parent_id) that live in New York. So for the example above, the count would be 3.

name    | location  | parent_id
--------+-----------+-----------
Joe     | Chicago   | 12
Sammy   | Chicago   | 13   * sibling Jim lives in New York
Bob     | SF        | 13
Jim     | New York  | 13
Jane    | Chicago   | 14
Dave    | Portland  | 14
Al      | Chicago   | 15   * sibling Debbie lives in New York
Monica  | Boston    | 15
Debbie  | New York  | 15
Bill    | Chicago   | 16   * sibling Bruce lives in New York
Bruce   | New York  | 16

Can someone help me write the SQL to query this count?

like image 530
dustinbrownman Avatar asked May 28 '15 16:05

dustinbrownman


4 Answers

Looks like Minh's answer works great, but here is another example using a Self Join.

SELECT Count(DISTINCT a.child_id)
FROM people a
    JOIN people b ON a.parent_id = b.parent_id
WHERE a.location = 'Chicago' AND b.location = 'New York'

Should produce "3" for just the above table listed.

EDIT: Added a DISTINCT a.parent_id based on Lithis' suggestion.

EDIT2: As noted by Uueerdo, a child_id or some sort of unique id would really help in the case of 2 siblings who live in Chicago and 1 sibling who lives in New York. I have edited the original query to reflect this.

Since this is not truly an "answer" to your question, because there is no such child_id, I would defer to Uueerdo's answer, sorry!

like image 99
PDice30 Avatar answered Nov 16 '22 00:11

PDice30


SELECT COUNT(*)
FROM `people` AS p1
WHERE p1.`location` = 'Chicago'
      AND p1.parent_id IN (
         SELECT DISTINCT parent_id 
         FROM `people` AS p2
         WHERE p2.`location` = 'New York'
        )
;

Using Minh's as a base, this should be pretty fast; since the subquery is no longer "correlated", it should not risk the possibility of it needing executed repeatedly, once for every row in people.

like image 21
Uueerdo Avatar answered Nov 16 '22 00:11

Uueerdo


The correlated query is a very nice way to go and is very efficient. Avoid the use of distinct as it is an expensive operation. Group by is a nice alternative over the use of distinct. Understand the data and structure the query accordingly. Here is another option that is engine optimized...

select count(*) 
from (select * from #t where Location = 'Chicago') ch
inner join (select * from #t where Location = 'New York') ny on ch.ParentID = ny.ParentID
like image 21
Brian Avatar answered Nov 16 '22 00:11

Brian


Maybe try this?

SELECT Count(*)
FROM table table1
WHERE table1.location= 'Chicago'
      AND EXISTS (SELECT * FROM table table2
                     WHERE table1.parent_id= table2.parent_id
                           AND table2.location= 'New York')
like image 40
JustAPup Avatar answered Nov 15 '22 23:11

JustAPup