I have a table of people who have a name, location (where they live), and a parent_id (parents are stored on another table). So for example:
name | location | parent_id
--------+-----------+-----------
Joe | Chicago | 12
Sammy | Chicago | 13
Bob | SF | 13
Jim | New York | 13
Jane | Chicago | 14
Dave | Portland | 14
Al | Chicago | 15
Monica | Boston | 15
Debbie | New York | 15
Bill | Chicago | 16
Bruce | New York | 16
I need to count of how many people live in Chicago and have siblings (share a parent_id) that live in New York. So for the example above, the count would be 3.
name | location | parent_id
--------+-----------+-----------
Joe | Chicago | 12
Sammy | Chicago | 13 * sibling Jim lives in New York
Bob | SF | 13
Jim | New York | 13
Jane | Chicago | 14
Dave | Portland | 14
Al | Chicago | 15 * sibling Debbie lives in New York
Monica | Boston | 15
Debbie | New York | 15
Bill | Chicago | 16 * sibling Bruce lives in New York
Bruce | New York | 16
Can someone help me write the SQL to query this count?
Looks like Minh's answer works great, but here is another example using a Self Join.
SELECT Count(DISTINCT a.child_id)
FROM people a
JOIN people b ON a.parent_id = b.parent_id
WHERE a.location = 'Chicago' AND b.location = 'New York'
Should produce "3" for just the above table listed.
EDIT: Added a DISTINCT a.parent_id based on Lithis' suggestion.
EDIT2: As noted by Uueerdo, a child_id or some sort of unique id would really help in the case of 2 siblings who live in Chicago and 1 sibling who lives in New York. I have edited the original query to reflect this.
Since this is not truly an "answer" to your question, because there is no such child_id, I would defer to Uueerdo's answer, sorry!
SELECT COUNT(*)
FROM `people` AS p1
WHERE p1.`location` = 'Chicago'
AND p1.parent_id IN (
SELECT DISTINCT parent_id
FROM `people` AS p2
WHERE p2.`location` = 'New York'
)
;
Using Minh's as a base, this should be pretty fast; since the subquery is no longer "correlated", it should not risk the possibility of it needing executed repeatedly, once for every row in people
.
The correlated query is a very nice way to go and is very efficient. Avoid the use of distinct as it is an expensive operation. Group by is a nice alternative over the use of distinct. Understand the data and structure the query accordingly. Here is another option that is engine optimized...
select count(*)
from (select * from #t where Location = 'Chicago') ch
inner join (select * from #t where Location = 'New York') ny on ch.ParentID = ny.ParentID
Maybe try this?
SELECT Count(*)
FROM table table1
WHERE table1.location= 'Chicago'
AND EXISTS (SELECT * FROM table table2
WHERE table1.parent_id= table2.parent_id
AND table2.location= 'New York')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With