JOIN on another table after GROUP BY and COUNT

Question

I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below) but from what I've read, I'm using an extra GROUP BY that I shouldn't be.

(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)

I have two tables:

Table: Person
-------------
key  name     cityKey
1    Alice    1
2    Bob      2
3    Charles  2
4    David    1

Table: City
-------------
key  name
1    Albany
2    Berkeley
3    Chico

I'd like to do a query on the People (with some WHERE clause) that returns

the number of matching people in each city
the key for the city
the name of the city.

If I do

SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person 
LEFT JOIN City ON Person.cityKey = City.key 
GROUP BY Person.cityKey, City.name

I get the result that I want

count   cityKey   cityName
2       1         Albany
2       2         Berkeley

However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.

So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.

Pointy · Accepted Answer

I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it such that you join to a sub-select to get the count of persons to cities by key, to the city table again for the name, but it's debatable that that'd be better. It's a matter of style and opinion I guess.

select PC.ct, City.key, City.name
  from City
  join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
    on City.key = PC.key

if my SQL isn't too rusty :-)

select PC.ct, City.key, City.name
  from City
  join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
    on City.key = PC.key

if my SQL isn't too rusty :-)

OMG Ponies · Answer

...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.

You misunderstand, you got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...

However, MySQL and SQLite provide the "feature" where you can omit these columns from the group by - which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" Stackoverflow and numerous other sites & forums.

ahsteele · Answer

However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.

It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:

the required tables are joined
the composite dataset is filtered through the WHERE clause
the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
they are then filtered again, through the HAVING clause
finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.

The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.

Andomar · Answer

Your query would only work on MySQL, because you group on Person.cityKey but select city.key. All other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.

Because the combination of city name and city key is unique, the following are equivalent:

select    count(person.key), min(city.key), min(city.name)
...
group by  person.citykey

Or:

select    count(person.key), city.key, city.name
...
group by  person.citykey, city.key, city.name

Or:

select    count(person.key), city.key, max(city.name)
...
group by  city.key

All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.

P.S. If you'd like to count only different persons, even if they have multiple rows, try:

count(DISTINCT person.key)

instead of

count(person.key)

JOIN on another table after GROUP BY and COUNT

Tags:

sql

count

left-join

group-by

aggregate-functions

Craig S

4 Answers

Pointy

OMG Ponies

ahsteele

Andomar

Recent Activity

Donate For Us

JOIN on another table after GROUP BY and COUNT

Tags:

sql

count

left-join

group-by

aggregate-functions

Craig S

4 Answers

Pointy

OMG Ponies

ahsteele

Andomar

Related questions

Recent Activity

Donate For Us