I have two queries. One of them makes sense to me, the other doesn't. First one:
SELECT gender AS 'Gender', count(*) AS '#' FROM registrations GROUP BY gender WITH ROLLUP
That gives me this:
Gender  #
Female  20
Male    19
NULL    39
So, I get the count, and the total count. What I expected. Next one:
SELECT c.printable_name AS 'Country', count(*) AS '#' FROM registrations r INNER JOIN country c ON r.country = c.country_id GROUP BY country WITH ROLLUP

Country        #
Denmark        9
Norway         10
Sweden         18
United States  1
Uzbekistan     1
Uzbekistan     39
Same result. But why do I get Uzbekistan for the total??
The GROUP BY clause permits a WITH ROLLUP modifier that causes summary output to include extra rows that represent higher-level (that is, super-aggregate) summary operations. ROLLUP thus enables you to answer questions at multiple levels of analysis with a single query.
SELECT DISTINCT ColumnName FROM TableName; When the COUNT() function is used with the GROUP BY clause, the query returns a count for each subgroup formed from the table column values or expressions.
Many SQL products do not support grouping with ROLLUP and CUBE. MySQL supports ROLLUP through the WITH ROLLUP modifier, but it does not support CUBE. Because several other products offer both features, they are discussed here. It often happens that data has to be aggregated on different levels. Example 10.23 is a clear example.
In MySQL, the COUNT() function returns the number of rows retrieved by a SELECT statement. COUNT(expr) does not count rows where expr is NULL, whereas COUNT(*) counts every row. The function returns a BIGINT value. It can count all the matched rows or only rows that match the specified conditions.
But why do I get Uzbekistan for the total??
Because you're not SELECTing the item that you're GROUPing BY. If you said:
GROUP BY c.printable_name
You'd get the expected NULL. However you're grouping by a different column so MySQL doesn't know that printable_name is taking part in a rollup-group, and selects any old value from that column, in the join of all registrations. (So it is possible you will see other countries than Uzbekistan.)
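For reference, this is the question's query with only the grouping column changed. Treat it as a sketch: it assumes printable_name is unique per country, since two countries sharing a name would have their counts merged into one row.

```sql
-- Group by the same column that is selected, so the rollup row
-- gets NULL in the Country column instead of an arbitrary name.
SELECT c.printable_name AS 'Country', COUNT(*) AS '#'
FROM registrations r
INNER JOIN country c ON r.country = c.country_id
GROUP BY c.printable_name WITH ROLLUP;
```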
This is part of a wider problem with MySQL being permissive on what you can SELECT in a GROUP BY query. For example, you can say:
SELECT gender FROM registrations GROUP BY country;
and MySQL will happily pick one of the gender values for a registration from each country, even though there is no direct causal link (aka “functional dependency”) between country and gender. Other DBMSs will refuse the above command on the grounds that there isn't guaranteed to be one gender per country.(*)
Now, this:
SELECT c.printable_name AS 'Country', count(*) AS '#' FROM registrations r INNER JOIN country c ON r.country = c.country_id GROUP BY country
is OK, because there's a functional dependency between r.country and c.printable_name (assuming you have correctly described your country_id as a PRIMARY KEY).
However MySQL's WITH ROLLUP extension is a bit of a hack in the way it works. On the rollup row stage at the end, it runs over the entire pre-grouping result set to grab its values, and then sets the group-by column to NULL. It doesn't also null other columns that have a functional dependency on that column. It probably should, but MySQL currently doesn't really understand the whole thing about functional dependencies.
So if you select c.printable_name it will show you whichever country name value it randomly picked, and if you select c.country_id it will show you whichever country ID it randomly picked — even though c.country_id is the join criterion, so must be the same as r.country, which is NULL!
What you can do to work around the problem is:
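One possible approach (a sketch, not necessarily the only or best way) is to do the rollup in a subquery on registrations alone and join the country names on afterwards. The alias t and the column name n are made up for this example.

```sql
-- Inner query: counts per country ID plus the rollup row,
-- where country is NULL.
-- Outer query: the LEFT JOIN finds no match for the NULL country,
-- so printable_name comes out NULL on the total row.
SELECT c.printable_name AS 'Country', t.n AS '#'
FROM (
    SELECT country, COUNT(*) AS n
    FROM registrations
    GROUP BY country WITH ROLLUP
) AS t
LEFT JOIN country c ON t.country = c.country_id;
```

Two caveats: unlike the original INNER JOIN, a registration whose country has no match in the country table is still counted here and shows up with a NULL name, and the row order of the outer query is not guaranteed without an ORDER BY.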
(*: MySQL has an SQL_MODE option ONLY_FULL_GROUP_BY that is supposed to address this issue. In versions before 5.7.5 it goes much too far and only lets you select columns from the GROUP BY, not columns that have a functional dependency on the GROUP BY, so it makes valid queries fail as well. From 5.7.5 onwards the check recognizes functional dependencies and the mode is enabled by default.)
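To see what ONLY_FULL_GROUP_BY does to the gender-per-country example, something like the following can be run in a scratch session. It is a sketch: the SET statement replaces any other modes active in the session, so it is not something to paste into production code.

```sql
-- Show the modes currently active for this session.
SELECT @@SESSION.sql_mode;

-- Turn on the strict GROUP BY check for this session only
-- (note: this overwrites the session's other sql_mode flags).
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Should now be rejected (error 1055): gender is neither grouped
-- nor determined by country.
SELECT gender FROM registrations GROUP BY country;

-- Still accepted: every selected column is grouped or aggregated.
SELECT country, COUNT(*) FROM registrations GROUP BY country;
```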