Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL: Total GROUP BY WITH ROLLUP curiosity

Tags:

I have two queries. One of them makes sense to me, the other don't. First one:

SELECT gender AS 'Gender', count(*) AS '#'     FROM registrations      GROUP BY gender WITH ROLLUP 

That gives me this:

Gender       # Female      20 Male        19 NULL        39 

So, I get the count, and the total count. What I expected. Next one:

SELECT c.printable_name AS 'Country', count(*) AS '#'      FROM registrations r      INNER JOIN country c ON r.country = c.country_id      GROUP BY country WITH ROLLUP  Country         # Denmark         9 Norway         10 Sweden         18 United States   1 Uzbekistan      1 Uzbekistan     39 

Same result. But why do I get Uzbekistan for the total??

like image 590
Svish Avatar asked Mar 18 '09 19:03

Svish


People also ask

What does the GROUP BY modifier with rollup do to a query?

The GROUP BY clause permits a WITH ROLLUP modifier that causes summary output to include extra rows that represent higher-level (that is, super-aggregate) summary operations. ROLLUP thus enables you to answer questions at multiple levels of analysis with a single query.

How do I count groups in MySQL?

SELECT DISTINCT ColumnName FROM TableName; Using the COUNT() function with the GROUP BY clause, then the query will produce the number of values as the count for each subgroup created based on the table column values or expressions.

Does MySQL support rollup?

Many SQL products, including MySQL, do not support grouping with ROLLUP and CUBE. But because several other products offer this feature, it is discussed here. It often happens that data has to be aggregated on different levels. Example 10.23 is a clear example.

How do I count rows in MySQL by group?

In MySQL, the COUNT() function calculates the number of results from a table when executing a SELECT statement. It does not contain NULL values. The function returns a BIGINT value. It can count all the matched rows or only rows that match the specified conditions.


1 Answers

But why do I get Uzbekistan for the total??

Because you're not SELECTing the item that you're GROUPing BY. If you said:

GROUP BY c.printable_name 

You'd get the expected NULL. However you're grouping by a different column so MySQL doesn't know that printable_name is taking part in a rollup-group, and selects any old value from that column, in the join of all registrations. (So it is possible you will see other countries than Uzbekistan.)

This is part of a wider problem with MySQL being permissive on what you can SELECT in a GROUP BY query. For example, you can say:

SELECT gender FROM registrations GROUP BY country; 

and MySQL will happily pick one of the gender values for a registration from each country, even though there is no direct causal link (aka “functional dependency”) between country and gender. Other DBMSs will refuse the above command on the grounds that there isn't guaranteed to be one gender per country.(*)

Now, this:

SELECT c.printable_name AS 'Country', count(*) AS '#'  FROM registrations r  INNER JOIN country c ON r.country = c.country_id  GROUP BY country 

is OK, because there's a functional dependency between r.country and c.printable_name (assuming you have correctly described your country_id as a PRIMARY KEY).

However MySQL's WITH ROLLUP extension is a bit of a hack in the way it works. On the rollup row stage at the end, it runs over the entire pre-grouping result set to grab its values, and then sets the group-by column to NULL. It doesn't also null other columns that have a functional dependency on that column. It probably should, but MySQL currently doesn't really understand the whole thing about functional dependencies.

So if you select c.printable_name it will show you whichever country name value it randomly picked, and if you select c.country_id it will show you whichever country ID it randomly picked — even though c.country_id is the join criterion, so must be the same as r.country, which is NULL!

What you can do to work around the problem is:

  • group by printable_name instead; should be OK if printable_names are unique, or
  • select “r.country” as well as printable_name, and check that for being NULL, or
  • forget WITH ROLLUP and do a separate query for the end sum. This will be a little slower but it will also be ANSI SQL-92 compliant so your app could work on other databases.

(*: MySQL has an SQL_MODE option ONLY_FULL_GROUP_BY that is supposed to address this issue, but it goes much too far and only lets you select columns from the GROUP BY, not columns that have a functional dependency on the GROUP BY. So it will make valid queries fail as well, making it generally useless.)

like image 168
bobince Avatar answered Jan 17 '23 06:01

bobince