MySQL: Total GROUP BY WITH ROLLUP curiosity

I have two queries. One of them makes sense to me, the other doesn't. First one:

    SELECT gender AS 'Gender', count(*) AS '#'
    FROM registrations
    GROUP BY gender WITH ROLLUP

That gives me this:

    Gender    #
    Female   20
    Male     19
    NULL     39

So, I get the count, and the total count. What I expected. Next one:

    SELECT c.printable_name AS 'Country', count(*) AS '#'
    FROM registrations r
    INNER JOIN country c ON r.country = c.country_id
    GROUP BY country WITH ROLLUP

    Country         #
    Denmark         9
    Norway         10
    Sweden         18
    United States   1
    Uzbekistan      1
    Uzbekistan     39

Same result. But why do I get Uzbekistan for the total??

asked Mar 18 '09 19:03

Svish

1 Answer

> But why do I get Uzbekistan for the total??

Because you're not SELECTing the item that you're GROUPing BY. If you said:

    GROUP BY c.printable_name

You'd get the expected NULL. However, you're grouping by a different column, so MySQL doesn't know that printable_name is taking part in a rollup group, and it picks an arbitrary value from that column out of the join of all registrations. (So you may well see countries other than Uzbekistan.)
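For reference, here is the question's query with only the GROUP BY column changed. Treat it as a sketch: it assumes printable_name is unique per country, otherwise two countries with the same name would be merged into one group.

```sql
-- Group by the column that is actually selected, so the rollup row
-- gets NULL as its label instead of an arbitrary country name.
SELECT c.printable_name AS 'Country', count(*) AS '#'
FROM registrations r
INNER JOIN country c ON r.country = c.country_id
GROUP BY c.printable_name WITH ROLLUP;
```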

This is part of a wider problem with MySQL being permissive on what you can SELECT in a GROUP BY query. For example, you can say:

    SELECT gender FROM registrations GROUP BY country;

and MySQL will happily pick one of the gender values for a registration from each country, even though there is no direct causal link (aka “functional dependency”) between country and gender. Other DBMSs will refuse the above command on the grounds that there isn't guaranteed to be one gender per country.(*)

Now, this:

    SELECT c.printable_name AS 'Country', count(*) AS '#'
    FROM registrations r
    INNER JOIN country c ON r.country = c.country_id
    GROUP BY country

is OK, because there's a functional dependency between r.country and c.printable_name (assuming you have correctly described your country_id as a PRIMARY KEY).

However MySQL's WITH ROLLUP extension is a bit of a hack in the way it works. On the rollup row stage at the end, it runs over the entire pre-grouping result set to grab its values, and then sets the group-by column to NULL. It doesn't also null other columns that have a functional dependency on that column. It probably should, but MySQL currently doesn't really understand the whole thing about functional dependencies.

So if you select c.printable_name it will show you whichever country name value it randomly picked, and if you select c.country_id it will show you whichever country ID it randomly picked — even though c.country_id is the join criterion, so must be the same as r.country, which is NULL!

What you can do to work around the problem is:

- group by printable_name instead; should be OK if printable_names are unique, or
- select r.country as well as printable_name, and check that for being NULL, or
- forget WITH ROLLUP and do a separate query for the end sum. This will be a little slower but it will also be ANSI SQL-92 compliant so your app could work on other databases.
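The second and third options could look roughly like this. These are sketches against the tables from the question. The 'Total' label and the aliases country_id, country_name and registration_count are my own choices for illustration, not anything from the original schema.

```sql
-- Option 2: also select the grouped column and use it to spot the rollup row.
-- r.country is NULL only on the rollup row here, because the inner join
-- already drops registrations that have no country.
SELECT r.country AS country_id,
       IF(r.country IS NULL, 'Total', c.printable_name) AS 'Country',
       count(*) AS '#'
FROM registrations r
INNER JOIN country c ON r.country = c.country_id
GROUP BY r.country WITH ROLLUP;

-- Option 3: no ROLLUP at all. One query for the per-country counts...
SELECT c.printable_name AS country_name, count(*) AS registration_count
FROM registrations r
INNER JOIN country c ON r.country = c.country_id
GROUP BY c.printable_name;

-- ...and a separate query for the grand total.
SELECT count(*) AS registration_count
FROM registrations r
INNER JOIN country c ON r.country = c.country_id;
```

The option 3 queries avoid MySQL-only syntax (no WITH ROLLUP, no IF(), no quoted string aliases), which is what should make them portable.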

(*: MySQL has an SQL_MODE option ONLY_FULL_GROUP_BY that is supposed to address this issue. In MySQL 5.6 and earlier it goes much too far: it only lets you select columns from the GROUP BY, not columns that have a functional dependency on the GROUP BY, so it makes valid queries fail as well and is generally useless there. From MySQL 5.7.5 onwards the mode recognizes functional dependencies and is enabled by default.)
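If you want to see how your own server treats this, a quick check is to turn the mode on for the current session and rerun the gender-by-country query from above. A minimal sketch; note that the SET statement replaces whatever other modes the session had.

```sql
-- Show the modes currently active for this session.
SELECT @@SESSION.sql_mode;

-- Enable ONLY_FULL_GROUP_BY for this session only
-- (this overwrites the session's other sql_mode flags).
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode on, this should be rejected, because gender is neither
-- grouped, aggregated, nor determined by country.
SELECT gender FROM registrations GROUP BY country;
```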

answered Jan 17 '23 06:01

bobince
