Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select and Group by together

Tags:

sql

group-by

db2

I have my query like this:

Select 
  a.abc,
  a.cde,
  a.efg,
  a.agh,
  c.dummy
  p.test
  max(b.this)
  sum(b.sugar)
  sum(b.bucket)
  sum(b.something)

followed by some outer join and inner join. Now the problem is when in group by

group by 
  a.abc,
  a.cde,
  a.efg,
  a.agh,
  c.dummy,
  p.test   

The query works fine. But if I remove any one of them from group by it gives:

SQLSTATE: 42803

Can anyone explain the cause of this error?

like image 520
Abhishek Bhandari Avatar asked Nov 14 '11 12:11

Abhishek Bhandari


2 Answers

Generally, any column that isn't in the group by section can only be included in the select section if it has an aggregating function applied to it. Or, another way, any non-aggregated data in the select section must be grouped on.

Otherewise, how do you know what you want done with it. For example, if you group on a.abc, there can only be one thing that a.abc can be for that grouped row (since all other values of a.abc will come out in a different row). Here's a short example, with a table containing:

LastName  FirstName  Salary
--------  ---------  ------
Smith     John       123456
Smith     George     111111
Diablo    Pax        999999

With the query select LastName, Salary from Employees group by LastName, you would expect to see:

LastName  Salary
--------  ------
Smith     ??????
Diablo    999999

The salary for the Smiths is incalculable since you don't know what function to apply to it, which is what's causing that error. In other words, the DBMS doesn't know what to do with 123456 and 111111 to get a single value for the grouped row.

If you instead used select LastName, sum(Salary) from Employees group by LastName (or max() or min() or ave() or any other aggregating function), the DBMS would know what to do. For sum(), it will simply add them and give you 234567.

In your query, the equivalent of trying to use Salary without an aggregating function is to change sum(b.this) to just b.this but not include it in the group by section. Or alternatively, remove one of the group by columns without changing it to an aggregation in the select section.

In both cases, you'll have one row that has multiple possible values for the column.

The DB2 docs at publib for sqlstate 42803 describe your problem:

A column reference in the SELECT or HAVING clause is invalid, because it is not a grouping column; or a column reference in the GROUP BY clause is invalid.

like image 119
paxdiablo Avatar answered Sep 30 '22 15:09

paxdiablo


SQL will insist that any column in the SELECT section is either included in the GROUP BY section or has an aggregate function applied to it in the SELECT section.

This article gives a nice explanation of why this is the case. The article is sql server specific but the principle should be roughly similar for all RDBMS

like image 42
ipr101 Avatar answered Sep 30 '22 14:09

ipr101