Why does adding columns to group by increases rows returned

Question

Why would adding columns to the select list and group by in query increase the number of rows returned? For instance, below is my basic query, but if I add columns, I get more rows returned. The more columns added, the more rows returned. It seems the where clause conditions should be the only thing controlling the rows that are returned.

Example Query:

select pobj_name, max(prev) prev 
from pfmc_part 
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED' 
group by pobj_name 
order by pobj_name

Example Query:

select pobj_name, max(prev) prev 
from pfmc_part 
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED' 
group by pobj_name 
order by pobj_name

Michael Berkowski · Accepted Answer

When you add a WHERE clause, it filters the set of rows which will be handed down next to the GROUP BY, so WHERE is applied before groups. The rows which will be grouped and on which aggregates like SUM(),MAX(),MIN(),COUNT() will be performed have already been limited to those matching the WHERE conditions before GROUP BY is applied.

As to why you get more rows when adding columns into SELECT and GROUP BY -- well, that's how GROUP BY aggregates work. If you add additional columns into SELECT you must also group them (in most RDBMS) and as long as the values differ across rows, they will result in more grouped rows.

Consider this table:

Name   Score
John   2
Bill   3
John   1

A single GROUP BY on Name will collapse the John rows into one:

SELECT Name, SUM(Score) AS total
FROM scores
GROUP BY name

Name   total
John   3
Bill   3

Now consider this table, which has another column for sport. Here, John has 2 different sports represented while Bill has only 1. To include both Name, Sport in the SELECT list they must both also be in the GROUP BY. The similar values across rows collapse into groups, but now John has two sets of similar values to group:

Name  Sport    Score
John  Baseball  3
John  Bowling   9
Bill  Baseball  10
Bill  Baseball  6
John  Bowling   12

SELECT Name, Sport, SUM(Score) AS total
FROM scores
GROUP BY Name, Sport

Name  Sport    total
John  Baseball 3
John  Bowling  21
Bill  Baseball 16

So adding additional columns to the GROUP BY will result in more output rows if the columns have dissimilar values across rows.

Applying a WHERE clause to this second table to look for only John rows, would eliminate all Bill rows before applying groups. The result would be two rows.

SELECT Name, Sport, SUM(Score) AS total
FROM scores
-- Filter with a WHERE clause this time
WHERE Name = 'John'
GROUP BY Name, Sport

-- Results in only John's rows
Name  Sport    Score
John  Baseball 3
John  Bowling  21

Bacon Bits · Answer

Consider the table StudentScores:

+-----------+------------+-------+
| Last_Name | First_Name | Score |
+-----------+------------+-------+
| Smith     | Bob        |    90 |
| Smith     | Bob        |    85 |
| Smith     | Alice      |   100 |
| Smith     | Alice      |    90 |
| Johnson   | Evelyn     |    80 |
| Johnson   | Evelyn     |    85 |
+-----------+------------+-------+

This query:

SELECT Last_Name, MAX(Score) Score
FROM StudentScores
GROUP BY Last_Name

Results:

+-----------+-------+
| Last_Name | Score |
+-----------+-------+
| Smith     |   100 |
| Johnson   |    85 |
+-----------+-------+

However, this query:

SELECT Last_Name, MAX(Score) Score
FROM StudentScores
GROUP BY Last_Name, First_Name

Results:

+-----------+-------+
| Last_Name | Score |
+-----------+-------+
| Smith     |    90 |
| Smith     |   100 |
| Johnson   |    85 |
+-----------+-------+

So what's different? Well, 'Smith' is a Last_Name of two students:

SELECT Last_Name, First_Name, MAX(Score) Score
FROM StudentScores
GROUP BY Last_Name, First_Name

Results:

+-----------+------------+-------+
| Last_Name | First_Name | Score |
+-----------+------------+-------+
| Smith     | Bob        |    90 |
| Smith     | Alice      |   100 |
| Johnson   | Evelyn     |    85 |
+-----------+------------+-------+

Why does adding columns to group by increases rows returned

Tags:

sql

sql-server

wallenpb

2 Answers

Michael Berkowski

Bacon Bits

Recent Activity

Donate For Us

Why does adding columns to group by increases rows returned

Tags:

sql

sql-server

wallenpb

2 Answers

Michael Berkowski

Bacon Bits

Related questions

Recent Activity

Donate For Us