Why would adding columns to the select list and group by in query increase the number of rows returned? For instance, below is my basic query, but if I add columns, I get more rows returned. The more columns added, the more rows returned. It seems the where clause conditions should be the only thing controlling the rows that are returned.
Example Query:
select pobj_name, max(prev) prev
from pfmc_part
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED'
group by pobj_name
order by pobj_name
When you add a WHERE
clause, it filters the set of rows which will be handed down next to the GROUP BY
, so WHERE
is applied before groups. The rows which will be grouped and on which aggregates like SUM(),MAX(),MIN(),COUNT()
will be performed have already been limited to those matching the WHERE
conditions before GROUP BY
is applied.
As to why you get more rows when adding columns into SELECT
and GROUP BY
-- well, that's how GROUP BY
aggregates work. If you add additional columns into SELECT
you must also group them (in most RDBMS) and as long as the values differ across rows, they will result in more grouped rows.
Consider this table:
Name Score
John 2
Bill 3
John 1
A single GROUP BY
on Name
will collapse the John
rows into one:
SELECT Name, SUM(Score) AS total
FROM scores
GROUP BY name
Name total
John 3
Bill 3
Now consider this table, which has another column for sport. Here, John
has 2 different sports represented while Bill has only 1. To include both Name, Sport
in the SELECT
list they must both also be in the GROUP BY
. The similar values across rows collapse into groups, but now John
has two sets of similar values to group:
Name Sport Score
John Baseball 3
John Bowling 9
Bill Baseball 10
Bill Baseball 6
John Bowling 12
SELECT Name, Sport, SUM(Score) AS total
FROM scores
GROUP BY Name, Sport
Name Sport total
John Baseball 3
John Bowling 21
Bill Baseball 16
So adding additional columns to the GROUP BY
will result in more output rows if the columns have dissimilar values across rows.
Applying a WHERE
clause to this second table to look for only John
rows, would eliminate all Bill
rows before applying groups. The result would be two rows.
SELECT Name, Sport, SUM(Score) AS total
FROM scores
-- Filter with a WHERE clause this time
WHERE Name = 'John'
GROUP BY Name, Sport
-- Results in only John's rows
Name Sport Score
John Baseball 3
John Bowling 21
Consider the table StudentScores:
+-----------+------------+-------+
| Last_Name | First_Name | Score |
+-----------+------------+-------+
| Smith | Bob | 90 |
| Smith | Bob | 85 |
| Smith | Alice | 100 |
| Smith | Alice | 90 |
| Johnson | Evelyn | 80 |
| Johnson | Evelyn | 85 |
+-----------+------------+-------+
This query:
SELECT Last_Name, MAX(Score) Score
FROM StudentScores
GROUP BY Last_Name
Results:
+-----------+-------+
| Last_Name | Score |
+-----------+-------+
| Smith | 100 |
| Johnson | 85 |
+-----------+-------+
However, this query:
SELECT Last_Name, MAX(Score) Score
FROM StudentScores
GROUP BY Last_Name, First_Name
Results:
+-----------+-------+
| Last_Name | Score |
+-----------+-------+
| Smith | 90 |
| Smith | 100 |
| Johnson | 85 |
+-----------+-------+
So what's different? Well, 'Smith'
is a Last_Name
of two students:
SELECT Last_Name, First_Name, MAX(Score) Score
FROM StudentScores
GROUP BY Last_Name, First_Name
Results:
+-----------+------------+-------+
| Last_Name | First_Name | Score |
+-----------+------------+-------+
| Smith | Bob | 90 |
| Smith | Alice | 100 |
| Johnson | Evelyn | 85 |
+-----------+------------+-------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With