Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why pivot with "extra" columns doesn't combine results

I know many of you have observed this behavior, but I'm wondering if anyone can explain why. When I create a small table to create an example of using the pivot function, I get the results I'd expect:

CREATE TABLE dbo.AverageFishLength
    (
      Fishtype VARCHAR(50) ,
      AvgLength DECIMAL(8, 2) ,
      FishAge_Years INT
    )
INSERT  INTO dbo.AverageFishLength
        ( Fishtype, AvgLength, FishAge_Years )
VALUES  ( 'Muskie', 32.75, 3 ),
        ( 'Muskie', 37.5, 4 ),
        ( 'Muskie', 39.75, 5 ),
        ( 'Walleye', 16.5, 3 ),
        ( 'Walleye', 18.25, 4 ),
        ( 'Walleye', 20.0, 5 ),
        ( 'Northern Pike', 20.75, 3 ),
        ( 'Northern Pike', 23.25, 4 ),
        ( 'Northern Pike', 26.0, 5 );

Here is the pivot query:

SELECT  Fishtype ,
        [3] AS [3 Years Old] ,
        [4] AS [4 Years Old] ,
        [5] AS [5 Years Old]
FROM    dbo.AverageFishLength   PIVOT( SUM(AvgLength) 
                                FOR FishAge_Years IN ( [3], [4], [5] ) ) AS PivotTbl

Here are the results:

enter image description here

However, if I create the table with an identity column, the results get separated into separate lines:

DROP TABLE dbo.AverageFishLength
CREATE TABLE dbo.AverageFishLength
    (
      ID INT IDENTITY(1,1) ,
      Fishtype VARCHAR(50) ,
      AvgLength DECIMAL(8, 2) ,
      FishAge_Years INT
    )
INSERT  INTO dbo.AverageFishLength
        ( Fishtype, AvgLength, FishAge_Years )
VALUES  ( 'Muskie', 32.75, 3 ),
        ( 'Muskie', 37.5, 4 ),
        ( 'Muskie', 39.75, 5 ),
        ( 'Walleye', 16.5, 3 ),
        ( 'Walleye', 18.25, 4 ),
        ( 'Walleye', 20.0, 5 ),
        ( 'Northern Pike', 20.75, 3 ),
        ( 'Northern Pike', 23.25, 4 ),
        ( 'Northern Pike', 26.0, 5 );

Same exact query:

SELECT  Fishtype ,
        [3] AS [3 Years Old] ,
        [4] AS [4 Years Old] ,
        [5] AS [5 Years Old]
FROM    dbo.AverageFishLength   PIVOT( SUM(AvgLength) 
                                FOR FishAge_Years IN ( [3], [4], [5] ) ) AS PivotTbl

Different results:

enter image description here

It appears to me that the ID column is being used in the query, even though it does not appear in the query at all. It's almost like it's implicitly included in the query, but not shown in the result set.

Can anyone explain why this occurs?

like image 741
Dave.Gugg Avatar asked Jan 08 '15 19:01

Dave.Gugg


1 Answers

It happens because the ID column is unique for each row and since you are querying the table directly (without a subquery) that column is included as part of the GROUP BY the aggregate function needs.

The docs the MSDN docs about FROM state the following:

table_source PIVOT <pivot_clause>

Specifies that the table_source is pivoted based on the pivot_column. table_source is a table or table expression. The output is a table that contains all columns of the table_source except the pivot_column and value_column. The columns of the table_source, except the pivot_column and value_column, are called the grouping columns of the pivot operator.

PIVOT performs a grouping operation on the input table with regard to the grouping columns and returns one row for each group. Additionally, the output contains one column for each value specified in the column_list that appears in the pivot_column of the input_table.

Your version is basically saying SELECT * FROM yourtable and PIVOT that data. Even though the ID column isn't in your final SELECT list, it is a grouping element on the query. If you compare the PIVOT to a "pre-PIVOT" example to show you'll see what your version. This example uses a CASE expression and an aggregate function:

SELECT Fishtype,
  sum(case when FishAge_Years = 3 then AvgLength else 0 end) as [3],
  sum(case when FishAge_Years = 4 then AvgLength else 0 end) as [4],
  sum(case when FishAge_Years = 5 then AvgLength else 0 end) as [5]
FROM dbo.AverageFishLength
GROUP BY Fishtype, ID;

The result will be skewed because even though you don't have the ID in the final list, it's still being used to group by and since they are unique, you get multiple rows.

The easiest way to solve this when using PIVOT is using a subquery:

SELECT Fishtype ,
        [3] AS [3 Years Old] ,
        [4] AS [4 Years Old] ,
        [5] AS [5 Years Old]
FROM
(
  SELECT Fishtype, 
    AvgLength, 
    FishAge_Years
  FROM    dbo.AverageFishLength
) d
PIVOT
( 
  SUM(AvgLength) 
  FOR FishAge_Years IN ( [3], [4], [5] ) 
) AS PivotTbl;

In this version you are only returning the columns that you actually need and want from your table - this excludes ID so it will not be used to group your data.

like image 64
Taryn Avatar answered Nov 10 '22 00:11

Taryn