Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the Max function used when we pivot text columns in SQL Server?

I have just learnt how to Pivot in SQL Server. I was wondering why the max function is used when we want to pivot text columns? What's the logic behind this? I understand if it's Count, Sum etc (because your summing that respective row and column) but I don't understand the logic of using max when we have text columns?

For example, my code is:

SELECT * 
  FROM ( SELECT DATE
               ,SITA
               ,EVENT 
          FROM  [UKRMC].[dbo].[strategy] 
          where datename(year, DATE) = 2018 or datename(year,DATE)=2019
        ) strategy
  PIVOT ( max(EVENT)
          FOR SITA IN ([ABZPD],[BFSPD]
,[BFSZH]
,[BHXPD]
,[BHXZH]
,[BRSZH]
,[BRUPQ] ) piv
like image 371
Sorath Avatar asked Jan 03 '18 13:01

Sorath


People also ask

Why do we use Max in SQL?

The SQL MIN() and MAX() Functions The MIN() function returns the smallest value of the selected column. The MAX() function returns the largest value of the selected column.

How does PIVOT function work in SQL?

PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output. And PIVOT runs aggregations where they're required on any remaining column values that are wanted in the final output.

Can we use Max on a column?

Expression made up of a single constant, variable, scalar function, or column name or any combination of arithmetic, bitwise, and string operators. MAX can be used with numeric, character, and datetime columns, but not with bit columns.


2 Answers

Because in your example you've chosen EVENT as the value to show in the PIVOT intersections (i.e. since you've specified EVENT in the PIVOT clause), the value must be specified with one of the permissible aggregate functions, as there are potentially multiple rows for each of the column values that you've chosen in your pivot, when grouped by the remaining columns (i.e. DATE in your case).

In Sql Server[1], MAX() or MIN() is commonly used when pivoting non-numeric columns, as it is able to show one of the original of the values of the column.

Any non-aggregate and non-pivoted columns will be left as-is and will be used to form the groups on which the pivot is based (in your case, column DATE isn't either in the aggregate, or the column pivot, so it will form the row group)

Consider the case where your pivoted table contains multiple rows matching your predicate, such as this:

INSERT INTO strategy (DATE, SITA, EVENT) VALUES
('1 Jan 2018', 'ABZPD', 'Event1'),
('1 Jan 2018', 'BFSPD', 'Event2'),
('1 Jan 2018', 'BFSPD', 'Event3');

After Pivot:

DATE                    ABZPD   BFSPD
2018-01-01T00:00:00Z    Event1  Event3

i.e. During the Pivot, the BFSPD rows for Event2 and Event3 needed to somehow be projected into a single cell - hence the need for an aggregate. This aggregate is still needed, even if there is known to be just one value (this being the case for the Event1 value for SITA ABZPD in the above example).

Since BFSPD has two events, you'll need to somehow resolve how to project a value into a single cell value. The use of MAX on the VARCHAR column resolves the 'largest' value (Event3) in the event of multiple rows in projecting into the same resulting pivot 'cell' - SqlFiddle example here

You could choose to use COUNT(Event) to show you the number of events per row / pivot intersection - Fiddle

And you could switch the aggregate on EVENT with DATE - EVENT is thus used in the column grouping.


*1 Aggregates like AVG or STDEV are obviously not available to strings. Other RDBMS have additional aggregates like FIRST which will arbitrarily take the first value, or GROUP_CONCAT / LIST_AGG, which can fold string values together with a delimiter. And PostGres allows you to make your own aggregate functions!. But sadly, none of this in SqlServer, hence MIN() / MAX() for now.

like image 153
StuartLC Avatar answered Oct 12 '22 02:10

StuartLC


An aggregation function must be specified when using PIVOT command because the first step of the pivoting operation is a grouping operation on the column specified in the FOR clause that reduces the number of lines of the resulting tables.

The aggregation function is used to manage values for the other columns that are required in the output table.

From Technet documentation:

PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.

Here is the PIVOT command syntax taken from the same Technet article:

SELECT <non-pivoted column>,  
    [first pivoted column] AS <column name>,  
    [second pivoted column] AS <column name>,  
    ...  
    [last pivoted column] AS <column name>  
FROM  
    (<SELECT query that produces the data>)   
    AS <alias for the source query>  
PIVOT  
(  
    <aggregation function>(<column being aggregated>)  
FOR   
[<column that contains the values that will become column headers>]   
    IN ( [first pivoted column], [second pivoted column],  
    ... [last pivoted column])  
) AS <alias for the pivot table>  
<optional ORDER BY clause>;  

Please note that after the PIVOT clause you must specify an aggregation function:

...
<aggregation function>(<column being aggregated>)
...

For an additional insight on this topic see also this Microsoft Press article.

like image 4
Andrea Avatar answered Oct 12 '22 03:10

Andrea