I have just learnt how to pivot in SQL Server. I was wondering why the MAX function is used when we want to pivot text columns. What's the logic behind this? I understand it for COUNT, SUM, etc. (because you're summing that respective row and column), but I don't understand the logic of using MAX when we have text columns.
For example, my code is:
SELECT *
FROM ( SELECT DATE
             ,SITA
             ,EVENT
       FROM [UKRMC].[dbo].[strategy]
       WHERE datename(year, DATE) = 2018 OR datename(year, DATE) = 2019
     ) strategy
PIVOT ( MAX(EVENT)
        FOR SITA IN ([ABZPD],[BFSPD]
                    ,[BFSZH]
                    ,[BHXPD]
                    ,[BHXZH]
                    ,[BRSZH]
                    ,[BRUPQ]) ) piv
The MIN() function returns the smallest value of the selected column. The MAX() function returns the largest value of the selected column.
PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output. And PIVOT runs aggregations where they're required on any remaining column values that are wanted in the final output.
The argument to MAX is an expression made up of a single constant, variable, scalar function, or column name, or any combination of arithmetic, bitwise, and string operators. MAX can be used with numeric, character, and datetime columns, but not with bit columns.
Because in your example you've chosen EVENT as the value to show in the PIVOT intersections (i.e. since you've specified EVENT in the PIVOT clause), the value must be specified with one of the permissible aggregate functions, as there are potentially multiple rows for each of the column values that you've chosen in your pivot, when grouped by the remaining columns (i.e. DATE in your case).
In SQL Server[1], MAX() or MIN() is commonly used when pivoting non-numeric columns, as it is able to show one of the original values of the column.
Any non-aggregate and non-pivoted columns will be left as-is and will be used to form the groups on which the pivot is based (in your case, column DATE is in neither the aggregate nor the pivot column, so it will form the row group).
Consider the case where your pivoted table contains multiple rows matching your predicate, such as this:
INSERT INTO strategy (DATE, SITA, EVENT) VALUES
('1 Jan 2018', 'ABZPD', 'Event1'),
('1 Jan 2018', 'BFSPD', 'Event2'),
('1 Jan 2018', 'BFSPD', 'Event3');
After Pivot:
DATE                  ABZPD   BFSPD
2018-01-01T00:00:00Z  Event1  Event3
That is, during the pivot, the BFSPD rows for Event2 and Event3 needed to somehow be projected into a single cell, hence the need for an aggregate. This aggregate is still needed even if there is known to be just one value (as is the case for the Event1 value for SITA ABZPD in the above example).
Since BFSPD has two events, you need some rule for projecting them into a single cell value. Using MAX on the VARCHAR column picks the 'largest' value (Event3) whenever multiple rows project into the same resulting pivot 'cell' - SqlFiddle example here
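The collapse of Event2 and Event3 into one cell can be reproduced outside SQL Server. The following is only a rough sketch under stated assumptions, not the PIVOT operator itself: it uses Python's built-in sqlite3 module, and because SQLite has no PIVOT, the pivot is written as the roughly equivalent GROUP BY with one MAX(CASE ...) per output column. Dates are stored as plain ISO text for simplicity.

```python
import sqlite3

# In-memory stand-in for the strategy table (assumption: dates kept as text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strategy (DATE TEXT, SITA TEXT, EVENT TEXT)")
conn.executemany(
    "INSERT INTO strategy (DATE, SITA, EVENT) VALUES (?, ?, ?)",
    [
        ("2018-01-01", "ABZPD", "Event1"),
        ("2018-01-01", "BFSPD", "Event2"),
        ("2018-01-01", "BFSPD", "Event3"),
    ],
)

# Assumption: GROUP BY + MAX(CASE ...) stands in for
# PIVOT (MAX(EVENT) FOR SITA IN ([ABZPD], [BFSPD])).
# Rows are grouped by DATE; each CASE keeps only one SITA's events,
# and MAX reduces whatever is left to a single value per cell.
pivot_max = conn.execute(
    """
    SELECT DATE,
           MAX(CASE WHEN SITA = 'ABZPD' THEN EVENT END) AS ABZPD,
           MAX(CASE WHEN SITA = 'BFSPD' THEN EVENT END) AS BFSPD
    FROM strategy
    GROUP BY DATE
    """
).fetchall()

print(pivot_max)
```

With the three sample rows this prints `[('2018-01-01', 'Event1', 'Event3')]`: the two BFSPD rows are reduced to the string that sorts last.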
You could choose to use COUNT(EVENT) to show you the number of events per row / pivot intersection - Fiddle
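The COUNT variant can be sketched the same way. As before, this is an illustration under assumptions rather than SQL Server's PIVOT: Python's sqlite3 module is used, and COUNT(CASE ...) with GROUP BY stands in for PIVOT (COUNT(EVENT) FOR SITA IN (...)).

```python
import sqlite3

# In-memory stand-in for the strategy table (assumption: dates kept as text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strategy (DATE TEXT, SITA TEXT, EVENT TEXT)")
conn.executemany(
    "INSERT INTO strategy (DATE, SITA, EVENT) VALUES (?, ?, ?)",
    [
        ("2018-01-01", "ABZPD", "Event1"),
        ("2018-01-01", "BFSPD", "Event2"),
        ("2018-01-01", "BFSPD", "Event3"),
    ],
)

# COUNT ignores the NULLs produced by the CASE for non-matching SITA values,
# so each cell holds the number of events for that DATE / SITA pair.
pivot_count = conn.execute(
    """
    SELECT DATE,
           COUNT(CASE WHEN SITA = 'ABZPD' THEN EVENT END) AS ABZPD,
           COUNT(CASE WHEN SITA = 'BFSPD' THEN EVENT END) AS BFSPD
    FROM strategy
    GROUP BY DATE
    """
).fetchall()

print(pivot_count)
```

For the sample data this prints `[('2018-01-01', 1, 2)]`, which makes the two BFSPD rows behind the single cell visible.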
You could also swap the roles and aggregate DATE instead of EVENT; EVENT is then the column used for grouping.
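Swapping the aggregated column can be tried with the same kind of stand-in. This sketch again assumes Python's sqlite3 module and conditional aggregation in place of PIVOT (MAX(DATE) FOR SITA IN (...)); EVENT is now the column left over, so it forms the groups.

```python
import sqlite3

# In-memory stand-in for the strategy table (assumption: dates kept as text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strategy (DATE TEXT, SITA TEXT, EVENT TEXT)")
conn.executemany(
    "INSERT INTO strategy (DATE, SITA, EVENT) VALUES (?, ?, ?)",
    [
        ("2018-01-01", "ABZPD", "Event1"),
        ("2018-01-01", "BFSPD", "Event2"),
        ("2018-01-01", "BFSPD", "Event3"),
    ],
)

# DATE is aggregated and SITA is pivoted, so the remaining column
# (EVENT) defines one output row per distinct event.
pivot_by_event = conn.execute(
    """
    SELECT EVENT,
           MAX(CASE WHEN SITA = 'ABZPD' THEN DATE END) AS ABZPD,
           MAX(CASE WHEN SITA = 'BFSPD' THEN DATE END) AS BFSPD
    FROM strategy
    GROUP BY EVENT
    ORDER BY EVENT
    """
).fetchall()

for row in pivot_by_event:
    print(row)
```

Each event now gets its own row, with the latest date in the column of the SITA it belongs to and NULL (None in Python) elsewhere.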
[1] Aggregates like AVG or STDEV are obviously not available to strings. Other RDBMS have additional aggregates like FIRST, which will arbitrarily take the first value, or GROUP_CONCAT / LISTAGG, which can fold string values together with a delimiter. And PostgreSQL allows you to make your own aggregate functions! SQL Server's built-in options are more limited (STRING_AGG only arrived in SQL Server 2017), hence MIN() / MAX() for now.
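To see what a concatenating aggregate would put in the same cell, here is a small sketch using SQLite's group_concat through Python's sqlite3 module. It illustrates the GROUP_CONCAT idea from other engines only; it says nothing about what SQL Server's PIVOT accepts, and the conditional-aggregation form is again an assumed stand-in for a pivot.

```python
import sqlite3

# In-memory stand-in for the strategy table (assumption: dates kept as text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strategy (DATE TEXT, SITA TEXT, EVENT TEXT)")
conn.executemany(
    "INSERT INTO strategy (DATE, SITA, EVENT) VALUES (?, ?, ?)",
    [
        ("2018-01-01", "ABZPD", "Event1"),
        ("2018-01-01", "BFSPD", "Event2"),
        ("2018-01-01", "BFSPD", "Event3"),
    ],
)

# group_concat joins every non-NULL value in the group with the delimiter,
# so no event is discarded. SQLite does not guarantee the order in which
# the values are concatenated.
pivot_concat = conn.execute(
    """
    SELECT DATE,
           group_concat(CASE WHEN SITA = 'ABZPD' THEN EVENT END, ', ') AS ABZPD,
           group_concat(CASE WHEN SITA = 'BFSPD' THEN EVENT END, ', ') AS BFSPD
    FROM strategy
    GROUP BY DATE
    """
).fetchall()

print(pivot_concat)
```

The BFSPD cell now contains both Event2 and Event3, whereas MAX would keep only one of them.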
An aggregation function must be specified when using the PIVOT command because the first step of the pivoting operation is a grouping operation on the column specified in the FOR clause, which reduces the number of rows in the resulting table.
The aggregation function determines how the values of the other columns required in the output table are combined.
From Technet documentation:
PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.
Here is the PIVOT command syntax taken from the same Technet article:
SELECT <non-pivoted column>,
[first pivoted column] AS <column name>,
[second pivoted column] AS <column name>,
...
[last pivoted column] AS <column name>
FROM
(<SELECT query that produces the data>)
AS <alias for the source query>
PIVOT
(
<aggregation function>(<column being aggregated>)
FOR
[<column that contains the values that will become column headers>]
IN ( [first pivoted column], [second pivoted column],
... [last pivoted column])
) AS <alias for the pivot table>
<optional ORDER BY clause>;
Please note that inside the PIVOT clause you must specify an aggregation function:
...
<aggregation function>(<column being aggregated>)
...
For an additional insight on this topic see also this Microsoft Press article.