Why is the Max function used when we pivot text columns in SQL Server?

Tags: sql, sql-server, tsql, pivot

I have just learnt how to pivot in SQL Server. I was wondering why the MAX function is used when we want to pivot text columns. What's the logic behind this? I understand it for COUNT, SUM, etc. (because you're summing the respective row and column), but I don't understand the logic of using MAX when we have text columns.

For example, my code is:

SELECT *
  FROM ( SELECT DATE
               ,SITA
               ,EVENT
          FROM  [UKRMC].[dbo].[strategy]
          WHERE datename(year, DATE) = 2018 OR datename(year, DATE) = 2019
        ) strategy
  PIVOT ( max(EVENT)
          FOR SITA IN ([ABZPD]
                      ,[BFSPD]
                      ,[BFSZH]
                      ,[BHXPD]
                      ,[BHXZH]
                      ,[BRSZH]
                      ,[BRUPQ])
        ) piv

asked Jan 03 '18 13:01

Sorath

2 Answers

Because you've specified EVENT in the PIVOT clause, it is the value shown at each pivot intersection, and it must be wrapped in one of the permissible aggregate functions: there are potentially multiple rows for each of the column values you've chosen in your pivot, when grouped by the remaining columns (i.e. DATE in your case).

In SQL Server [1], MAX() or MIN() is commonly used when pivoting non-numeric columns, because each returns one of the original values of the column.

Any columns that are neither aggregated nor pivoted are left as-is and form the groups on which the pivot is based (in your case, DATE is neither the aggregated column nor the pivot column, so it forms the row group).

Consider the case where the table being pivoted contains multiple rows for the same DATE and SITA that match your predicate, such as this:

INSERT INTO strategy (DATE, SITA, EVENT) VALUES
('1 Jan 2018', 'ABZPD', 'Event1'),
('1 Jan 2018', 'BFSPD', 'Event2'),
('1 Jan 2018', 'BFSPD', 'Event3');

After Pivot:

DATE                    ABZPD   BFSPD
2018-01-01T00:00:00Z    Event1  Event3

During the pivot, the BFSPD rows for Event2 and Event3 have to be projected into a single cell somehow, hence the need for an aggregate. The aggregate is still required even when there is known to be just one value (as with the Event1 value for SITA ABZPD in the above example).
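For reference, here is a minimal sketch of a script that should reproduce the result above. The table variable @strategy and its column types are assumptions made for illustration; the real strategy table may be defined differently.

```sql
-- Hypothetical stand-in for the strategy table; the column types are assumptions.
DECLARE @strategy TABLE ([DATE] date, SITA varchar(10), [EVENT] varchar(50));

INSERT INTO @strategy ([DATE], SITA, [EVENT]) VALUES
    ('20180101', 'ABZPD', 'Event1'),
    ('20180101', 'BFSPD', 'Event2'),
    ('20180101', 'BFSPD', 'Event3');

-- DATE is neither aggregated nor pivoted, so it forms the row group.
-- The two BFSPD rows for the same date collapse into one cell via MAX.
SELECT [DATE], [ABZPD], [BFSPD]
FROM (SELECT [DATE], SITA, [EVENT] FROM @strategy) AS src
PIVOT (MAX([EVENT]) FOR SITA IN ([ABZPD], [BFSPD])) AS piv;
```

Swapping MAX for MIN in this sketch would be expected to put Event2 in the BFSPD cell instead.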

Since BFSPD has two events, you have to decide how to project them into a single cell value. Using MAX on the VARCHAR column resolves to the 'largest' value (Event3) when multiple rows project into the same pivot cell.

You could instead use COUNT(EVENT) to show the number of events per row / pivot intersection.
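A minimal sketch of the COUNT variant, using the same three sample rows in a table variable (the name @strategy and the column types are assumptions):

```sql
-- Hypothetical stand-in for the strategy table; the column types are assumptions.
DECLARE @strategy TABLE ([DATE] date, SITA varchar(10), [EVENT] varchar(50));

INSERT INTO @strategy ([DATE], SITA, [EVENT]) VALUES
    ('20180101', 'ABZPD', 'Event1'),
    ('20180101', 'BFSPD', 'Event2'),
    ('20180101', 'BFSPD', 'Event3');

-- Each cell now holds the number of events for that DATE / SITA combination
-- rather than one of the event names.
SELECT [DATE], [ABZPD], [BFSPD]
FROM (SELECT [DATE], SITA, [EVENT] FROM @strategy) AS src
PIVOT (COUNT([EVENT]) FOR SITA IN ([ABZPD], [BFSPD])) AS piv;
```

With this data the single 2018-01-01 row should show 1 under ABZPD and 2 under BFSPD.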

You could also swap the roles of EVENT and DATE: aggregate DATE instead, so that EVENT is used for the row grouping.
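As a rough sketch of that swap, again on a hypothetical table variable with assumed column types, the aggregate is applied to DATE and EVENT is left over as the grouping column:

```sql
-- Hypothetical stand-in for the strategy table; the column types are assumptions.
DECLARE @strategy TABLE ([DATE] date, SITA varchar(10), [EVENT] varchar(50));

INSERT INTO @strategy ([DATE], SITA, [EVENT]) VALUES
    ('20180101', 'ABZPD', 'Event1'),
    ('20180101', 'BFSPD', 'Event2'),
    ('20180101', 'BFSPD', 'Event3');

-- EVENT is now the non-aggregated, non-pivoted column, so the result has
-- one row per distinct EVENT; each cell holds the latest DATE for that SITA.
SELECT [EVENT], [ABZPD], [BFSPD]
FROM (SELECT [DATE], SITA, [EVENT] FROM @strategy) AS src
PIVOT (MAX([DATE]) FOR SITA IN ([ABZPD], [BFSPD])) AS piv;
```

Cells with no matching source row (for example ABZPD on the Event2 row) come back as NULL.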

[1] Aggregates like AVG or STDEV are not available for strings. Other RDBMSs have additional aggregates, such as FIRST, which arbitrarily takes the first value, or GROUP_CONCAT / LISTAGG, which fold string values together with a delimiter. PostgreSQL even lets you define your own aggregate functions. SQL Server only gained STRING_AGG in SQL Server 2017, hence the long-standing use of MIN() / MAX() when pivoting text.
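On SQL Server 2017 and later, STRING_AGG offers a way to keep every event rather than just the largest one. The sketch below uses conditional aggregation with GROUP BY instead of the PIVOT operator; the table variable and its column types are assumptions, and the separator is an arbitrary choice.

```sql
-- Requires SQL Server 2017 or later (STRING_AGG).
-- Hypothetical stand-in for the strategy table; the column types are assumptions.
DECLARE @strategy TABLE ([DATE] date, SITA varchar(10), [EVENT] varchar(50));

INSERT INTO @strategy ([DATE], SITA, [EVENT]) VALUES
    ('20180101', 'ABZPD', 'Event1'),
    ('20180101', 'BFSPD', 'Event2'),
    ('20180101', 'BFSPD', 'Event3');

-- The CASE expression yields NULL for rows belonging to other SITA values;
-- STRING_AGG skips NULLs, so each column only concatenates its own events.
SELECT [DATE],
       STRING_AGG(CASE WHEN SITA = 'ABZPD' THEN [EVENT] END, ', ')
           WITHIN GROUP (ORDER BY [EVENT]) AS ABZPD,
       STRING_AGG(CASE WHEN SITA = 'BFSPD' THEN [EVENT] END, ', ')
           WITHIN GROUP (ORDER BY [EVENT]) AS BFSPD
FROM @strategy
GROUP BY [DATE];
```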

answered Oct 12 '22 02:10

StuartLC

An aggregation function must be specified when using the PIVOT command because the first step of the pivoting operation is a grouping operation: the source rows are grouped by the columns that are neither aggregated nor named in the FOR clause, which reduces the number of rows in the resulting table.

The aggregation function then collapses the values that fall into each group and pivoted column into the single value shown in the output table.
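One way to picture the grouping and aggregation steps is to write the pivot by hand with GROUP BY and CASE expressions. This is only an illustrative sketch of an equivalent query on a hypothetical table variable (the name @strategy and the column types are assumptions), not a description of how the engine executes PIVOT internally.

```sql
-- Hypothetical stand-in for the strategy table; the column types are assumptions.
DECLARE @strategy TABLE ([DATE] date, SITA varchar(10), [EVENT] varchar(50));

INSERT INTO @strategy ([DATE], SITA, [EVENT]) VALUES
    ('20180101', 'ABZPD', 'Event1'),
    ('20180101', 'BFSPD', 'Event2'),
    ('20180101', 'BFSPD', 'Event3');

-- Grouping: one output row per DATE.
-- Spreading: one CASE expression per SITA value that becomes a column.
-- Aggregating: MAX picks a single value out of each group's candidates.
SELECT [DATE],
       MAX(CASE WHEN SITA = 'ABZPD' THEN [EVENT] END) AS ABZPD,
       MAX(CASE WHEN SITA = 'BFSPD' THEN [EVENT] END) AS BFSPD
FROM @strategy
GROUP BY [DATE];
```

Written this way, it is easier to see that the query would not be valid without an aggregate: EVENT is not part of the GROUP BY, so it has to be reduced to one value per group.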

From the TechNet documentation:

PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.

Here is the PIVOT command syntax, taken from the same TechNet article:

SELECT <non-pivoted column>,  
    [first pivoted column] AS <column name>,  
    [second pivoted column] AS <column name>,  
    ...  
    [last pivoted column] AS <column name>  
FROM  
    (<SELECT query that produces the data>)   
    AS <alias for the source query>  
PIVOT  
(  
    <aggregation function>(<column being aggregated>)  
FOR   
[<column that contains the values that will become column headers>]   
    IN ( [first pivoted column], [second pivoted column],  
    ... [last pivoted column])  
) AS <alias for the pivot table>  
<optional ORDER BY clause>;

Please note that inside the PIVOT clause you must specify an aggregation function:

...
<aggregation function>(<column being aggregated>)
...

For additional insight on this topic, see also this Microsoft Press article.

answered Oct 12 '22 03:10

Andrea
