Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unexpected results when using FIRST_VALUE() in SQL Server 2012

Tags:

When I use FIRST_VALUE on a data set that I construct by hand I get one result, and when I use it on a data set that results from a left join, I get a different result - even though the data sets appear to me to contain the exact same data values. I've reproduced the issue with a simple data set below.

Can someone tell me if I have misunderstood something?

This SQL produces the expected result, that FIRST_VALUE is NULL and LAST_VALUE is 30.

SELECT   agroup,   aval,   FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,   LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv FROM (   SELECT 1 agroup, 10 aval   UNION ALL SELECT 1, NULL   UNION ALL SELECT 1, 30 ) T 

This SQL uses a LEFT JOIN that results in the same data set as above, but FIRST_VALUE appears to ignore the NULL.

SELECT    agroup,   aval,   FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,   LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv FROM (   SELECT      T1.agroup,     T1.akey,     T2.aval    FROM    (     SELECT 1 agroup, 1 akey     UNION ALL SELECT 1, 2     UNION ALL SELECT 1, 3   ) T1   LEFT JOIN   (     SELECT 1 akey, 10 aval     UNION ALL SELECT 3,30   ) T2 ON T1.akey = T2.akey ) T 

I can also show that the left join behavior is different when using a table variable vs. a CTE. When using a CTE to generate the data, FIRST_VALUE ignores the NULL. Using the exact same SQL but putting the results in a table variable or a temporary table results in the NULL being taken into account.

With a CTE the SQL Server results don't include NULL in the FIRST_VALUE determination:

WITH T AS (   SELECT      T1.agroup,     T1.akey,     T2.aval    FROM    (     SELECT 1 agroup, 1 akey     UNION ALL SELECT 1, 2     UNION ALL SELECT 1, 3   ) T1   LEFT JOIN   (     SELECT 1 akey, 10 aval     UNION ALL SELECT 3,30   ) T2 ON T1.akey = T2.akey )  SELECT    agroup,   aval,   FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv,   LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv FROM  T 

But with a table variable, it does:

DECLARE @T TABLE (agroup INT,akey INT,aval INT)  INSERT INTO   @T SELECT    T1.agroup,   T1.akey,   T2.aval  FROM  (   SELECT 1 agroup, 1 akey   UNION ALL SELECT 1, 2   UNION ALL SELECT 1, 3 ) T1 LEFT JOIN (   SELECT 1 akey, 10 aval   UNION ALL SELECT 3,30 ) T2 ON T1.akey = T2.akey   SELECT  agroup, aval, FIRST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) fv, LAST_VALUE(aval) OVER (PARTITION BY agroup ORDER BY aval ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv FROM @T 
like image 726
Salah at Soap Hope Avatar asked Sep 11 '13 22:09

Salah at Soap Hope


1 Answers

The provided examples show very clearly that there is an inconsistency in the implementation of the FIRST_VALUE() analytic function.

Depending on whether the underlying table in the FROM clause is a base table (or temporary or a table variable or even a derived table created on the fly) in one case and a derived table (or cte) created by the LEFT JOIN of two created on the fly tables in the second case, the results are different. Seems like the NULL values are ignored in the 2nd case or treated as high values.

And they shouldn't be different, because the result of a SQL query should not depend on how the FROM clause gets the values of the table it provides to the SELECT clause and also because documentation of the OVER clause clearly states how NULL values should be treated:

order_by_expression

Specifies a column or expression on which to sort. order_by_expression can only refer to columns made available by the FROM clause. An integer cannot be specified to represent a column name or alias.

...

ASC | DESC

Specifies that the values in the specified column should be sorted in ascending or descending order. ASC is the default sort order. Null values are treated as the lowest possible values.

So, the correct results - according to the SQL-Server documentation - are the ones that do not ignore NULL values. Any other result should not happen and since it does happen, it is a bug.

I suggested you test in the most recent version (and not only in the RTM) as it may have been identified and corrected in some service pack or update and if it's still there (or if you don't have a newrer version available) to submit this as a bug in the Connect site.


Update

For future reference, the bug was submitted by the OP. The link is: Connect item and (our) @Aaron Bertrand has commented there that it also appears in most current SQL 2014 builds.

like image 150
ypercubeᵀᴹ Avatar answered Sep 17 '22 08:09

ypercubeᵀᴹ