Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: Last_Value() returns wrong result (but First_Value() works fine)

I have a table in SQL Server 2012 as the snapshot shows:

enter image description here

Then I'm using Last_Value() and First Value to get AverageAmount of each EmpID for different YearMonth. The script is as follows:

SELECT A.EmpID,  
       First_Value(A.AverageAmount) OVER (PARTITION BY A.EmpID Order by A.DimYearMonthKey asc) AS  '200901AvgAmount', 
       Last_Value(A.AverageAmount) OVER (PARTITION BY A.EmpID Order by A.DimYearMonthKey asc) AS '201112AvgAmount'

FROM  Emp_Amt  AS A

However, the result for this query is:

result

In column of "201112AvgAmount", it shows different values for each EmpID while "200901AvgAmount" has correct values.

Is there anything wrong with my SQL script? I did a lot research online but still cannot find the answer....

like image 676
Echo Avatar asked Mar 13 '13 15:03

Echo


2 Answers

Here's a quick query to illustrate the behaviour:

select 
  v,

  -- FIRST_VALUE() and LAST_VALUE()
  first_value(v) over(order by v) f1,
  first_value(v) over(order by v rows between unbounded preceding and current row) f2,
  first_value(v) over(order by v rows between unbounded preceding and unbounded following) f3,
  last_value (v) over(order by v) l1,
  last_value (v) over(order by v rows between unbounded preceding and current row) l2,
  last_value (v) over(order by v rows between unbounded preceding and unbounded following) l3,

  -- For completeness' sake, let's also compare the above with MAX()
  max        (v) over() m1,
  max        (v) over(order by v) m2,
  max        (v) over(order by v rows between unbounded preceding and current row) m3,
  max        (v) over(order by v rows between unbounded preceding and unbounded following) m4
from (values(1),(2),(3),(4)) t(v)

The output of the above query can be seen here (SQLFiddle here):

| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 |  1 |  1 |  1 |  1 |  1 |  4 |  4 |  1 |  1 |  4 |
| 2 |  1 |  1 |  1 |  2 |  2 |  4 |  4 |  2 |  2 |  4 |
| 3 |  1 |  1 |  1 |  3 |  3 |  4 |  4 |  3 |  3 |  4 |
| 4 |  1 |  1 |  1 |  4 |  4 |  4 |  4 |  4 |  4 |  4 |

Few people think of the implicit frames that are applied to window functions that take an ORDER BY clause. In this case, windows are defaulting to the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. (RANGE is not exactly the same as ROWS, but that's another story). Think about it this way:

  • On the row with v = 1 the ordered window's frame spans v IN (1)
  • On the row with v = 2 the ordered window's frame spans v IN (1, 2)
  • On the row with v = 3 the ordered window's frame spans v IN (1, 2, 3)
  • On the row with v = 4 the ordered window's frame spans v IN (1, 2, 3, 4)

If you want to prevent that behaviour, you have two options:

  • Use an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause for ordered window functions
  • Use no ORDER BY clause in those window functions that allow for omitting them (as MAX(v) OVER())

More details are explained in this article about LEAD(), LAG(), FIRST_VALUE() and LAST_VALUE()

like image 194
Lukas Eder Avatar answered Nov 01 '22 15:11

Lukas Eder


There is nothing wrong with your script, this is a way how partitioning works in SQL server :/. If you change LAST_VALUE to MAX result will be the same. Solution would be:

SELECT A.EmpID,  
       First_Value(A.AverageAmount) OVER (PARTITION BY A.EmpID Order by A.DimYearMonthKey asc) AS  '200901AvgAmount', 
       Last_Value(A.AverageAmount) OVER (PARTITION BY A.EmpID Order by A.DimYearMonthKey ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS '201112AvgAmount'  
FROM  Emp_Amt  AS A

There is a great post about it, link. GL!

like image 42
www Avatar answered Nov 01 '22 15:11

www