Trouble using ROW_NUMBER() OVER (PARTITION BY ...)

I'm using SQL Server 2008 R2. I have a table called EmployeeHistory with the following structure and sample data:

EmployeeID Date      DepartmentID SupervisorID
10001      20130101  001          10009
10001      20130909  001          10019
10001      20131201  002          10018
10001      20140501  002          10017
10001      20141001  001          10015
10001      20141201  001          10014

Notice that employee 10001 has moved between 2 departments and has had several supervisors over time. What I am trying to do is list the start and end dates of this employee's time in each department, ordered by the Date field. So the output should look like this:

EmployeeID DateStart DateEnd  DepartmentID 
10001      20130101  20131201 001
10001      20131201  20141001 002
10001      20141001  NULL     001

I intended to partition the data using the following query, but it failed. The department changes from 001 to 002 and then back to 001, so partitioning by DepartmentID puts both 001 stints in the same partition. Obviously I cannot partition by DepartmentID... I'm sure I'm overlooking the obvious. Any help? Thank you in advance.

SELECT * ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
ORDER BY [Date]) RN FROM EmployeeHistory

asked Nov 12 '13 05:11

Thracian

2 Answers

I would do something like this:

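;WITH x 
 AS (SELECT *, 
            Row_number() 
              OVER( 
                partition BY employeeid 
                ORDER BY [date]) rn   -- the table's date column is [Date]
     FROM   employeehistory) 
SELECT * 
FROM   x x1 
   LEFT OUTER JOIN x x2 
                ON x1.employeeid = x2.employeeid  -- pair rows of the same employee only
               AND x1.rn = x2.rn + 1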

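With x2.rn + 1, each row in x1 is paired with the row just before it; use x2.rn - 1 if you want the row just after it instead. You'll have to see which one you need. In any case, you get the idea. Once you have the table joined to itself, you can filter, group, sort, etc. to get what you need.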
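
To make the "filter, group, sort" part concrete, here is a rough sketch of one way the self-join could be carried through to the output in the question. It is only an illustration under some assumptions: the sample rows are loaded into a table variable that I have named @EmployeeHistory, DepartmentID is stored as char(3), and the CTE names starts and seq are made up for the example. It avoids LAG/LEAD, which SQL Server 2008 R2 does not have.

```sql
-- Sample data from the question (the table variable name is an assumption).
DECLARE @EmployeeHistory TABLE
(
    EmployeeID   int,
    [Date]       date,
    DepartmentID char(3),
    SupervisorID int
);

INSERT INTO @EmployeeHistory (EmployeeID, [Date], DepartmentID, SupervisorID)
VALUES (10001, '20130101', '001', 10009),
       (10001, '20130909', '001', 10019),
       (10001, '20131201', '002', 10018),
       (10001, '20140501', '002', 10017),
       (10001, '20141001', '001', 10015),
       (10001, '20141201', '001', 10014);

WITH x AS
(
    -- Number each employee's rows in date order.
    SELECT EmployeeID, [Date], DepartmentID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY [Date]) AS rn
    FROM @EmployeeHistory
),
starts AS
(
    -- Keep a row only if it has no previous row or the previous row
    -- is in a different department, i.e. it starts a new stint.
    SELECT cur.EmployeeID, cur.[Date], cur.DepartmentID,
           ROW_NUMBER() OVER (PARTITION BY cur.EmployeeID ORDER BY cur.[Date]) AS seq
    FROM x AS cur
    LEFT OUTER JOIN x AS prev
        ON prev.EmployeeID = cur.EmployeeID
       AND prev.rn = cur.rn - 1
    WHERE prev.rn IS NULL
       OR prev.DepartmentID <> cur.DepartmentID
)
-- The end of a stint is the start of the next one (NULL for the latest stint).
SELECT s.EmployeeID,
       s.[Date] AS DateStart,
       n.[Date] AS DateEnd,
       s.DepartmentID
FROM starts AS s
LEFT OUTER JOIN starts AS n
    ON n.EmployeeID = s.EmployeeID
   AND n.seq = s.seq + 1
ORDER BY s.EmployeeID, s.[Date];
```

On the sample data this should return three rows, one per stint (001, then 002, then 001 again), with DateEnd left NULL for the last one.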

answered Oct 01 '22 13:10

Trevor

A bit involved. Easiest would be to refer to this SQL Fiddle I created for you that produces the exact result. There are ways you can improve it for performance or other considerations, but this should hopefully at least be clearer than some alternatives.

The gist is, you get a canonical ranking of your data first, then use that to segment the data into groups, then find an end date for each group, then eliminate any intermediate rows. ROW_NUMBER() and CROSS APPLY help a lot in doing it readably.
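
For comparison, the segmentation step can also be sketched as a single query using the common "difference of two row numbers" trick for gaps-and-islands problems. This is a different technique from the temp-table approach in this answer, not a restatement of it. The table variable @EmployeeHistory and the names ranked, islands, grp and seq are assumptions made for the example. The idea: ROW_NUMBER() partitioned by employee counts every row, while ROW_NUMBER() partitioned by employee and department counts only that department's rows, so their difference stays constant within one unbroken stint and changes when the employee returns to a department later.

```sql
-- Sample data from the question (the table variable name is an assumption).
DECLARE @EmployeeHistory TABLE
(
    EmployeeID   int,
    [Date]       date,
    DepartmentID char(3),
    SupervisorID int
);

INSERT INTO @EmployeeHistory (EmployeeID, [Date], DepartmentID, SupervisorID)
VALUES (10001, '20130101', '001', 10009),
       (10001, '20130909', '001', 10019),
       (10001, '20131201', '002', 10018),
       (10001, '20140501', '002', 10017),
       (10001, '20141001', '001', 10015),
       (10001, '20141201', '001', 10014);

WITH ranked AS
(
    -- grp is constant for consecutive rows in the same department.
    SELECT EmployeeID, [Date], DepartmentID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY [Date])
         - ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY [Date]) AS grp
    FROM @EmployeeHistory
),
islands AS
(
    -- One row per stint, numbered in date order per employee.
    SELECT EmployeeID, DepartmentID,
           MIN([Date]) AS DateStart,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY MIN([Date])) AS seq
    FROM ranked
    GROUP BY EmployeeID, DepartmentID, grp
)
-- A stint ends when the next stint starts (NULL for the latest stint).
SELECT cur.EmployeeID,
       cur.DateStart,
       nxt.DateStart AS DateEnd,
       cur.DepartmentID
FROM islands AS cur
LEFT OUTER JOIN islands AS nxt
    ON nxt.EmployeeID = cur.EmployeeID
   AND nxt.seq = cur.seq + 1
ORDER BY cur.EmployeeID, cur.DateStart;
```

On the sample rows grp works out to 0, 0, 2, 2, 2, 2, so grouping by DepartmentID together with grp gives three stints: (001, 0), (002, 2) and (001, 2). Grouping by grp alone would not be enough, which is why DepartmentID stays in the GROUP BY.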

EDIT 2019:

The SQL Fiddle does in fact seem to be broken, for some reason, but it appears to be a problem on the SQL Fiddle site. Here's a complete version, tested just now on SQL Server 2016:

CREATE TABLE Source
(
  EmployeeID int,
  DateStarted date,
  DepartmentID int
)

INSERT INTO Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001)


-- Step 1: rank each employee's rows by date and give every row its own provisional GroupKey.
SELECT *, 
  ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
  NEWID() AS GroupKey,
  CAST(NULL AS date) AS EndDate
INTO #RankedData
FROM Source
;

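-- Step 2: give each row the GroupKey of the earliest row in its unbroken run in the same department.
UPDATE #RankedData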
SET GroupKey = beginDate.GroupKey
FROM #RankedData sup
  CROSS APPLY 
  (
    SELECT TOP 1 GroupKey
    FROM #RankedData sub 
    WHERE sub.EmployeeID = sup.EmployeeID AND
      sub.DepartmentID = sup.DepartmentID AND
      NOT EXISTS 
        (
          SELECT * 
          FROM #RankedData bot 
          WHERE bot.EmployeeID = sup.EmployeeID AND
            bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
            bot.DepartmentID <> sup.DepartmentID
        )
      ORDER BY DateStarted ASC
    ) beginDate (GroupKey);

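-- Step 3: set EndDate to the date of the employee's next row in a different department.
-- Rows in the latest run have no such row, so their EndDate stays NULL.
UPDATE #RankedData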
SET EndDate = nextGroup.DateStarted
FROM #RankedData sup
  CROSS APPLY 
  (
    SELECT TOP 1 DateStarted
    FROM #RankedData sub
    WHERE sub.EmployeeID = sup.EmployeeID AND
      sub.DepartmentID <> sup.DepartmentID AND
      sub.EntryRank > sup.EntryRank
    ORDER BY EntryRank ASC
  ) nextGroup (DateStarted);

-- Step 4: keep only the first row of each group, dropping the intermediate rows.
SELECT * FROM 
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank FROM #RankedData
) FinalRanking
WHERE GroupRank = 1
ORDER BY EntryRank;

DROP TABLE #RankedData
DROP TABLE Source

answered Oct 01 '22 11:10

Dominic P

Tags: sql, sql-server, sql-server-2008, row-number, gaps-and-islands
