Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server: row_number partitioned by timeout

I have a table with a series of (IP varchar(15), DateTime datetime2) values. Each row corresponds to an HTTP request made by a user. I want to assign session numbers to these rows. Different IP-addresses have different session numbers. The same IP should be assigned a new session number if the last request is older than 30min. Here is a sample output:

IP,      DateTime,         SessionNumber, RequestNumber
1.1.1.1, 2012-01-01 00:01, 1,             1
1.1.1.1, 2012-01-01 00:02, 1,             2
1.1.1.1, 2012-01-01 00:03, 1,             3
1.1.1.2, 2012-01-01 00:04, 2,             1 --different IP => new session number
1.1.1.2, 2012-01-01 00:05, 2,             2
1.1.1.2, 2012-01-01 00:40, 3,             1 --same IP, but last request 35min ago (> 30min)

Columns 1 and 2 are inputs, 3 and 4 are the desired outputs. The table shows two users.

As the underlying is table is truely large, how can this be solved efficiently? I'd prefer a small constant amount of passes over the data (one or two).

like image 723
usr Avatar asked Apr 08 '12 18:04

usr


1 Answers

Here's a couple of tries at it.

;WITH CTE1 AS
(
SELECT *,
IIF(DATEDIFF(MINUTE,
       LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
       DateTime) < 30,0,1) AS SessionFlag
FROM Sessions
), CTE2 AS
(
SELECT *,
       SUM(SessionFlag) OVER (PARTITION BY IP 
                                  ORDER BY DateTime) AS IPSessionNumber
FROM CTE1
)
SELECT IP,
       DateTime,
       DENSE_RANK() OVER (ORDER BY IP, IPSessionNumber) AS SessionNumber,
       ROW_NUMBER() OVER (PARTITION BY IP, IPSessionNumber 
                              ORDER BY DateTime) AS RequestNumber
FROM CTE2

This has two sort operations (by IP, DateTime then by IP, IPSessionNumber) but does assume that the SessionNumber can be assigned arbitrarily as long as a different unique session number is assigned to each new session per the ip address/ 30 minute rule.

To assign the SessionNumbers sequentially in chronological order. I used the following.

;WITH CTE1 AS
(
SELECT *,
IIF(DATEDIFF(MINUTE,
       LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
       DateTime) < 30,0,1) AS SessionFlag
FROM Sessions
), CTE2 AS(
SELECT *,
       SUM(SessionFlag) OVER (ORDER BY DateTime) AS GlobalSessionNo
FROM CTE1
), CTE3 AS(
SELECT *,
       MAX(CASE WHEN SessionFlag = 1 THEN GlobalSessionNo END) 
               OVER (PARTITION BY IP ORDER BY DateTime) AS SessionNumber
FROM CTE2)
SELECT IP,
       DateTime,
       SessionNumber,
       ROW_NUMBER() OVER (PARTITION BY SessionNumber 
                              ORDER BY DateTime) AS RequestNumber
FROM CTE3

This increases the number of sort operations to 4 however.

like image 52
Martin Smith Avatar answered Oct 13 '22 21:10

Martin Smith