I have a table with a series of (IP varchar(15), DateTime datetime2) values. Each row corresponds to an HTTP request made by a user. I want to assign session numbers to these rows. Different IP-addresses have different session numbers. The same IP should be assigned a new session number if the last request is older than 30min. Here is a sample output:
IP, DateTime, SessionNumber, RequestNumber
1.1.1.1, 2012-01-01 00:01, 1, 1
1.1.1.1, 2012-01-01 00:02, 1, 2
1.1.1.1, 2012-01-01 00:03, 1, 3
1.1.1.2, 2012-01-01 00:04, 2, 1 --different IP => new session number
1.1.1.2, 2012-01-01 00:05, 2, 2
1.1.1.2, 2012-01-01 00:40, 3, 1 --same IP, but last request 35min ago (> 30min)
Columns 1 and 2 are inputs, 3 and 4 are the desired outputs. The table shows two users.
As the underlying is table is truely large, how can this be solved efficiently? I'd prefer a small constant amount of passes over the data (one or two).
Here's a couple of tries at it.
;WITH CTE1 AS
(
SELECT *,
IIF(DATEDIFF(MINUTE,
LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
DateTime) < 30,0,1) AS SessionFlag
FROM Sessions
), CTE2 AS
(
SELECT *,
SUM(SessionFlag) OVER (PARTITION BY IP
ORDER BY DateTime) AS IPSessionNumber
FROM CTE1
)
SELECT IP,
DateTime,
DENSE_RANK() OVER (ORDER BY IP, IPSessionNumber) AS SessionNumber,
ROW_NUMBER() OVER (PARTITION BY IP, IPSessionNumber
ORDER BY DateTime) AS RequestNumber
FROM CTE2
This has two sort operations (by IP, DateTime
then by IP, IPSessionNumber
) but does assume that the SessionNumber
can be assigned arbitrarily as long as a different unique session number is assigned to each new session per the ip address/ 30 minute rule.
To assign the SessionNumber
s sequentially in chronological order. I used the following.
;WITH CTE1 AS
(
SELECT *,
IIF(DATEDIFF(MINUTE,
LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
DateTime) < 30,0,1) AS SessionFlag
FROM Sessions
), CTE2 AS(
SELECT *,
SUM(SessionFlag) OVER (ORDER BY DateTime) AS GlobalSessionNo
FROM CTE1
), CTE3 AS(
SELECT *,
MAX(CASE WHEN SessionFlag = 1 THEN GlobalSessionNo END)
OVER (PARTITION BY IP ORDER BY DateTime) AS SessionNumber
FROM CTE2)
SELECT IP,
DateTime,
SessionNumber,
ROW_NUMBER() OVER (PARTITION BY SessionNumber
ORDER BY DateTime) AS RequestNumber
FROM CTE3
This increases the number of sort operations to 4 however.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With