I have some data that looks like this:
CustID EventID TimeStamp
1 17 1/1/15 13:23
1 17 1/1/15 14:32
1 13 1/1/25 14:54
1 13 1/3/15 1:34
1 17 1/5/15 2:54
1 1 1/5/15 3:00
2 17 2/5/15 9:12
2 17 2/5/15 9:18
2 1 2/5/15 10:02
2 13 2/8/15 7:43
2 13 2/8/15 7:50
2 1 2/8/15 8:00
I'm trying to use the row_number function to get it to look like this:
CustID EventID TimeStamp SeqNum
1 17 1/1/15 13:23 1
1 17 1/1/15 14:32 1
1 13 1/1/25 14:54 2
1 13 1/3/15 1:34 2
1 17 1/5/15 2:54 3
1 1 1/5/15 3:00 4
2 17 2/5/15 9:12 1
2 17 2/5/15 9:18 1
2 1 2/5/15 10:02 2
2 13 2/8/15 7:43 3
2 13 2/8/15 7:50 3
2 1 2/8/15 8:00 4
I tried this:
row_number () over
(partition by custID, EventID
order by custID, TimeStamp asc) SeqNum]
but got this back:
CustID EventID TimeStamp SeqNum
1 17 1/1/15 13:23 1
1 17 1/1/15 14:32 2
1 13 1/1/25 14:54 3
1 13 1/3/15 1:34 4
1 17 1/5/15 2:54 5
1 1 1/5/15 3:00 6
2 17 2/5/15 9:12 1
2 17 2/5/15 9:18 2
2 1 2/5/15 10:02 3
2 13 2/8/15 7:43 4
2 13 2/8/15 7:50 5
2 1 2/8/15 8:00 6
how can I get it to sequence based on the change in the EventID?
This is tricky. You need a multi-step process. You need to identify the groups (a difference of row_number()
works for this). Then, assign an increasing constant to each group. And then use dense_rank()
:
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
min(timestamp) over (partition by custid, eventid, grp) as mints
from (select sd.*,
(row_number() over (partition by custid order by timestamp) -
row_number() over (partition by custid, eventid order by timestamp)
) as grp
from somedata sd
) sd
) sd;
Another method is to use lag()
and a cumulative sum:
select sd.*,
sum(case when prev_eventid is null or prev_eventid <> eventid
then 1 else 0 end) over (partition by custid order by timestamp
) as seqnum
from (select sd.*,
lag(eventid) over (partition by custid order by timestamp) as prev_eventid
from somedata sd
) sd;
EDIT:
The last time I used Amazon Redshift it didn't have row_number()
. You can do:
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
min(timestamp) over (partition by custid, eventid, grp) as mints
from (select sd.*,
(row_number() over (partition by custid order by timestamp rows between unbounded preceding and current row) -
row_number() over (partition by custid, eventid order by timestamp rows between unbounded preceding and current row)
) as grp
from somedata sd
) sd
) sd;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With