I have some data that looks like this:
CustID  EventID     TimeStamp
1       17          1/1/15 13:23
1       17          1/1/15 14:32
1       13          1/1/15 14:54
1       13          1/3/15 1:34
1       17          1/5/15 2:54
1       1           1/5/15 3:00
2       17          2/5/15 9:12
2       17          2/5/15 9:18
2       1           2/5/15 10:02
2       13          2/8/15 7:43
2       13          2/8/15 7:50
2       1           2/8/15 8:00
I'm trying to use the row_number function to get it to look like this:
CustID  EventID     TimeStamp      SeqNum
1       17          1/1/15 13:23    1
1       17          1/1/15 14:32    1
1       13          1/1/15 14:54    2
1       13          1/3/15 1:34     2
1       17          1/5/15 2:54     3
1       1           1/5/15 3:00     4
2       17          2/5/15 9:12     1
2       17          2/5/15 9:18     1
2       1           2/5/15 10:02    2   
2       13          2/8/15 7:43     3
2       13          2/8/15 7:50     3
2       1           2/8/15 8:00     4
I tried this:
row_number () over 
          (partition by custID, EventID
           order by custID, TimeStamp asc) SeqNum
but got this back:
CustID  EventID     TimeStamp      SeqNum
1       17          1/1/15 13:23    1
1       17          1/1/15 14:32    2
1       13          1/1/15 14:54    3
1       13          1/3/15 1:34     4
1       17          1/5/15 2:54     5
1       1           1/5/15 3:00     6
2       17          2/5/15 9:12     1
2       17          2/5/15 9:18     2
2       1           2/5/15 10:02    3   
2       13          2/8/15 7:43     4
2       13          2/8/15 7:50     5
2       1           2/8/15 8:00     6
How can I get it to sequence based on the change in the EventID?
This is tricky.  You need a multi-step process.  First, identify the groups of consecutive rows that share an EventID (a difference of row_number() values works for this).  Then, give each group a value that increases over time, such as its minimum timestamp.  Finally, use dense_rank() over that value:
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
             min(timestamp) over (partition by custid, eventid, grp) as mints
      from (select sd.*,
                   (row_number() over (partition by custid order by timestamp) -
                    row_number() over (partition by custid, eventid order by timestamp)
                   ) as grp
            from somedata sd
           ) sd
     ) sd;
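To see why the difference of row numbers marks the groups, it can help to look at the intermediate columns. The following is only a sketch: it assumes PostgreSQL-style syntax, inlines customer 1 from the question as a CTE, reads the third timestamp as 1/1/15 14:54, and renames the timestamp column to ts. The names rn_all and rn_event are made up for illustration.

```sql
-- Hypothetical inline sample: customer 1 from the question.
-- Assumption: the third row's timestamp is 2015-01-01 14:54.
with somedata (custid, eventid, ts) as (
    values (1, 17, timestamp '2015-01-01 13:23'),
           (1, 17, timestamp '2015-01-01 14:32'),
           (1, 13, timestamp '2015-01-01 14:54'),
           (1, 13, timestamp '2015-01-03 01:34'),
           (1, 17, timestamp '2015-01-05 02:54'),
           (1,  1, timestamp '2015-01-05 03:00')
)
select custid, eventid, ts,
       -- position among all rows for the customer
       row_number() over (partition by custid order by ts) as rn_all,
       -- position among rows for the customer with the same eventid
       row_number() over (partition by custid, eventid order by ts) as rn_event,
       -- constant within a run of consecutive rows with the same eventid
       row_number() over (partition by custid order by ts) -
       row_number() over (partition by custid, eventid order by ts) as grp
from somedata
order by custid, ts;

-- eventid | rn_all | rn_event | grp
--      17 |      1 |        1 |   0
--      17 |      2 |        2 |   0
--      13 |      3 |        1 |   2
--      13 |      4 |        2 |   2
--      17 |      5 |        3 |   2
--       1 |      6 |        1 |   5
```

Note that grp = 2 appears for both eventid 13 and the later eventid 17 row. The grp value alone does not identify a run, which is why the query partitions by custid, eventid, and grp together when it computes the minimum timestamp.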
Another method is to use lag() and a cumulative sum:
select sd.*,
       sum(case when prev_eventid is null or prev_eventid <> eventid
                then 1 else 0 end) over (partition by custid order by timestamp
                                        ) as seqnum
from (select sd.*,
             lag(eventid) over (partition by custid order by timestamp) as prev_eventid
      from somedata sd
     ) sd;
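To see what the cumulative sum is adding up, this sketch exposes the change flag next to the running total. It assumes PostgreSQL-style syntax, inlines customer 1 from the question as a CTE, reads the third timestamp as 1/1/15 14:54, and renames the timestamp column to ts. The names flagged and is_change are made up for illustration.

```sql
-- Hypothetical inline sample: customer 1 from the question.
-- Assumption: the third row's timestamp is 2015-01-01 14:54.
with somedata (custid, eventid, ts) as (
    values (1, 17, timestamp '2015-01-01 13:23'),
           (1, 17, timestamp '2015-01-01 14:32'),
           (1, 13, timestamp '2015-01-01 14:54'),
           (1, 13, timestamp '2015-01-03 01:34'),
           (1, 17, timestamp '2015-01-05 02:54'),
           (1,  1, timestamp '2015-01-05 03:00')
),
flagged as (
    -- previous eventid for the same customer; null on the first row
    select custid, eventid, ts,
           lag(eventid) over (partition by custid order by ts) as prev_eventid
    from somedata
)
select custid, eventid, ts, prev_eventid,
       -- 1 on the first row of each run, 0 otherwise
       case when prev_eventid is null or prev_eventid <> eventid
            then 1 else 0 end as is_change,
       -- running count of run starts; the frame is written out explicitly
       sum(case when prev_eventid is null or prev_eventid <> eventid
                then 1 else 0 end)
           over (partition by custid order by ts
                 rows between unbounded preceding and current row) as seqnum
from flagged
order by custid, ts;

-- eventid | prev_eventid | is_change | seqnum
--      17 |       (null) |         1 |      1
--      17 |           17 |         0 |      1
--      13 |           17 |         1 |      2
--      13 |           13 |         0 |      2
--      17 |           13 |         1 |      3
--       1 |           17 |         1 |      4
```

Every row that starts a new run adds 1 to the running total, so all rows within a run share the same seqnum.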
EDIT:
The last time I used Amazon Redshift it didn't have row_number().  You can do:
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
             min(timestamp) over (partition by custid, eventid, grp) as mints
      from (select sd.*,
                   (row_number() over (partition by custid order by timestamp rows between unbounded preceding and current row) -
                    row_number() over (partition by custid, eventid order by timestamp rows between unbounded preceding and current row)
                   ) as grp
            from somedata sd
           ) sd
     ) sd;