Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres 9.1 - Numbering groups of rows

Tags:

sql

postgresql

I have some data that represents different 'actions'. These 'actions' collectively comprise an 'event'.

The data looks like this:

    EventID   |   UserID   |   Action   |        TimeStamp
--------------+------------+------------+-------------------------
       1      |    111     |   Start    |   2012-01-01 08:00:00
       1      |    111     |   Stop     |   2012-01-01 08:59:59
       1      |    999     |   Start    |   2012-01-01 09:00:00
       1      |    999     |   Stop     |   2012-01-01 09:59:59
       1      |    111     |   Start    |   2012-01-01 10:00:00
       1      |    111     |   Stop     |   2012-01-01 10:30:00

As you can see, each single 'event' is made of one or more 'Actions' (or as I think of them, 'sub events').

I need to identify each 'sub event' and give it an identifier. This is what I am looking for:

    EventID   |   SubeventID   |   UserID   |   Action   |        TimeStamp
--------------+----------------+------------+------------+-------------------------
       1      |       1        |    111     |   Start    |   2012-01-01 08:00:00
       1      |       1        |    111     |   Stop     |   2012-01-01 08:59:59
       1      |       2        |    999     |   Start    |   2012-01-01 09:00:00
       1      |       2        |    999     |   Stop     |   2012-01-01 09:59:59
       1      |       3        |    111     |   Start    |   2012-01-01 10:00:00
       1      |       3        |    111     |   Stop     |   2012-01-01 10:30:00

I need something that can start counting, but only increment when some column has a specific value (like "Action" = 'Start').

I have been trying to use Window Functions for this, but with limited success. I just can't seem to find a solution that I feel will work... any thoughts?

like image 290
losthorse Avatar asked Dec 29 '12 13:12

losthorse


1 Answers

If you have some field you can sort by, you could use the following query (untested):

SELECT 
    sum(("Action" = 'Start')::int) OVER (PARTITION BY "EventID" ORDER BY "Timestamp" ROWS UNBOUNDED PRECEDING)
FROM 
    events

Note that if the first SubEvent does not start with Start, it will have an event id of 0, which might not be what you want.


You could also use COUNT() in place of SUM():

SELECT 
    EventID 
  , COUNT(CASE WHEN Action = 'Start' THEN 1 END) 
        OVER ( PARTITION BY EventID 
               ORDER BY TimeStamp 
               ROWS UNBOUNDED PRECEDING ) 
      AS SubeventID 
  , UserID 
  , Action 
FROM 
    tableX AS t ; 

Tests at SQL-Fiddle: test

like image 200
Thilo Avatar answered Sep 29 '22 07:09

Thilo