I have a table that represents usage of a product, kind of like a log. Product usage is logged as multiple timestamps, I want to represent the same data using time ranges.
It looks like this (PostgreSQL 9.1):
userid | timestamp | product
-------------------------------------
001 | 2012-04-23 9:12:05 | foo
001 | 2012-04-23 9:12:07 | foo
001 | 2012-04-23 9:12:09 | foo
001 | 2012-04-23 9:12:11 | barbaz
001 | 2012-04-23 9:12:13 | barbaz
001 | 2012-04-23 9:15:00 | barbaz
001 | 2012-04-23 9:15:01 | barbaz
002 | 2012-04-24 3:41:01 | foo
002 | 2012-04-24 3:41:03 | foo
I want to collapse rows whose time difference with the previous run is less than a delta (say: 2 seconds), and get the begin time and the end time, like this:
userid | begin | end | product
----------------------------------------------------------
001 | 2012-04-23 9:12:05 | 2012-04-23 9:12:09 | foo
001 | 2012-04-23 9:12:11 | 2012-04-23 9:12:13 | barbaz
001 | 2012-04-23 9:15:00 | 2012-04-23 9:15:01 | barbaz
002 | 2012-04-24 3:41:01 | 2012-04-24 3:41:03 | foo
Please note that consecutive usage of the same product is split into two rows if their usage is more than delta (2 seconds, in this example) apart.
create table t (userid int, timestamp timestamp, product text);
insert into t (userid, timestamp, product) values
(001, '2012-04-23 9:12:05', 'foo'),
(001, '2012-04-23 9:12:07', 'foo'),
(001, '2012-04-23 9:12:09', 'foo'),
(001, '2012-04-23 9:12:11', 'barbaz'),
(001, '2012-04-23 9:12:13', 'barbaz'),
(001, '2012-04-23 9:15:00', 'barbaz'),
(001, '2012-04-23 9:15:01', 'barbaz'),
(002, '2012-04-24 3:41:01', 'foo'),
(002, '2012-04-24 3:41:03', 'foo')
;
Inspired by this answer, given a while back by @a_horse_with_no_name.
WITH groupped_t AS (
SELECT *, sum(grp_id) OVER (ORDER BY userid,product,"timestamp") AS grp_nr
FROM (SELECT t.*,
lag("timestamp") OVER
(PARTITION BY userid,product ORDER BY "timestamp") AS prev_ts,
CASE WHEN ("timestamp" - lag("timestamp") OVER
(PARTITION BY userid,product ORDER BY "timestamp")) <= '2s'::interval
THEN NULL ELSE 1 END AS grp_id
FROM t) AS g
), periods AS (
SELECT min(gt."timestamp") AS grp_min, max(gt."timestamp") AS grp_max, grp_nr
FROM groupped_t AS gt
GROUP BY gt.grp_nr
)
SELECT gt.userid, p.grp_min AS "begin", p.grp_max AS "end", gt.product
FROM periods p
JOIN groupped_t gt ON gt.grp_nr = p.grp_nr AND gt."timestamp" = p.grp_min
ORDER BY gt.userid, p.grp_min;
userid
, product
and time difference. I assumed it should be safe to PARTITION BY
first two fields in fact.groupped_t
gives me all the source columns + an extra running group number. I used only ORDER BY
here for the sum()
window function, as I need group IDs to be unique.periods
is just a helper query for the first and last timestamp in each group.groupped_t
with periods
on the grp_nr
(that's why I needed it to be unique) and a timestamp of the first entry in each group.You can also check this query on SQL Fiddle.
Note, that timestamp
, begin
and end
are reserved words in the SQL (end
also for PostgreSQL), so you should either avoid or double-quote them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With