Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge continuous rows with Postgresql

Tags:

sql

postgresql

I have a slots table like this :

   Column   |            Type             |
------------+-----------------------------+
 id         | integer                     |
 begin_at   | timestamp without time zone |
 end_at     | timestamp without time zone |
 user_id    | integer                     |

and I like to select merged rows for continuous time. Let's say I have (simplified) data like :

(1, 5:15, 5:30, 1)
(2, 5:15, 5:30, 2)
(3, 5:30, 5:45, 2)
(4, 5:45, 6:00, 2)
(5, 8:15, 8:30, 2)
(6, 8:30, 8:45, 2)

I would like to know if it's possible to select rows formatted like :

(5:15, 5:30, 1)
(5:15, 6:00, 2) // <======= rows id 2,3 and 4 merged
(8:15, 8:45, 2) // <======= rows id 5 and 6 merged

EDIT: Here's the SQLfiddle

I'm using Postgresql, version 9.3!

Thank you!

like image 464
Mathieu Mahé Avatar asked Oct 12 '14 20:10

Mathieu Mahé


2 Answers

Here is one method for solving this problem. Create a flag that determines if a one record does not overlap with the previous one. This is the start of a group. Then take the cumulative sum of this flag and use that for grouping:

select user_id, min(begin_at) as begin_at, max(end_at) as end_at
from (select s.*, sum(startflag) over (partition by user_id order by begin_at) as grp
      from (select s.*,
                   (case when lag(end_at) over (partition by user_id order by begin_at) >= begin_at
                         then 0 else 1
                    end) as startflag
            from slots s
           ) s
     ) s
group by user_id, grp;

Here is a SQL Fiddle.

like image 95
Gordon Linoff Avatar answered Nov 14 '22 13:11

Gordon Linoff


Gordon Linoff already provided the answer (I upvoted).

I've used the same approach, but wanted to deal with tsrange type. So I came up with this construct:

SELECT min(id) b_id, min(begin_at) b_at, max(end_at) e_at, grp, user_id
  FROM (
    SELECT t.*, sum(g) OVER (ORDER BY id) grp
      FROM (
        SELECT s.*, (NOT r -|- lag(r,1,r)
                     OVER (PARTITION BY user_id ORDER BY id))::int g
          FROM (SELECT id,begin_at,end_at,user_id,
                       tsrange(begin_at,end_at,'[)') r FROM slots) s
      ) t
  ) u
 GROUP BY grp, user_id
 ORDER BY grp;

Unfortunately, on the top level one has to use min(begin_at) and max(end_at), as there're no aggregate functions for the range-based union operator +.

I create ranges with exclusive upper bounds, this allows me to use “is adjacent to” (-|-) operator. I compare current tsrange with the one on the previous row, defaulting to the current one in case there's no previous. Then I negate the comparison and cast to integer, which gives me 1 in cases when new group starts.

like image 32
vyegorov Avatar answered Nov 14 '22 15:11

vyegorov