Find Intersection Between Date Ranges In PostgreSQL

Question

I have records with a two dates check_in and check_out, I want to know the ranges when more than one person was checked in at the same time.

So if I have the following checkin / checkouts:

Person A: 1PM - 6PM
Person B: 3PM - 10PM
Person C: 9PM - 11PM

I would want to get 3PM - 6PM (Overlap of person A and B) and 9PM - 10PM (overlap of person B and C).

I can write an algorithm to do this in linear time with code, is it possible to do this via a relational query in linear time with PostgreSQL as well?

It needs to have a minimal response, meaning no overlapping ranges. So if there were a result which gave the range 6PM - 9PM and 8PM - 10PM it would be incorrect. It should instead return 6PM - 10pm.

Erwin Brandstetter · Accepted Answer

Assumptions

The solution heavily depends on the exact table definition including all constraints. For lack of information in the question I'll assume this table:

CREATE TABLE booking (
  booking_id serial PRIMARY KEY
, check_in   timestamptz NOT NULL
, check_out  timestamptz NOT NULL
, CONSTRAINT valid_range CHECK (check_out > check_in)
);

So, no NULL values, only valid ranges with inclusive lower and exclusive upper bound, and we don't really care who checks in.

Also assuming a current version of Postgres, at least 9.2.

Query

One way to do it with only SQL using a UNION ALL and window functions:

SELECT ts AS check_id, next_ts As check_out
FROM  (
   SELECT *, lead(ts) OVER (ORDER BY ts) AS next_ts
   FROM  (
      SELECT *, lag(people_ct, 1 , 0) OVER (ORDER BY ts) AS prev_ct
      FROM  (
         SELECT ts, sum(sum(change)) OVER (ORDER BY ts)::int AS people_ct
         FROM  (
            SELECT check_in AS ts, 1 AS change FROM booking
            UNION ALL
            SELECT check_out, -1 FROM booking
            ) sub1
         GROUP  BY 1
         ) sub2
      ) sub3
   WHERE  people_ct > 1 AND prev_ct < 2 OR  -- start overlap
          people_ct < 2 AND prev_ct > 1     -- end overlap
   ) sub4
WHERE  people_ct > 1 AND prev_ct < 2;

SQL Fiddle.

Explanation

In subquery sub1 derive a table of check_in and check_out in one column. check_in adds one to the crowd, check_out subtracts one.
In sub2 sum all events for the same point in time and compute a running count with a window function: that's the window function sum() over an aggregate sum() - and cast to integer or we get numeric from this:
```
   sum(sum(change)) OVER (ORDER BY ts)::int
```
In sub3 look at the count of the previous row
In sub4 only keep rows where overlapping time ranges start and end, and pull the end of the time range into the same row with lead().
Finally, only keep rows, where time ranges start.

To optimize performance I would walk through the table once in a plpgsql function like demonstrated in this related answer on dba.SE:

Calculate Difference in Overlapping Time in PostgreSQL / SSRS

dcieslak · Answer

Idea is to divide time in periods and save them as bit values with specified granularity.

0 - nobody is checked in one grain
1 - somebody is checked in one grain

Let's assume that granularity is 1 hour and period is 1 day.

000000000000000000000000 means nobody is checked in that day
000000000000000000000110 means somebody is checked between 21 and 23
000000000000011111000000 means somebody is checked between 13 and 18
000000000000000111111100 means somebody is checked between 15 and 22

After that we do binary OR on the each value in the range and we have our answer.

000000000000011111111110

It can be done in linear time. Here is an example from Oracle but it can be transformed to PostgreSQL easily.

with rec (checkin, checkout)
as ( select 13, 18 from dual 
   union all 
    select 15, 22 from dual 
   union all 
    select 21, 23 from dual )
,spanempty ( empt)
 as ( select '000000000000000000000000' from dual) ,
 spanfull( full)
 as ( select '111111111111111111111111' from dual)
, bookingbin( binbook) as ( select  substr(empt, 1, checkin) || 
        substr(full, checkin, checkout-checkin) || 
        substr(empt, checkout, 24-checkout) 
 from rec 
 cross join spanempty
 cross join spanfull ),
 bookingInt (rn, intbook) as 
 ( select rownum, bin2dec(binbook) from bookingbin),
 bitAndSum (bitAndSumm) as (
 select sum(bitand(b1.intbook, b2.intbook)) from bookingInt b1 
 join bookingInt b2 
 on b1.rn = b2.rn -1 ) ,
 SumAll (sumall) as (
 select sum(bin2dec(binbook)) from bookingBin  )
select lpad(dec2bin(sumall - bitAndSumm), 24, '0')
from SumAll, bitAndSum

Result:

000000000000011111111110

Find Intersection Between Date Ranges In PostgreSQL

Tags:

sql

postgresql

relational-database

window-functions

gaps-and-islands

Josh Horowitz

2 Answers

Assumptions

Query

Explanation

Erwin Brandstetter

dcieslak

Recent Activity

Donate For Us

Find Intersection Between Date Ranges In PostgreSQL

Tags:

sql

postgresql

relational-database

window-functions

gaps-and-islands

Josh Horowitz

2 Answers

Assumptions

Query

Explanation

Erwin Brandstetter

dcieslak

Related questions

Recent Activity

Donate For Us