Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find Intersection Between Date Ranges In PostgreSQL

I have records with a two dates check_in and check_out, I want to know the ranges when more than one person was checked in at the same time.

So if I have the following checkin / checkouts:

  • Person A: 1PM - 6PM
  • Person B: 3PM - 10PM
  • Person C: 9PM - 11PM

I would want to get 3PM - 6PM (Overlap of person A and B) and 9PM - 10PM (overlap of person B and C).

I can write an algorithm to do this in linear time with code, is it possible to do this via a relational query in linear time with PostgreSQL as well?

It needs to have a minimal response, meaning no overlapping ranges. So if there were a result which gave the range 6PM - 9PM and 8PM - 10PM it would be incorrect. It should instead return 6PM - 10pm.

like image 938
Josh Horowitz Avatar asked Sep 25 '22 17:09

Josh Horowitz


2 Answers

Assumptions

The solution heavily depends on the exact table definition including all constraints. For lack of information in the question I'll assume this table:

CREATE TABLE booking (
  booking_id serial PRIMARY KEY
, check_in   timestamptz NOT NULL
, check_out  timestamptz NOT NULL
, CONSTRAINT valid_range CHECK (check_out > check_in)
);

So, no NULL values, only valid ranges with inclusive lower and exclusive upper bound, and we don't really care who checks in.

Also assuming a current version of Postgres, at least 9.2.

Query

One way to do it with only SQL using a UNION ALL and window functions:

SELECT ts AS check_id, next_ts As check_out
FROM  (
   SELECT *, lead(ts) OVER (ORDER BY ts) AS next_ts
   FROM  (
      SELECT *, lag(people_ct, 1 , 0) OVER (ORDER BY ts) AS prev_ct
      FROM  (
         SELECT ts, sum(sum(change)) OVER (ORDER BY ts)::int AS people_ct
         FROM  (
            SELECT check_in AS ts, 1 AS change FROM booking
            UNION ALL
            SELECT check_out, -1 FROM booking
            ) sub1
         GROUP  BY 1
         ) sub2
      ) sub3
   WHERE  people_ct > 1 AND prev_ct < 2 OR  -- start overlap
          people_ct < 2 AND prev_ct > 1     -- end overlap
   ) sub4
WHERE  people_ct > 1 AND prev_ct < 2;

SQL Fiddle.

Explanation

  • In subquery sub1 derive a table of check_in and check_out in one column. check_in adds one to the crowd, check_out subtracts one.

  • In sub2 sum all events for the same point in time and compute a running count with a window function: that's the window function sum() over an aggregate sum() - and cast to integer or we get numeric from this:

       sum(sum(change)) OVER (ORDER BY ts)::int
    
  • In sub3 look at the count of the previous row

  • In sub4 only keep rows where overlapping time ranges start and end, and pull the end of the time range into the same row with lead().

  • Finally, only keep rows, where time ranges start.


To optimize performance I would walk through the table once in a plpgsql function like demonstrated in this related answer on dba.SE:

  • Calculate Difference in Overlapping Time in PostgreSQL / SSRS
like image 163
Erwin Brandstetter Avatar answered Oct 18 '22 19:10

Erwin Brandstetter


Idea is to divide time in periods and save them as bit values with specified granularity.

  • 0 - nobody is checked in one grain
  • 1 - somebody is checked in one grain

Let's assume that granularity is 1 hour and period is 1 day.

  • 000000000000000000000000 means nobody is checked in that day
  • 000000000000000000000110 means somebody is checked between 21 and 23
  • 000000000000011111000000 means somebody is checked between 13 and 18
  • 000000000000000111111100 means somebody is checked between 15 and 22

After that we do binary OR on the each value in the range and we have our answer.

  • 000000000000011111111110

It can be done in linear time. Here is an example from Oracle but it can be transformed to PostgreSQL easily.

with rec (checkin, checkout)
as ( select 13, 18 from dual 
   union all 
    select 15, 22 from dual 
   union all 
    select 21, 23 from dual )
,spanempty ( empt)
 as ( select '000000000000000000000000' from dual) ,
 spanfull( full)
 as ( select '111111111111111111111111' from dual)
, bookingbin( binbook) as ( select  substr(empt, 1, checkin) || 
        substr(full, checkin, checkout-checkin) || 
        substr(empt, checkout, 24-checkout) 
 from rec 
 cross join spanempty
 cross join spanfull ),
 bookingInt (rn, intbook) as 
 ( select rownum, bin2dec(binbook) from bookingbin),
 bitAndSum (bitAndSumm) as (
 select sum(bitand(b1.intbook, b2.intbook)) from bookingInt b1 
 join bookingInt b2 
 on b1.rn = b2.rn -1 ) ,
 SumAll (sumall) as (
 select sum(bin2dec(binbook)) from bookingBin  )
select lpad(dec2bin(sumall - bitAndSumm), 24, '0')
from SumAll, bitAndSum

Result:

000000000000011111111110
like image 44
dcieslak Avatar answered Oct 18 '22 18:10

dcieslak