I have records with a two dates check_in and check_out, I want to know the ranges when more than one person was checked in at the same time.
So if I have the following checkin / checkouts:
1PM - 6PM
3PM - 10PM
9PM - 11PM
I would want to get 3PM - 6PM (Overlap of person A and B) and 9PM - 10PM (overlap of person B and C).
I can write an algorithm to do this in linear time with code, is it possible to do this via a relational query in linear time with PostgreSQL as well?
It needs to have a minimal response, meaning no overlapping ranges. So if there were a result which gave the range 6PM - 9PM and 8PM - 10PM it would be incorrect. It should instead return 6PM - 10pm.
The solution heavily depends on the exact table definition including all constraints. For lack of information in the question I'll assume this table:
CREATE TABLE booking (
booking_id serial PRIMARY KEY
, check_in timestamptz NOT NULL
, check_out timestamptz NOT NULL
, CONSTRAINT valid_range CHECK (check_out > check_in)
);
So, no NULL values, only valid ranges with inclusive lower and exclusive upper bound, and we don't really care who checks in.
Also assuming a current version of Postgres, at least 9.2.
One way to do it with only SQL using a UNION ALL and window functions:
SELECT ts AS check_id, next_ts As check_out
FROM (
SELECT *, lead(ts) OVER (ORDER BY ts) AS next_ts
FROM (
SELECT *, lag(people_ct, 1 , 0) OVER (ORDER BY ts) AS prev_ct
FROM (
SELECT ts, sum(sum(change)) OVER (ORDER BY ts)::int AS people_ct
FROM (
SELECT check_in AS ts, 1 AS change FROM booking
UNION ALL
SELECT check_out, -1 FROM booking
) sub1
GROUP BY 1
) sub2
) sub3
WHERE people_ct > 1 AND prev_ct < 2 OR -- start overlap
people_ct < 2 AND prev_ct > 1 -- end overlap
) sub4
WHERE people_ct > 1 AND prev_ct < 2;
SQL Fiddle.
In subquery sub1 derive a table of check_in and check_out in one column. check_in adds one to the crowd, check_out subtracts one.
In sub2 sum all events for the same point in time and compute a running count with a window function: that's the window function sum() over an aggregate sum() - and cast to integer or we get numeric from this:
sum(sum(change)) OVER (ORDER BY ts)::int
In sub3 look at the count of the previous row
In sub4 only keep rows where overlapping time ranges start and end, and pull the end of the time range into the same row with lead().
Finally, only keep rows, where time ranges start.
To optimize performance I would walk through the table once in a plpgsql function like demonstrated in this related answer on dba.SE:
Idea is to divide time in periods and save them as bit values with specified granularity.
0 - nobody is checked in one grain1 - somebody is checked in one grainLet's assume that granularity is 1 hour and period is 1 day.
After that we do binary OR on the each value in the range and we have our answer.
It can be done in linear time. Here is an example from Oracle but it can be transformed to PostgreSQL easily.
with rec (checkin, checkout)
as ( select 13, 18 from dual
union all
select 15, 22 from dual
union all
select 21, 23 from dual )
,spanempty ( empt)
as ( select '000000000000000000000000' from dual) ,
spanfull( full)
as ( select '111111111111111111111111' from dual)
, bookingbin( binbook) as ( select substr(empt, 1, checkin) ||
substr(full, checkin, checkout-checkin) ||
substr(empt, checkout, 24-checkout)
from rec
cross join spanempty
cross join spanfull ),
bookingInt (rn, intbook) as
( select rownum, bin2dec(binbook) from bookingbin),
bitAndSum (bitAndSumm) as (
select sum(bitand(b1.intbook, b2.intbook)) from bookingInt b1
join bookingInt b2
on b1.rn = b2.rn -1 ) ,
SumAll (sumall) as (
select sum(bin2dec(binbook)) from bookingBin )
select lpad(dec2bin(sumall - bitAndSumm), 24, '0')
from SumAll, bitAndSum
Result:
000000000000011111111110
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With