I have records with a two dates check_in
and check_out
, I want to know the ranges when more than one person was checked in at the same time.
So if I have the following checkin / checkouts:
1PM - 6PM
3PM - 10PM
9PM - 11PM
I would want to get 3PM - 6PM
(Overlap of person A and B) and 9PM - 10PM
(overlap of person B and C).
I can write an algorithm to do this in linear time with code, is it possible to do this via a relational query in linear time with PostgreSQL
as well?
It needs to have a minimal response, meaning no overlapping ranges. So if there were a result which gave the range 6PM - 9PM
and 8PM - 10PM
it would be incorrect. It should instead return 6PM - 10pm
.
The solution heavily depends on the exact table definition including all constraints. For lack of information in the question I'll assume this table:
CREATE TABLE booking (
booking_id serial PRIMARY KEY
, check_in timestamptz NOT NULL
, check_out timestamptz NOT NULL
, CONSTRAINT valid_range CHECK (check_out > check_in)
);
So, no NULL values, only valid ranges with inclusive lower and exclusive upper bound, and we don't really care who checks in.
Also assuming a current version of Postgres, at least 9.2.
One way to do it with only SQL using a UNION ALL
and window functions:
SELECT ts AS check_id, next_ts As check_out
FROM (
SELECT *, lead(ts) OVER (ORDER BY ts) AS next_ts
FROM (
SELECT *, lag(people_ct, 1 , 0) OVER (ORDER BY ts) AS prev_ct
FROM (
SELECT ts, sum(sum(change)) OVER (ORDER BY ts)::int AS people_ct
FROM (
SELECT check_in AS ts, 1 AS change FROM booking
UNION ALL
SELECT check_out, -1 FROM booking
) sub1
GROUP BY 1
) sub2
) sub3
WHERE people_ct > 1 AND prev_ct < 2 OR -- start overlap
people_ct < 2 AND prev_ct > 1 -- end overlap
) sub4
WHERE people_ct > 1 AND prev_ct < 2;
SQL Fiddle.
In subquery sub1
derive a table of check_in
and check_out
in one column. check_in
adds one to the crowd, check_out
subtracts one.
In sub2
sum all events for the same point in time and compute a running count with a window function: that's the window function sum()
over an aggregate sum()
- and cast to integer
or we get numeric
from this:
sum(sum(change)) OVER (ORDER BY ts)::int
In sub3
look at the count of the previous row
In sub4
only keep rows where overlapping time ranges start and end, and pull the end of the time range into the same row with lead()
.
Finally, only keep rows, where time ranges start.
To optimize performance I would walk through the table once in a plpgsql function like demonstrated in this related answer on dba.SE:
Idea is to divide time in periods and save them as bit values with specified granularity.
0
- nobody is checked in one grain1
- somebody is checked in one grainLet's assume that granularity is 1 hour and period is 1 day.
After that we do binary OR on the each value in the range and we have our answer.
It can be done in linear time. Here is an example from Oracle but it can be transformed to PostgreSQL easily.
with rec (checkin, checkout)
as ( select 13, 18 from dual
union all
select 15, 22 from dual
union all
select 21, 23 from dual )
,spanempty ( empt)
as ( select '000000000000000000000000' from dual) ,
spanfull( full)
as ( select '111111111111111111111111' from dual)
, bookingbin( binbook) as ( select substr(empt, 1, checkin) ||
substr(full, checkin, checkout-checkin) ||
substr(empt, checkout, 24-checkout)
from rec
cross join spanempty
cross join spanfull ),
bookingInt (rn, intbook) as
( select rownum, bin2dec(binbook) from bookingbin),
bitAndSum (bitAndSumm) as (
select sum(bitand(b1.intbook, b2.intbook)) from bookingInt b1
join bookingInt b2
on b1.rn = b2.rn -1 ) ,
SumAll (sumall) as (
select sum(bin2dec(binbook)) from bookingBin )
select lpad(dec2bin(sumall - bitAndSumm), 24, '0')
from SumAll, bitAndSum
Result:
000000000000011111111110
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With