Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all intersections of all sets of ranges in PostgreSQL

I'm looking for an efficient way to find all the intersections between sets of timestamp ranges. It needs to work with PostgreSQL 9.2.

Let's say the ranges represent the times when a person is available to meet. Each person may have one or more ranges of times when they are available. I want to find all the time periods when a meeting can take place (ie. during which all people are available).

This is what I've got so far. It seems to work, but I don't think it's very efficient, since it considers one person's availability at a time.

WITH RECURSIVE td AS
(
    -- Test data. Returns:
    -- ["2014-01-20 00:00:00","2014-01-31 00:00:00")
    -- ["2014-02-01 00:00:00","2014-02-20 00:00:00")
    -- ["2014-04-15 00:00:00","2014-04-20 00:00:00")
    SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
    UNION SELECT 1, '2014-02-01', '2014-02-28'
    UNION SELECT 1, '2014-04-01', '2014-04-30'
    UNION SELECT 2, '2014-01-15', '2014-02-20'
    UNION SELECT 2, '2014-04-15', '2014-05-05'
    UNION SELECT 3, '2014-01-20', '2014-04-20'
)
, ranges AS
(
    -- Convert to tsrange type
    SELECT entity_id, tsrange(begin_time, end_time) AS the_range
    FROM td
)
, min_max AS
(
    SELECT MIN(entity_id), MAX(entity_id)
    FROM td
)
, inter AS
(
    -- Ranges for the lowest ID
    SELECT entity_id AS last_id, the_range
    FROM ranges r
    WHERE r.entity_id = (SELECT min FROM min_max)

    UNION ALL

    -- Iteratively intersect with ranges for the next higher ID
    SELECT entity_id, r.the_range * i.the_range
    FROM ranges r
    JOIN inter i ON r.the_range && i.the_range
    WHERE r.entity_id > i.last_id
        AND NOT EXISTS
        (
            SELECT *
            FROM ranges r2
            WHERE r2.entity_id < r.entity_id AND r2.entity_id > i.last_id
        )
)
-- Take the final set of intersections
SELECT *
FROM inter
WHERE last_id = (SELECT max FROM min_max)
ORDER BY the_range;
like image 511
EM0 Avatar asked Jul 25 '14 16:07

EM0


2 Answers

I created the tsrange_interception_agg aggregate

create function tsrange_interception (
    internal_state tsrange, next_data_values tsrange
) returns tsrange as $$
    select internal_state * next_data_values;
$$ language sql;

create aggregate tsrange_interception_agg (tsrange) (
    sfunc = tsrange_interception,
    stype = tsrange,
    initcond = $$[-infinity, infinity]$$
);

Then this query

with td (id, begin_time, end_time) as
(
    values
    (1, '2014-01-01'::timestamp, '2014-01-31'::timestamp),
    (1, '2014-02-01', '2014-02-28'),
    (1, '2014-04-01', '2014-04-30'),
    (2, '2014-01-15', '2014-02-20'),
    (2, '2014-04-15', '2014-05-05'),
    (3, '2014-01-20', '2014-04-20')
), ranges as (
    select
        id,
        row_number() over(partition by id) as rn,
        tsrange(begin_time, end_time) as tr
    from td
), cr as (
    select r0.tr tr0, r1.tr as tr1
    from ranges r0 cross join ranges r1
    where
        r0.id < r1.id and
        r0.tr && r1.tr and
        r0.id = (select min(id) from td)
)
select tr0 * tsrange_interception_agg(tr1) as interseptions
from cr
group by tr0
having count(*) = (select count(distinct id) from td) - 1
;
                 interseptions                 
-----------------------------------------------
 ["2014-02-01 00:00:00","2014-02-20 00:00:00")
 ["2014-01-20 00:00:00","2014-01-31 00:00:00")
 ["2014-04-15 00:00:00","2014-04-20 00:00:00")
like image 128
Clodoaldo Neto Avatar answered Oct 17 '22 23:10

Clodoaldo Neto


If you have a fixed number of entities you want to cross reference, you can use a cross join for each of them, and build the intersection (using the * operator on ranges).

Using a cross join like this is probably less efficient, though. The following example has more to do with explaining the more complex example below.

WITH td AS
(
    SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
    UNION SELECT 1, '2014-02-01', '2014-02-28'
    UNION SELECT 1, '2014-04-01', '2014-04-30'
    UNION SELECT 2, '2014-01-15', '2014-02-20'
    UNION SELECT 2, '2014-04-15', '2014-05-05'
    UNION SELECT 4, '2014-01-20', '2014-04-20'
)
,ranges AS
(
    -- Convert to tsrange type
    SELECT entity_id, tsrange(begin_time, end_time) AS the_range
    FROM td
)
SELECT r1.the_range * r2.the_range * r3.the_range AS r
FROM ranges r1
CROSS JOIN ranges r2
CROSS JOIN ranges r3
WHERE r1.entity_id=1 AND r2.entity_id=2 AND r3.entity_id=4
  AND NOT isempty(r1.the_range * r2.the_range * r3.the_range)
ORDER BY r

In this case a multiple cross join is probably less efficient because you don't actually need to have all the possible combinations of every range in reality, since isempty(r1.the_range * r2.the_range) is enough to make isempty(r1.the_range * r2.the_range * r3.the_range) true.

I don't think you can avoid going through each person's availability at time, since you want them all to be meet anyway.

What may help is to build the set of intersections incrementally, by cross joining each person's availability to the previous subset you've calculated using another recursive CTE (intersections in the example below). You then build the intersections incrementally and get rid of the empty ranges, both stored arrays:

WITH RECURSIVE td AS
(
    SELECT 1 AS entity_id, '2014-01-01'::timestamp AS begin_time, '2014-01-31'::timestamp AS end_time
    UNION SELECT 1, '2014-02-01', '2014-02-28'
    UNION SELECT 1, '2014-04-01', '2014-04-30'
    UNION SELECT 2, '2014-01-15', '2014-02-20'
    UNION SELECT 2, '2014-04-15', '2014-05-05'
    UNION SELECT 4, '2014-01-20', '2014-04-20'
)
,ranges AS
(
    -- Convert to tsrange type
    SELECT entity_id, tsrange(begin_time, end_time) AS the_range
    FROM td
)
,ranges_arrays AS (
    -- Prepare an array of all possible intervals per entity
    SELECT entity_id, array_agg(the_range) AS ranges_arr
    FROM ranges
       GROUP BY entity_id
)
,numbered_ranges_arrays AS (
    -- We'll join using pos+1 next, so we want continuous integers
    -- I've changed the example entity_id from 3 to 4 to demonstrate this.
    SELECT ROW_NUMBER() OVER () AS pos, entity_id, ranges_arr
    FROM ranges_arrays
)
,intersections (pos, subranges) AS (
    -- We start off with the infinite range.
    SELECT 0::bigint, ARRAY['[,)'::tsrange]
    UNION ALL
    -- Then, we unnest the previous intermediate result,
    -- cross join it against the array of ranges from the
    -- next row in numbered_ranges_arrays (joined via pos+1).
    -- We take the intersection and remove the empty array.
    SELECT r.pos,
           ARRAY(SELECT x * y FROM unnest(r.ranges_arr) x CROSS JOIN unnest(i.subranges) y WHERE NOT isempty(x * y))
    FROM numbered_ranges_arrays r
        INNER JOIN intersections i ON r.pos=i.pos+1
)
,last_intersections AS (
    -- We just really want the result from the last operation (with the max pos).
    SELECT subranges FROM intersections ORDER BY pos DESC LIMIT 1
)
SELECT unnest(subranges) r FROM last_intersections ORDER BY r

I'm not sure whether this is likely to perform better, unfortunately. You'd probably need a larger dataset to have meaningful benchmarks.

like image 37
Bruno Avatar answered Oct 18 '22 01:10

Bruno