Select groups of values that cover a date interval together

Question

I have a database table with the following relevant columns:

group_id, value, valid_from, valid_to

None of those columns are unique (this is not the whole table).

Data could look like this:

group_id	valid_from	valid_to
1	20250201	20250230
2	20250201	20250230
2	20250201	20250230
3	20250201	20250215
3	20250213	20250220
3	20250221	20250230
4	20250101	20250220
4	20250210	20250215
4	20250219	20250230
5	20250101	20250115
5	20250513	20250619

So each group has one or more entries. Every entry has a date interval. The intervals can overlap and are not unique.

Now I have the impossible task to determine all groups which cover a date interval together.

Let's say I need to find all groups that cover the interval 20250201 - 20250230.

With the test data shown, I would expect to find groups 1, 2, 3 and 4.

(None of the rows of group 3 cover the interval by themselves. But they overlap and cover the interval together).

I have to solve this with SQL ONLY but I can use multiple selects.

I'm a SAP Dev and I try to solve this with a HANA AMDP Method / SQLScript but feel free to give me any SQL solution you can come up with.

My idea was to select all of the date intervals that overlap (not cover it completely) with my needed date interval. Then use window function LAG() to select the previous date intervals of the group and check if they overlap, or if there is a gap.

If they overlap, I would take the valid_from of the previous row to widen/enhance the interval. This way I was hoping to create a entry that has the maximum covered interval of the group.

Then I could make one more select and look for entries that cover the whole interval.

tmp_1 = 
SELECT
 group_id,
 CASE
  WHEN valid_from <= ( LAG ( valid_to )
                       OVER ( PARTITION BY group_id
                              ORDER BY group_id, valid_from, valid_to ) + 1 )
   THEN LAG ( valid_to )
        OVER ( PARTITION BY group_id
               ORDER BY group_id, valid_from, valid_to )
 END as valid_from_min,
 CASE
  WHEN valid_to < LAG ( valid_to )
                  OVER ( PARTITION BY group_id
                         ORDER BY group_id, valid_from, valid_to )
   THEN LAG ( valid_to )
        OVER ( PARTITION BY group_id
               ORDER BY group_id, valid_from, valid_to )
 END as valid_to_max
FROM DB_GROUP
WHERE valid_from <= 20250230
  AND valid_to   >= 20250201;

tmp_2 =
SELECT
 group_id
 FROM tmp_1
 WHERE valid_from_min <= 20250201
  AND  valid_to_max   >= 20250230
GROUP BY group_id;

This only works when a group has less than 3 entries.

Second row valid_from_min will equal valid_from of the first row - good.

Third row valid_from_min will equal valid_from of the second row - bad.

Guillaume Outters · Accepted Answer

LAG() won't work if let's say interval B is included in interval A, e.g A: 0101-0131, B: 0115-0118, C: 0120-0228: C's LAG() is B, but there's a hole between both, while, had C looked for A, it would have seen they overlap.

What I generally use is slice each interval into subintervals (cut on each start or end of an overlap with another interval), then ensure that at least one interval covers each slice of your period of interest.

So for your group 3:

group_id	valid_from	valid_to	1 → 12	13 → 15	16 → 20	21 → 30
3	20250201	20250215	X	X
3	20250213	20250220		X	X
3	20250221	20250230				X
3	20250221	20250230	1	2	1	1	← count() over the group

Small detail: we have to introduce our looked up interval's ends (0201 and 0230) into each group, so that in case a group's first interval starts on 0210 we count an empty slice from 0201 to 0209 (and get a row with a COUNT of 0 for this slice) instead of having no slice to count (and get no row for this slice, thus forget to signal it).
To alleviate our algorithm, we introduce it as an interval of the group, so that it gets sliced like any other interval of DB_GROUP.
Thus we finally won't look for a count of at least 1, but a count of at least 2 intervals (our reference one, and a true interval from DB_GROUP) to overlap over each slice.

WITH
  -- What period are we interested in?
  fromto AS
  (
    SELECT
      CAST('20250201' AS DATE) valid_from,
      CAST('20250228' AS DATE) valid_to
    FROM dummy
  ),
  -- Groups over that period:
  gr AS
  (
    SELECT
      group_id,
      GREATEST(i.valid_from, ft.valid_from) valid_from,
      LEAST(i.valid_to, ft.valid_to) valid_to
    FROM DB_GROUP i
    -- Just keep the values that overlap our period of interest:
    JOIN fromto ft ON i.valid_from <= ft.valid_to AND i.valid_to >= ft.valid_from
    -- Have our period of interest get a fictive presence in each group:
    UNION ALL
    SELECT DISTINCT group_id, ft.valid_from, ft.valid_to FROM fromto ft, DB_GROUP
  ),
  cuts AS
  (
    SELECT group_id, valid_from AS d FROM gr
    UNION -- Not UNION ALL: we want only one row for identical dates.
    SELECT group_id, adddays(valid_to, 1) FROM gr -- An interval ending on day D covers until the next interval starting on day D+1
  ),
  slices AS
  (
    SELECT gr.group_id, cuts.d, COUNT(1) noverlaps
    FROM cuts JOIN gr ON gr.group_id = cuts.group_id AND cuts.d >= valid_from AND cuts.d < valid_to
    GROUP BY gr.group_id, cuts.d
  )
SELECT group_id
FROM slices
GROUP BY group_id
HAVING MIN(noverlaps) > 1;

And we get the expected:

group_id
1
2
3
4

(with a PostgreSQL equivalent fiddle to see it run, add more corner cases, and select from the intermediate CTEs to get a glance into the steps)

Select groups of values that cover a date interval together

Tags:

sql

dateinterval

hana-sql-script

amdp

Aprikose

1 Answers

Guillaume Outters

Recent Activity

Donate For Us

Select groups of values that cover a date interval together

Tags:

sql

dateinterval

hana-sql-script

amdp

Aprikose

1 Answers

Guillaume Outters

Related questions

Recent Activity

Donate For Us