Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate Overlapping Segments to Measure Effective Length

Tags:

I have a road_events table:

create table road_events (     event_id number(4,0),     road_id number(4,0),     year number(4,0),     from_meas number(10,2),     to_meas number(10,2),     total_road_length number(10,2)     );  insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (1,1,2020,25,50,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (2,1,2000,25,50,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (3,1,1980,0,25,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (4,1,1960,75,100,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (5,1,1940,1,100,100);  insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (6,2,2000,10,30,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (7,2,1975,30,60,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (8,2,1950,50,90,100);  insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (9,3,2050,40,90,100);  insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (10,4,2040,0,200,200); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (11,4,2013,0,199,200); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (12,4,2001,0,200,200);  insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (13,5,1985,50,70,300); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (14,5,1985,10,50,300); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (15,5,1965,1,301,300); commit;  select * from road_events; 

  EVENT_ID    ROAD_ID       YEAR  FROM_MEAS    TO_MEAS TOTAL_ROAD_LENGTH ---------- ---------- ---------- ---------- ---------- -----------------          1          1       2020         25         50               100          2          1       2000         25         50               100          3          1       1980          0         25               100          4          1       1960         75        100               100          5          1       1940          1        100               100           6          2       2000         10         30               100          7          2       1975         30         60               100          8          2       1950         50         90               100           9          3       2050         40         90               100          10          4       2040          0        200               200         11          4       2013          0        199               200         12          4       2001          0        200               200          13          5       1985         50         70               300         14          5       1985         10         50               300         15          5       1965          1        301               300 

I want to select the events that represent the most recent work on each road.

This is a tricky operation, because the events often pertain to only a portion of the road. This means that I can't simply select the most recent event per road; I need to only select the most recent event mileage that doesn't overlap.


Possible logic (in order):

I'm reluctant to guess at how this problem could be solved, because it could end up hurting more than it helps (kind of like the XY Problem). On the other hand, it might provide insight into the nature of the problem, so here it goes:

  1. Select the most recent event for each road. We'll call the most recent event: event A.
  2. If event A is >= total_road_length, then that's all I need. The algorithm ends here.
  3. Else, get the next chronological event (event B) that does not have the same extents as event A.
  4. If the extents of event B overlap the extents of event A, then only get the portion(s) of event B that do not overlap.
  5. Repeat steps 3 and 4 until the total event length is = total_road_length. Or stop when there are no more events for that road.

Question:

I know it's a tall order, but what would it take to do this?

This is a classic linear referencing problem. It would be extremely helpful if I could do linear referencing operations as part of queries.

The result would be:

  EVENT_ID    ROAD_ID       YEAR  TOTAL_ROAD_LENGTH   EVENT_LENGTH ---------- ---------- ----------  -----------------   ------------          1          1       2020                100             25          3          1       1980                100             25          4          1       1960                100             25          5          1       1940                100             25           6          2       2000                100             20          7          2       1975                100             30          8          2       1950                100             30           9          3       2050                100             50          10          4       2040                200            200          13          5       1985                300             20         14          5       1985                300             40         15          5       1965                300            240 

Related question: Select where number range does not overlap

like image 579
User1974 Avatar asked Aug 29 '18 15:08

User1974


2 Answers

My main DBMS is Teradata, but this will work as-is in Oracle, too.

WITH all_meas AS  ( -- get a distinct list of all from/to points    SELECT road_id, from_meas AS meas    FROM road_events    UNION    SELECT road_id, to_meas    FROM road_events  ) -- select * from all_meas order by 1,2  , all_ranges AS  ( -- create from/to ranges    SELECT road_id, meas AS from_meas       ,Lead(meas)       Over (PARTITION BY road_id             ORDER BY meas) AS to_meas    FROM all_meas   )  -- SELECT * from all_ranges order by 1,2 , all_event_ranges AS  ( -- now match the ranges to the event ranges    SELECT        ar.*      ,re.event_id      ,re.year      ,re.total_road_length      ,ar.to_meas - ar.from_meas AS event_length      -- used to filter the latest event as multiple events might cover the same range       ,Row_Number()       Over (PARTITION BY ar.road_id, ar.from_meas             ORDER BY year DESC) AS rn    FROM all_ranges ar    JOIN road_events re      ON ar.road_id = re.road_id     AND ar.from_meas < re.to_meas     AND ar.to_meas > re.from_meas    WHERE ar.to_meas IS NOT NULL  ) SELECT event_id, road_id, year, total_road_length, Sum(event_length) FROM all_event_ranges WHERE rn = 1 -- latest year only GROUP BY event_id, road_id, year, total_road_length ORDER BY road_id, year DESC; 

If you need to return the actual covered from/to_meas (as in your question before edit), it might be more complicated. The first part is the same, but without aggregation the query can return adjacent rows with the same event_id (e.g. for event 3: 0-1 & 1-25):

SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas; 

If you want to merge adjacent rows you need two more steps (using a standard approach, flag the 1st row of a group and calculate a group number):

WITH all_meas AS  (    SELECT road_id, from_meas AS meas    FROM road_events    UNION    SELECT road_id, to_meas    FROM road_events  ) -- select * from all_meas order by 1,2  , all_ranges AS  (     SELECT road_id, meas AS from_meas       ,Lead(meas)       Over (PARTITION BY road_id             ORDER BY meas) AS to_meas    FROM all_meas   ) -- SELECT * from all_ranges order by 1,2 , all_event_ranges AS  (    SELECT        ar.*      ,re.event_id      ,re.year      ,re.total_road_length      ,ar.to_meas - ar.from_meas AS event_length      ,Row_Number()       Over (PARTITION BY ar.road_id, ar.from_meas             ORDER BY year DESC) AS rn    FROM all_ranges ar    JOIN road_events  re      ON ar.road_id = re.road_id     AND ar.from_meas < re.to_meas     AND ar.to_meas > re.from_meas    WHERE ar.to_meas IS NOT NULL  ) -- SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas , adjacent_events AS   ( -- assign 1 to the 1st row of an event    SELECT t.*      ,CASE WHEN Lag(event_id)                 Over(PARTITION BY road_id                      ORDER BY from_meas) = event_id            THEN 0             ELSE 1        END AS flag    FROM all_event_ranges t    WHERE rn = 1  ) -- SELECT * FROM adjacent_events ORDER BY road_id, from_meas  , grouped_events AS  ( -- assign a groupnumber to adjacent rows using a Cumulative Sum over 0/1    SELECT t.*      ,Sum(flag)       Over (PARTITION BY road_id             ORDER BY from_meas             ROWS Unbounded Preceding) AS grp    FROM adjacent_events t ) -- SELECT * FROM grouped_events ORDER BY  road_id, from_meas SELECT event_id, road_id, year, Min(from_meas), Max(to_meas), total_road_length, Sum(event_length) FROM grouped_events GROUP BY event_id, road_id, grp, year, total_road_length ORDER BY 2, Min(from_meas); 

Edit:

Ups, I just found a blog Overlapping ranges with priority doing exactly the same with some simplified Oracle syntax. In fact I translated my query from a some other simplified syntax in Teradata to Standard/Oracle SQL :-)

like image 84
dnoeth Avatar answered Sep 24 '22 08:09

dnoeth


There is another way to calculate this, with from and to values:

with    part_begin_point as (     Select distinct road_id, from_meas point     from road_events be     union      Select distinct road_id, to_meas point     from road_events ee   ) , newest_part as (   select e.event_id   , e.road_id   , e.year   , e.total_road_length   , p.point   , LAG(e.event_id) over (partition by p.road_id order by p.point) prev_event   , e.to_meas event_to_meas   from part_begin_point p   join road_events e    on p.road_id = e.road_id    and p.point >= e.from_meas and  p.point < e.to_meas    and not exists(         select 1 from road_events ne          where e.road_id = ne.road_id         and p.point >= ne.from_meas and p.point < ne.to_meas         and (e.year < ne.year or e.year = ne.year and e.event_id < ne.event_id))   ) select event_id, road_id, year , point from_meas , LEAD(point, 1, event_to_meas) over (partition by road_id order by point) to_meas , total_road_length , LEAD(point, 1, event_to_meas) over (partition by road_id order by point) - point EVENT_LENGTH from newest_part where 1=1 and event_id <> prev_event or prev_event is null order by event_id, point 

SQL Fiddle

like image 20
Adam Silenko Avatar answered Sep 22 '22 08:09

Adam Silenko