I have a road_events
table:
create table road_events ( event_id number(4,0), road_id number(4,0), year number(4,0), from_meas number(10,2), to_meas number(10,2), total_road_length number(10,2) ); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (1,1,2020,25,50,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (2,1,2000,25,50,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (3,1,1980,0,25,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (4,1,1960,75,100,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (5,1,1940,1,100,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (6,2,2000,10,30,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (7,2,1975,30,60,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (8,2,1950,50,90,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (9,3,2050,40,90,100); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (10,4,2040,0,200,200); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (11,4,2013,0,199,200); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (12,4,2001,0,200,200); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (13,5,1985,50,70,300); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (14,5,1985,10,50,300); insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (15,5,1965,1,301,300); commit; select * from road_events;
EVENT_ID ROAD_ID YEAR FROM_MEAS TO_MEAS TOTAL_ROAD_LENGTH ---------- ---------- ---------- ---------- ---------- ----------------- 1 1 2020 25 50 100 2 1 2000 25 50 100 3 1 1980 0 25 100 4 1 1960 75 100 100 5 1 1940 1 100 100 6 2 2000 10 30 100 7 2 1975 30 60 100 8 2 1950 50 90 100 9 3 2050 40 90 100 10 4 2040 0 200 200 11 4 2013 0 199 200 12 4 2001 0 200 200 13 5 1985 50 70 300 14 5 1985 10 50 300 15 5 1965 1 301 300
I want to select the events that represent the most recent work on each road.
This is a tricky operation, because the events often pertain to only a portion of the road. This means that I can't simply select the most recent event per road; I need to only select the most recent event mileage that doesn't overlap.
Possible logic (in order):
I'm reluctant to guess at how this problem could be solved, because it could end up hurting more than it helps (kind of like the XY Problem). On the other hand, it might provide insight into the nature of the problem, so here it goes:
event A
.event A
is >= total_road_length
, then that's all I need. The algorithm ends here.event B
) that does not have the same extents as event A
. event B
overlap the extents of event A
, then only get the portion(s) of event B
that do not overlap. = total_road_length
. Or stop when there are no more events for that road.Question:
I know it's a tall order, but what would it take to do this?
This is a classic linear referencing problem. It would be extremely helpful if I could do linear referencing operations as part of queries.
The result would be:
EVENT_ID ROAD_ID YEAR TOTAL_ROAD_LENGTH EVENT_LENGTH ---------- ---------- ---------- ----------------- ------------ 1 1 2020 100 25 3 1 1980 100 25 4 1 1960 100 25 5 1 1940 100 25 6 2 2000 100 20 7 2 1975 100 30 8 2 1950 100 30 9 3 2050 100 50 10 4 2040 200 200 13 5 1985 300 20 14 5 1985 300 40 15 5 1965 300 240
Related question: Select where number range does not overlap
My main DBMS is Teradata, but this will work as-is in Oracle, too.
WITH all_meas AS ( -- get a distinct list of all from/to points SELECT road_id, from_meas AS meas FROM road_events UNION SELECT road_id, to_meas FROM road_events ) -- select * from all_meas order by 1,2 , all_ranges AS ( -- create from/to ranges SELECT road_id, meas AS from_meas ,Lead(meas) Over (PARTITION BY road_id ORDER BY meas) AS to_meas FROM all_meas ) -- SELECT * from all_ranges order by 1,2 , all_event_ranges AS ( -- now match the ranges to the event ranges SELECT ar.* ,re.event_id ,re.year ,re.total_road_length ,ar.to_meas - ar.from_meas AS event_length -- used to filter the latest event as multiple events might cover the same range ,Row_Number() Over (PARTITION BY ar.road_id, ar.from_meas ORDER BY year DESC) AS rn FROM all_ranges ar JOIN road_events re ON ar.road_id = re.road_id AND ar.from_meas < re.to_meas AND ar.to_meas > re.from_meas WHERE ar.to_meas IS NOT NULL ) SELECT event_id, road_id, year, total_road_length, Sum(event_length) FROM all_event_ranges WHERE rn = 1 -- latest year only GROUP BY event_id, road_id, year, total_road_length ORDER BY road_id, year DESC;
If you need to return the actual covered from/to_meas
(as in your question before edit), it might be more complicated. The first part is the same, but without aggregation the query can return adjacent rows with the same event_id (e.g. for event 3: 0-1 & 1-25):
SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas;
If you want to merge adjacent rows you need two more steps (using a standard approach, flag the 1st row of a group and calculate a group number):
WITH all_meas AS ( SELECT road_id, from_meas AS meas FROM road_events UNION SELECT road_id, to_meas FROM road_events ) -- select * from all_meas order by 1,2 , all_ranges AS ( SELECT road_id, meas AS from_meas ,Lead(meas) Over (PARTITION BY road_id ORDER BY meas) AS to_meas FROM all_meas ) -- SELECT * from all_ranges order by 1,2 , all_event_ranges AS ( SELECT ar.* ,re.event_id ,re.year ,re.total_road_length ,ar.to_meas - ar.from_meas AS event_length ,Row_Number() Over (PARTITION BY ar.road_id, ar.from_meas ORDER BY year DESC) AS rn FROM all_ranges ar JOIN road_events re ON ar.road_id = re.road_id AND ar.from_meas < re.to_meas AND ar.to_meas > re.from_meas WHERE ar.to_meas IS NOT NULL ) -- SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas , adjacent_events AS ( -- assign 1 to the 1st row of an event SELECT t.* ,CASE WHEN Lag(event_id) Over(PARTITION BY road_id ORDER BY from_meas) = event_id THEN 0 ELSE 1 END AS flag FROM all_event_ranges t WHERE rn = 1 ) -- SELECT * FROM adjacent_events ORDER BY road_id, from_meas , grouped_events AS ( -- assign a groupnumber to adjacent rows using a Cumulative Sum over 0/1 SELECT t.* ,Sum(flag) Over (PARTITION BY road_id ORDER BY from_meas ROWS Unbounded Preceding) AS grp FROM adjacent_events t ) -- SELECT * FROM grouped_events ORDER BY road_id, from_meas SELECT event_id, road_id, year, Min(from_meas), Max(to_meas), total_road_length, Sum(event_length) FROM grouped_events GROUP BY event_id, road_id, grp, year, total_road_length ORDER BY 2, Min(from_meas);
Edit:
Ups, I just found a blog Overlapping ranges with priority doing exactly the same with some simplified Oracle syntax. In fact I translated my query from a some other simplified syntax in Teradata to Standard/Oracle SQL :-)
There is another way to calculate this, with from and to values:
with part_begin_point as ( Select distinct road_id, from_meas point from road_events be union Select distinct road_id, to_meas point from road_events ee ) , newest_part as ( select e.event_id , e.road_id , e.year , e.total_road_length , p.point , LAG(e.event_id) over (partition by p.road_id order by p.point) prev_event , e.to_meas event_to_meas from part_begin_point p join road_events e on p.road_id = e.road_id and p.point >= e.from_meas and p.point < e.to_meas and not exists( select 1 from road_events ne where e.road_id = ne.road_id and p.point >= ne.from_meas and p.point < ne.to_meas and (e.year < ne.year or e.year = ne.year and e.event_id < ne.event_id)) ) select event_id, road_id, year , point from_meas , LEAD(point, 1, event_to_meas) over (partition by road_id order by point) to_meas , total_road_length , LEAD(point, 1, event_to_meas) over (partition by road_id order by point) - point EVENT_LENGTH from newest_part where 1=1 and event_id <> prev_event or prev_event is null order by event_id, point
SQL Fiddle
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With