How to write Oracle query to find a total length of possible overlapping from-to dates

Tags: sql, oracle

I'm struggling to write a query for the following task.

I have the following data and want to find the total network days for each unique ID:

ID  From        To          NetworkDay
1   03-Sep-12   07-Sep-12   5
1   03-Sep-12   04-Sep-12   2
1   05-Sep-12   06-Sep-12   2
1   06-Sep-12   12-Sep-12   5
1   31-Aug-12   04-Sep-12   3
2   04-Sep-12   06-Sep-12   3
2   11-Sep-12   13-Sep-12   3
2   05-Sep-12   08-Sep-12   3

The problem is that the date ranges can overlap, and I can't come up with SQL that will give me the following results:

ID  From        To          NetworkDay
1   31-Aug-12   12-Sep-12   9
2   04-Sep-12   08-Sep-12   4
2   11-Sep-12   13-Sep-12   3

and then

ID  Total Network Day
1   9
2   7

If the network day calculation is not possible, just getting to the second table would be sufficient.

I hope my question is clear.

Roby asked Sep 07 '12 09:09


1 Answer

We can use Oracle analytic functions, namely the "OVER (PARTITION BY ...)" clause, to do this. The PARTITION BY clause is kind of like a GROUP BY but without the aggregation part. That means we can group rows together (i.e. partition them) and then perform an operation on them as separate groups, while still getting one output row per input row. As we operate on each row we can also access the columns of the rows before it in the same partition, once an ORDER BY is given inside the OVER clause. (PARTITION BY is not related to partitioning of a table for performance.)
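To make the "access the previous rows" idea concrete, here is a small sketch in Oracle SQL. It assumes the question's data is loaded into a table my_sample(id, dfrom, dto) with DATE columns; the output column names prev_dto and max_prev_dto are just illustrative.

```sql
-- Assumption: my_sample(id, dfrom, dto) holds the question's rows, dfrom/dto are DATEs.
SELECT id, dfrom, dto,
       -- dto of the row immediately before this one within the same id
       LAG(dto) OVER (PARTITION BY id ORDER BY dfrom, dto) AS prev_dto,
       -- latest dto among ALL earlier rows of the same id
       MAX(dto) OVER (PARTITION BY id ORDER BY dfrom, dto
                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS max_prev_dto
  FROM my_sample
 ORDER BY id, dfrom, dto;
```

Both columns are NULL on the first row of each id. For the last row of id 1 (06-Sep-12 to 12-Sep-12) the two differ: the row just before it ends on 06-Sep-12, but an earlier row already runs to 07-Sep-12, so the running maximum is the safer thing to compare against when looking for overlap.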

So then how do we output the non-overlapping dates? We first order the rows on the (ID, DFROM) fields, then we use the ID field to make our partitions (row groups). We then test the latest TO value among the previous rows against the current row's FROM value for overlap, using an expression like (in pseudo code):

 max(latest_previous.DTO, current.DFROM) as DFROM

This basic expression returns the original DFROM value if there is no overlap, but returns the latest previous TO value if there is. Since our rows are ordered by DFROM, the only thing we need to remember from the earlier rows is the latest TO value seen so far. Note that this is not always the TO of the immediately preceding row, because an earlier, longer range can extend past it. In cases where the previous rows completely cover the current row, we want the row to end up with a 'zero' date range. So we do the same thing for the DTO field to get:

 max(latest_previous.DTO, current.DFROM) as DFROM, max(latest_previous.DTO, current.DTO) as DTO

Once we have generated the new result set with the adjusted DFROM and DTO values, we can aggregate them and sum the lengths of the DFROM-to-DTO intervals.

Be aware that most date calculations in databases are not inclusive, whereas your data is. So something like DATEDIFF(dto, dfrom) in MySQL, or dto - dfrom in Oracle, will not count the day dto actually refers to, so we will want to adjust dto up a day first.
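As a quick illustration of the off-by-one, here is the first row of the sample data (03-Sep-12 to 07-Sep-12, five days when both ends are counted) as plain Oracle date arithmetic. The column aliases are made up for the example.

```sql
-- Subtracting two Oracle DATEs yields the number of days between them.
SELECT DATE '2012-09-07' - DATE '2012-09-03'       AS exclusive_days,  -- 4
       (DATE '2012-09-07' + 1) - DATE '2012-09-03' AS inclusive_days   -- 5
  FROM dual;
```

Adding one day to dto up front means every later subtraction counts the end day without any further special-casing.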

I don't have access to an Oracle server anymore, but I know this is possible with Oracle analytic functions. The query should go something like this (please update my post if you get this to work):

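SELECT id,
    GREATEST(dfrom, NVL(prev_dto, dfrom)) AS dfrom,   -- clamp the start to the latest end seen so far
    GREATEST(dto,   NVL(prev_dto, dto))   AS dto      -- a fully covered row collapses to zero length
  FROM (
    SELECT id, dfrom, dto + 1 AS dto,   -- adjust dto so that it becomes non-inclusive
           -- latest (non-inclusive) dto among the earlier rows of the same id
           MAX(dto + 1) OVER (PARTITION BY id ORDER BY dfrom, dto
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_dto
      FROM my_sample
  ) s
  ORDER BY id, dfrom;

The secret here is the MAX(dto + 1) OVER (PARTITION BY id ORDER BY dfrom, dto ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) expression, which returns the latest dto among the rows before the current row (and NULL for the first row of each id, hence the NVL). Note that Oracle's two-argument maximum is GREATEST; MAX is the aggregate. So this query should output new dfrom/dto values which don't overlap. It's then a simple matter of sub-querying this, computing (dto - dfrom) per row, and summing the totals per id.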
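If what you want is the asker's second table itself (one row per merged range, with its network days), a different technique may be easier to read: flag each row that starts a new "island", number the islands with a running sum, and group. The following is an untested sketch under a few assumptions: the data is in my_sample(id, dfrom, dto) with DATE columns and an inclusive dto, "network day" means Monday to Friday with no holiday calendar, and the CTE names (flagged, grouped, merged, offsets) are made up for the example.

```sql
WITH flagged AS (
  -- new_grp = 1 when a row does not overlap anything before it for the same id
  SELECT id, dfrom, dto,
         CASE WHEN dfrom <= MAX(dto) OVER (PARTITION BY id ORDER BY dfrom, dto
                                           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
              THEN 0 ELSE 1 END AS new_grp
    FROM my_sample
),
grouped AS (
  -- a running sum of the flags numbers each island of overlapping rows
  SELECT id, dfrom, dto,
         SUM(new_grp) OVER (PARTITION BY id ORDER BY dfrom, dto
                            ROWS UNBOUNDED PRECEDING) AS grp
    FROM flagged
),
merged AS (
  SELECT id, MIN(dfrom) AS dfrom, MAX(dto) AS dto
    FROM grouped
   GROUP BY id, grp
),
offsets AS (
  -- day offsets 0 .. length of the longest merged range
  SELECT LEVEL - 1 AS n
    FROM (SELECT MAX(dto - dfrom) AS span FROM merged)
 CONNECT BY LEVEL <= span + 1
)
SELECT m.id, m.dfrom, m.dto,
       -- count only the days of the range that are not Saturday or Sunday
       COUNT(CASE WHEN TO_CHAR(m.dfrom + o.n, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
                       NOT IN ('SAT', 'SUN') THEN 1 END) AS network_days
  FROM merged m
  JOIN offsets o ON o.n <= m.dto - m.dfrom
 GROUP BY m.id, m.dfrom, m.dto
 ORDER BY m.id, m.dfrom;
```

On the sample data this should give the three rows from the question (9 network days for ID 1, then 4 and 3 for ID 2). Wrapping it in one more SELECT id, SUM(network_days) ... GROUP BY id should give the final "Total Network Day" table. Ranges that merely touch (one ends the day before the next starts) are kept separate here; comparing against the running maximum plus 1 would merge them instead.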

Using MySQL

I did have access to a MySQL server, so I did get it working there. MySQL (before version 8.0) doesn't have result partitioning (analytic functions) like Oracle, so we have to use user variables. This means we use @var:=xxx type expressions to remember the last date value and adjust the dfrom/dto accordingly. It's the same algorithm, just with a little longer and more complex syntax. We also have to forget the last date value any time the ID field changes!

So here is the sample table (same values you have):

create table sample(id int, dfrom date, dto date, networkDay int);
insert into sample values
    (1,'2012-09-03','2012-09-07',5),
    (1,'2012-09-03','2012-09-04',2),
    (1,'2012-09-05','2012-09-06',2),
    (1,'2012-09-06','2012-09-12',5),
    (1,'2012-08-31','2012-09-04',3),
    (2,'2012-09-04','2012-09-06',3),
    (2,'2012-09-11','2012-09-13',3),
    (2,'2012-09-05','2012-09-08',3);

On to the query. We output the un-grouped result set like above. The variable @ldt is the "last date", and the variable @lid is the "last id". Any time @lid changes, we reset @ldt to null. FYI, in MySQL the := operator is where the assignment happens; an = operator is just an equality test.

This is a 3-level query, but it could be reduced to 2. I went with an extra outer query to keep things more readable. The innermost query is simple: it adjusts the dto column to be non-inclusive and does the proper row ordering. The middle query adjusts the dfrom/dto values to make them non-overlapping. The outer query simply drops the unused fields and calculates the interval range.

set @ldt=null, @lid=null;
select id, no_dfrom as dfrom, no_dto as dto, datediff(no_dto, no_dfrom) as days from (
  select if(@lid=id,@ldt,@ldt:=null) as last,        -- forget the last date when the id changes
         dfrom, dto,
         if(@ldt>=dfrom,@ldt,dfrom) as no_dfrom,      -- clamp the start to the last date seen
         if(@ldt>=dto,@ldt,dto) as no_dto,            -- clamp the end the same way
         @ldt:=if(@ldt>=dto,@ldt,dto),                -- remember the latest end date so far
         @lid:=id as id,
         datediff(dto, dfrom) as overlapped_days
  from (select id, dfrom, dto + INTERVAL 1 DAY as dto from sample order by id, dfrom) as sample
  ) as nonoverlapped
  order by id, dfrom;

The above query gives the results (notice dfrom/dto are non-overlapping here):

+------+------------+------------+------+
| id   | dfrom      | dto        | days |
+------+------------+------------+------+
|    1 | 2012-08-31 | 2012-09-05 |    5 |
|    1 | 2012-09-05 | 2012-09-08 |    3 |
|    1 | 2012-09-08 | 2012-09-08 |    0 |
|    1 | 2012-09-08 | 2012-09-08 |    0 |
|    1 | 2012-09-08 | 2012-09-13 |    5 |
|    2 | 2012-09-04 | 2012-09-07 |    3 |
|    2 | 2012-09-07 | 2012-09-09 |    2 |
|    2 | 2012-09-11 | 2012-09-14 |    3 |
+------+------------+------------+------+
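To finish the job and get one total per id, the per-row days just need to be summed. As a sketch only: on MySQL 8.0 or later, which does support window functions, the same clamping can be written without user variables. This assumes the sample table created above; the window name w and the aliases no_dfrom, no_dto and total_days are illustrative, and I have not run it against a server.

```sql
-- Assumption: MySQL 8.0+ and the sample(id, dfrom, dto, networkDay) table from above.
SELECT id, SUM(DATEDIFF(no_dto, no_dfrom)) AS total_days
  FROM (
    SELECT id,
           -- clamp both ends to the latest end date seen on earlier rows of the same id
           GREATEST(dfrom, COALESCE(MAX(dto) OVER w, dfrom)) AS no_dfrom,
           GREATEST(dto,   COALESCE(MAX(dto) OVER w, dto))   AS no_dto
      FROM (SELECT id, dfrom, dto + INTERVAL 1 DAY AS dto FROM sample) AS s
    WINDOW w AS (PARTITION BY id ORDER BY dfrom, dto
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
  ) AS nonoverlapped
 GROUP BY id
 ORDER BY id;
```

Summing the days column of the result table above gives 13 for id 1 and 8 for id 2, which is what this query should return as well. Keep in mind these are calendar days, weekends included, not the network days from the question.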
guru_florida answered Nov 09 '22 23:11