Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Maximum Count Range Intersection (in T-SQL)

Let's say I have a table with a bunch of dates, e.g.:

declare @tbl table {
    idx int primary key,
    startdate datetime,
    enddate datetime
}

And I want to find the largest set of rows where startdate and enddate intersect (in the real world, the start date and end date represents start and end times for events, and I need to find the maximum # of events occurring simultaneously).

In another programming language I might sort all entries by startdate, then iterate through each entry once, building a temporary set of intersections (keeping track of the largest set generated). But I'm not sure if this is the most efficient way to express this in T-SQL. help!

Oh, and it's SQL Server 2000. :(

like image 645
Jen A Avatar asked Feb 22 '26 08:02

Jen A


1 Answers

Updated to remove the union all

declare @tbl table (
idx int identity(1,1) primary key,    
startdate datetime,    
enddate datetime);

insert into @tbl (startdate, enddate) 
select '2009-01-01', '2009-01-05'
union all select '2009-01-02', '2009-01-04'
union all select '2009-01-01', '2009-01-03'
union all select '2009-01-03', '2009-01-06'
union all select '2009-01-04', '2009-01-07'
union all select '2009-01-05', '2009-01-08'

select idx, startdate
   , (select sum(in_or_out) 
from (
   select case when startdate<=all_events.startdate then 1 else 0 end
     + case when enddate <= all_events.startdate then -1 else 0 end as in_or_out
   from @tbl 
   where startdate <= all_events.startdate
     or enddate <= all_events.startdate) as previous
) as concurent
from @tbl all_events
order by startdate

This gives the timeline of start session, with the count of concurent sessions at the moment new session starts:

idx startdate   concurent
3   2009-01-01 00:00:00.000 2
1   2009-01-01 00:00:00.000 2
2   2009-01-02 00:00:00.000 3
4   2009-01-03 00:00:00.000 3
5   2009-01-04 00:00:00.000 3
6   2009-01-05 00:00:00.000 3

To get the original request (set of concurent sessions with max concurency) you need to run this query twice, once to get the max concurent sessions and once to get the start dates of the sessions that have max concurent times, then you must get those sessions.

Updated

OK, so here the one single query that retrieves the max concurent sessions. I changed the test data to remove ambibuos overlaps of end and start:

declare @tbl table (
idx int identity(1,1) primary key,    
startdate datetime,    
enddate datetime);

insert into @tbl (startdate, enddate) 
select '2009-01-01', '2009-01-04 23:59:59'
union all select '2009-01-02', '2009-01-03 23:59:59'
union all select '2009-01-01', '2009-01-02 23:59:59'
union all select '2009-01-03', '2009-01-03 23:59:59'
union all select '2009-01-04', '2009-01-04 23:59:59'
union all select '2009-01-05', '2009-01-05 23:59:59'


select max_concurent_starts.startdate as concurentdate
  , session.*
from (
  select *
  ,(
        select sum(in_or_out) 
        from (
            select case when startdate<=all_events.startdate then 1 else 0 end
                + case when enddate <= all_events.startdate then -1 else 0 end 
                as in_or_out
          from @tbl 
          where startdate <= all_events.startdate
              or enddate <= all_events.startdate) as previous
    ) as concurent
  from @tbl all_events) as max_concurent_starts
  join @tbl as session 
     on session.startdate <= max_concurent_starts.startdate 
     and session.enddate >= max_concurent_starts.startdate
  where concurent = (
  select top 1 concurent
  from (
      select (
          select sum(in_or_out) 
          from (
              select case when startdate<=all_events.startdate then 1 else 0 end
                  + case when enddate <= all_events.startdate then -1 else 0 end 
                  as in_or_out
            from @tbl 
            where startdate <= all_events.startdate
                or enddate <= all_events.startdate) as previous
      ) as concurent
    from @tbl all_events) as all_events_with_concurent
    order by concurent desc)
  order by concurentdate, startdate;

This gives a result like:

concurentdate   idx startdate   enddate
2009-01-02 00:00:00.000 3   2009-01-01 00:00:00.000 2009-01-02 23:59:59.000
2009-01-02 00:00:00.000 1   2009-01-01 00:00:00.000 2009-01-04 23:59:59.000
2009-01-02 00:00:00.000 2   2009-01-02 00:00:00.000 2009-01-03 23:59:59.000
2009-01-03 00:00:00.000 1   2009-01-01 00:00:00.000 2009-01-04 23:59:59.000
2009-01-03 00:00:00.000 2   2009-01-02 00:00:00.000 2009-01-03 23:59:59.000
2009-01-03 00:00:00.000 4   2009-01-03 00:00:00.000 2009-01-03 23:59:59.000

which reads as follows: on 2009-01-02 00:00:00 there were 3 concurent sessions (3, 1 and 2) with they respective starts and ends. There is a tie, on 2009-01-03 00:00:00 there were also 3 concurent sessions (1, 2 and 4) with their respective starts and ends.

Performance milage may vary. The query can be written 1 million times simpler in SQL 2005 using CTEs.

like image 127
Remus Rusanu Avatar answered Feb 24 '26 00:02

Remus Rusanu



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!