Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for finding the busiest period?

Tags:

I have some data like this:

1: 2 - 10 2: 3 - 15 3: 4 - 9 4: 8 - 14 5: 7 - 13 6: 5 - 10 7: 11 - 15 

I will attempt a representation to make it clearer:

        1     2     3     4     5     6     7     8     9     10     11     12     13     14     15 1             |--------------------------------------X---------| 2                   |--------------------------------X--------------------------------------------| 3                         |--------------------------X---| 4                                                  |-X-------------------------------------| 5                                           |--------X------------------------------| 6                               |--------------------X----------| 7                                                                     |---------------------------| 

So in the example case, 8-9 is the critical period if the second scheme is used because all the points are active. What is a fast and good way to solving this problem in python? I am thinking of using dynamic programming but are there other approaches that are suggested?

My approach until now:

I was thinking more from a real-time perspective. So, whenever I get a new point, I do this: Assume I already got 2-10 and I get 3-15 then I pick the max of start and min of end so this case it is 3-10 and increment this interval's count to 2. Then the third point comes in 4-9, pick the max which is 4 and the min is 9 and update the value 3-10 to 4-9 and update count to 3. Now when 8-14 comes in, I pick the start of this interval is greater than 4-9 and the end of this interval is less than 4-9. In this case, it is not true so I will create a new bucket 8-14 and I put the count to 1. This is not the entire algorithm but should give a high-level idea of what I am doing here. I will see if I can sketch the pseudo-code.

like image 446
Legend Avatar asked Apr 24 '11 04:04

Legend


Video Answer


1 Answers

        1     2     3     4     5     6     7     8     9     10     11     12     13     14     15 1             |--------------------------------------X---------| 2                   |--------------------------------X--------------------------------------------| 3                         |--------------------------X---| 4                                                  |-X-------------------------------------| 5                                           |--------X------------------------------| 6                               |--------------------X----------| 7                                                                     |---------------------------|               +1    +1     +1   +1           +1     +1    -1    -2     +1           -1     -1     -2               1     2     3     4           5       6    5      3     4             3      2      0                                                      ^^^^ 

Get it?

So you need to transform this:

1: 2 - 10 2: 3 - 15 3: 4 - 9 4: 8 - 14 5: 7 - 13 6: 5 - 10 7: 11 - 15 

into:

[(2,+), (3,+), (4,+), (5,+), (7,+), (8,+), (9,-), (10,-), (10,-), (11,+), (13,-), (14,-), (15,-), (15,-)] 

and then you simply iterate through, counting up when you see a + and counting down on -. The busiest interval will be when the count is maximum.

So in code:

intervals = [(2, 10), (3, 15), (4, 9), (8, 14), (7, 13), (5, 10), (11, 15)] intqueue = sorted([(x[0], +1) for x in intervals] + [(x[1], -1) for x in intervals]) rsum = [(0,0)] for x in intqueue:      rsum.append((x[0], rsum[-1][1] + x[1])) busiest_start = max(rsum, key=lambda x: x[1]) # busiest_end = the next element in rsum after busiest_start   # instead of using lambda, alternatively you can do: #     def second_element(x): #         return x[1] #     busiest_start = max(rsum, key=second_element) # or: #     import operator #     busiest_start = max(rsum, key=operator.itemgetter(1)) 

runtime complexity is (n+n)*log(n+n)+n+n or O(n*log(n))

It is also possible to convert this idea into an online algorithm if you don't have the complete list of intervals at the start of the program but is guaranteed that incoming intervals will never be scheduled for a past point. Instead of sorting you will use a priority queue, each time an interval comes, you push in two items, the start point and the end point, each with a +1 and -1 respectively. And then you pop off and count and keep track of the peak hour.

like image 90
Lie Ryan Avatar answered Dec 17 '22 19:12

Lie Ryan