
interviewstreet median challenge

Problem: The median of M numbers is defined as follows: 1) if M is odd, it is the middle number after sorting them; 2) if M is even, it is the average of the two middle numbers (again after sorting). You start with an empty list of numbers. You can then add numbers to the list or remove numbers from it. After each add or remove operation, output the median of the numbers in the list.

Example: For a set of m = 5 numbers, { 9, 2, 8, 4, 1 }, the median is the third number in the sorted set { 1, 2, 4, 8, 9 }, which is 4. Similarly, for a set of m = 4 numbers, { 5, 2, 10, 4 }, the median is the average of the second and third elements in the sorted set { 2, 4, 5, 10 }, which is (4+5)/2 = 4.5.
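
To make the definition concrete, here is a minimal Python sketch that recomputes the median from scratch by sorting. The function name median is only illustrative, and this is the naive baseline, not the incremental approach discussed below.

```python
def median(numbers):
    """Return the median of a non-empty collection of numbers."""
    ordered = sorted(numbers)
    m = len(ordered)
    if m == 0:
        raise ValueError("median of an empty list is undefined")
    mid = m // 2
    if m % 2 == 1:
        # Odd count: the single middle element.
        return ordered[mid]
    # Even count: average of the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9, 2, 8, 4, 1]))  # 4
print(median([5, 2, 10, 4]))    # 4.5
```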

My approach: I think the problem can be solved as follows. The idea is to use the previous median value and a pointer to find the new median, instead of recalculating it from scratch at every add or remove operation.

1) Use a multiset, which always keeps its elements in order and allows duplicates. In other words, maintain a sorted list somehow.

2) If the operation is add

2.1) Insert the element into the set and then calculate the median

2.2) If the size of the set is 1, then the first element is the median

2.3) If the size of the set is even, then
           if the new element is larger than the prev median, the new median is the avg of the prev median and the next element in the set
           else the new median is the avg of the prev median and the element before it in the set

2.4) If the size of the set is odd, then
           if the new element is larger than the prev median
                  if it is also less than the 2nd element of the prev median (the 2nd element used to calculate the avg for the prev median), then the new element is the new median
                  else the new median is the 2nd element used to calculate the avg for the prev median
           else
                  the new median is the element before the prev median in the set

3) If the operation is remove

3.1) First calculate the new median

3.2) If the size of the set is 0, nothing can be removed

3.3) If the size is 1 and the first element is the element to be removed, remove it; otherwise nothing can be removed

3.4) If the size of the set is even, then
           if the element to be deleted is greater than or equal to the 2nd element of the prev median, the 1st element of the prev median is the new median
           else the 2nd element of the prev median is the new median

3.5) If the size of the set is odd, then
           if the element to be deleted is the prev median, the new median is the avg of the elements before and after it
           else if the element to be deleted is greater than the prev median, the new median is the avg of the prev median and the element before it
           else the new median is the avg of the prev median and the element after it

3.6) Remove the element.

Here is the working code: http://justprogrammng.blogspot.com/2012/06/interviewstreet-median-challenge.html
What are your views on this approach?

sachin asked Jun 13 '12 02:06


2 Answers

Your approach seems like it could work, but from the description and the code you can tell that there is a lot of casework involved. I wouldn't like to be the one having to debug that! So let me give you an alternative solution that involves fewer cases and should therefore be much simpler to get right.

Keep two multisets (this algorithm also works with two priority queues, as we only ever look at the extremes of each one). The first, minset, keeps the smallest half of the numbers, and the second, maxset, keeps the largest half. When n is odd, minset holds the one extra element.

Whenever you add a number:

  • If it is greater than max(minset), add it to maxset
  • Otherwise, add it to minset

Note that this doesn't guarantee the n/2 condition. Therefore, we should add one extra "fixing" step:

  • If maxset.size() > minset.size(), remove the smallest element from maxset and insert it into minset.
  • If minset.size() > maxset.size() + 1, remove the biggest element from minset and insert it into maxset.

After this is done, we just have to get the median. This is easy to do with our data structure: if the current n is odd, the median is max(minset); if n is even, it is the average of max(minset) and min(maxset).

For the removal operation, remove the number from whichever set contains it and do the same fixing step afterwards.
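
As a rough sketch of the two-set scheme described above, assuming plain sorted Python lists (maintained with the standard bisect module) stand in for the multisets. The class name TwoHalvesMedian and its method names are made up for illustration. Inserting into a list is O(n), so for large inputs a real multiset or balanced tree would be the better choice.

```python
from bisect import bisect_left, insort


class TwoHalvesMedian:
    """Hypothetical helper: two sorted lists standing in for the two multisets."""

    def __init__(self):
        self.minset = []  # smaller half, ascending; may hold one extra element
        self.maxset = []  # larger half, ascending

    def _fix(self):
        # The "fixing" step: restore len(maxset) <= len(minset) <= len(maxset) + 1.
        if len(self.maxset) > len(self.minset):
            insort(self.minset, self.maxset.pop(0))
        elif len(self.minset) > len(self.maxset) + 1:
            insort(self.maxset, self.minset.pop())

    def add(self, x):
        # Greater than max(minset) goes to maxset, otherwise to minset.
        if self.minset and x > self.minset[-1]:
            insort(self.maxset, x)
        else:
            insort(self.minset, x)
        self._fix()

    def remove(self, x):
        # Remove one occurrence of x from whichever half contains it.
        for half in (self.minset, self.maxset):
            i = bisect_left(half, x)
            if i < len(half) and half[i] == x:
                half.pop(i)
                self._fix()
                return True
        return False  # x was not in the list

    def median(self):
        if not self.minset:
            return None  # empty list: no median
        if len(self.minset) > len(self.maxset):
            return self.minset[-1]  # odd n: max(minset)
        return (self.minset[-1] + self.maxset[0]) / 2  # even n
```

With this sketch, adding 1, 5 and 2 in turn gives medians 1, 3.0 and 2, and removing 1 afterwards gives 3.5.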

ffao answered Nov 11 '22 20:11


The main issue with your code is that it compares each new item with the running median, which might be a calculated average rather than a value in the set. Instead, you should compare the new item with the value at the previous middle position (*prev in your code). As it is, after receiving the sequence 1 and 5, your median value will be 3. If the next value is 2 or 4, it should become the new median, but since your code follows a different path for each of those, one of the results is wrong.
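
A quick check of that 1, 5 example, using Python's standard statistics.median as the reference. It only illustrates the point and is not a reproduction of the code in the question.

```python
from statistics import median

running_median = median([1, 5])  # 3.0, an average that is not in the list

for new_item in (2, 4):
    # 2 falls below the running median and 4 above it,
    # yet in both cases the new item itself is the new median.
    side = "below" if new_item < running_median else "above"
    print(new_item, side, median([1, 5, new_item]))
# prints:
# 2 below 2
# 4 above 4
```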

It would be simpler overall to just keep track of the middle location and not the running median. Instead, calculate the median at the end of each add/remove operation:

if size == 0
    median = NaN
else if size is odd
    median = *prev                          // the single middle element
else
    median = (*prev + *(prev-1)) / 2.0      // average of the two middle elements; 2.0 avoids integer division
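
The same idea as a small Python sketch, assuming a sorted list stands in for the multiset and that prev corresponds to index size // 2. The class name SortedMedian is hypothetical, and list insertion is O(n) here.

```python
from bisect import bisect_left, insort


class SortedMedian:
    """Hypothetical helper: keep the items sorted, read the median off the middle."""

    def __init__(self):
        self.items = []

    def add(self, x):
        insort(self.items, x)  # insert while keeping the list sorted

    def remove(self, x):
        i = bisect_left(self.items, x)
        if i == len(self.items) or self.items[i] != x:
            return False  # x is not present, nothing removed
        self.items.pop(i)
        return True

    def median(self):
        size = len(self.items)
        if size == 0:
            return float("nan")
        mid = size // 2  # plays the role of prev in the pseudocode
        if size % 2 == 1:
            return self.items[mid]
        return (self.items[mid] + self.items[mid - 1]) / 2
```

Because the median is always derived from the middle position after the operation, add and remove need no case analysis at all.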
xan answered Nov 11 '22 19:11