Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding median without sorting an array

Tags:

c

I am looking to implement a very simple function which finds the median of an unsorted array by counting the number of smaller elements and number of larger elements if they are equal in number then the original is considered as median.

I know several algorithm like minHeap and Quick Select, but I am trying to keep things simple as a human would do with a naked eye to simply count larger and smaller numbers. So far I have implemented below function but problem arise when I have duplicate entries in array and also with even and odd array length.

I am new to C programming and need to understand what is going wrong. Below is the code, I have written a function to return random array of variable length to test this function from.

int med(int count, int *array)
{
int i, j, median = -1, smaller = 0, larger = 0;

for(i = 0; i < count; i++)
{
    for(j = 0; j < count; j++)
    {
        //larger++

        if(array[i] < array[j] && i!=j)
        {
            larger++;
        }
        //Smaller++
        if(array[i] >= array[j] && i!=j)
        {
            smaller++;
        }
    }
    printf("\nFor pivot: %d", array[i]);
    if(larger == smaller)
    {
        printf("\n Smaller: %d", smaller);
        printf(" Larger: %d", larger);
        median = array[i];
        break;
    }
    else
    {
        printf("\n Smaller: %d", smaller);
        printf(" Larger: %d", larger);

        larger = 0;
        smaller = 0;
    }
}
return median;
}

In some cases like {3,5,0,2,3} my function returns -1 but the actual result should be 3.

EDIT Initially I started with strictly greater or lesser but this condition (larger == smaller) never gets hit when I have duplicate entries thus I considered equal elements as smaller. I am having difficulty handling the equalities

like image 459
CCPP Avatar asked Jan 26 '23 10:01

CCPP


1 Answers

B. Shefter found the bug for you. However, I still want to address the question.

I am looking to implement a very simple function which finds the median of an unsorted array by counting the number of smaller elements and number of larger elements if they are equal in number then the original is considered as median.

Only do this, if you can do it faster than O(nlog n), because that's the time complexity of qsort. I would recommend trying the median of medians algorithm. You can read about it here and here is the code from that site, but with comments removed:

int select(int *a, int s, int e, int k){
    if(e-s+1 <= 5){
        sort(a+s, a+e);
        return s+k-1;
    }
    
    for(int i=0; i<(e+1)/5; i++){
        int left = 5*i;
        int right = left + 4;
        if(right > e) right = e;
        int median = select(a, 5*i, 5*i+4, 3);
        swap(a[median], a[i]);
    }
    
    return select(a, 0, (e+1)/5, (e+1)/10);
}

I know several algorithm like using minHeap and Quick Select but I am trying to keep things simple as a human would do with a naked eye to simply count larger and smaller numbers.

While it is a good thing to keep things simple, make sure that that's what's your doing. The C standard library has a built in quick sort. If you use that one, the code can look like this:

int int_cmp(const void *a, const void *b) 
{ 
    const int ia = *(const int *)a; 
    const int ib = *(const int *)b;

    return ia-ib;
}

int med(int count, int *array)
{
    int tmp[count]; // You might want to use malloc instead

    memcpy(tmp, array, count * sizeof(*array));

    qsort(tmp, count, sizeof(tmp[0]), int_cmp);

    return tmp[count/2];
}

It is both faster and easier to read. Your code is O(n²) while this is O(nlog n).

You mentioned in a comment that you want to use this for a new sorting method. Then I want to mention that the median of sets with an odd number of elements typically is not a member of the set, so you need to alter your definition of median to suit your needs.

Here is an example of how you can achieve what you want in a pretty readable way, while still maintaining your idea. I start by adding a subproblem, which instead of "what is the median in the array" is "is x the median of the array". And then we ask that question for each element in the array until we have found the median.

int is_median(int x, int *array, int count) {
    int l=0, h=0;

    for(int i=0; i<count; i++) {
        if(array[i] < x) l++;
        else if(array[i] > x) h++;
    }
    
    if(h == l) return 1; // This is always a sufficient condition

    // Here you need to decide what to do. Just the above is not enough
    // for your purposes.
    else if(<condition>) return 1; 

    else return 0;
}

int med(int count, int *array) {
    for(int i = 0; i < count; i++) {
        if(is_median(array[i], array, count)) return array[i];
    }
    return 0; // This line should never be executed. It't only here
              // to suppress a warning.
}
    
like image 154
klutt Avatar answered Feb 16 '23 02:02

klutt