Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Partially sorting an array C

Tags:

arrays

c

qsort

I have an array which looks like this:

int array[] = {4.53, 3.65, 7.43, 9.54, 0.72, 0.0}

I am just wondering what method I can use to partially sort this array to bring the top three biggest doubles to the front. I am looking for the most efficient method to get the top three highest numbers in this array.

So far I have been using qsort, but I am just looking for another method to do this which could be even faster. I know that qsort is O(nlogn) for best cases and O(n^2) for worst cases, but is there an even more efficient method to achieve this problem? What I mean by efficient is just a faster way to do it, better than O(nlogn).

Any help would be great

like image 885
RoadRunner Avatar asked Oct 15 '16 17:10

RoadRunner


People also ask

How do you partially sort an array?

Arrays. sort() method can be used to sort a subset of the array elements in Java. This method has three arguments i.e. the array to be sorted, the index of the first element of the subset (included in the sorted elements) and the index of the last element of the subset (excluded from the sorted elements).

What is an almost sorted array?

The "almost sorted" array is generated by starting with an int array of size n containing the sequence 0, 1, 2, ..., n -1 and perturbing it slightly. To perturb the array, pick at random 10 pairs of indices (in the range 0 through n -1) and for each pair of indices ( i , j ), swap the elements in slots i and j .

Can array be sorted in descending order?

Array elements can be sorted in descending order by passing in the array and Collections. reverseOrder() as parameters to Arrays.


2 Answers

Simply maintain first, second, third.

   first =  array[0];
   second = array[1];
   third = array[2];

   /* scratch sort for three elements */
   if(first < second)
     swap(first, second);
  if(first < third)
     swap(first, third);
  if(second < third)
     swap(second, third);

  /* now go through, bubbling up if we have a hit */ 
  for(i=3;i<N;i++)
  {
      if(third < array[i])
      {
         third = array[i];
         if(second < third)
         {
            swap(second, third);
            if(first < second)
              swap(first, second);
         }
      }
  }     

I wouldn't try to scale up to k = four. I think three is about the limit for hardcoding it. As k get large you need to move to a formal method.

This doesn't answer the question you actually asked, which was how to partially sort, but it seems to be what you want.

If you wish to partially sort, you can use quicksort, and simply return early when the pivot goes above the bound you are interested it. So our first pivot divides into five, two. Ignore the last two, and only actually do the sub-sorts of the last five. But whilst it will be faster than quicksort, it won't be a game changer. If you can get a conservative upper bound on the k'th item (eg it's always going to be at most 25% between the minimum and the mean) you can quickly eliminate most of the data. If you get it wrong it's just another pass or two.

Using the quicksort method

  int sortfirstk_r(int *array, int N, int k)
  {
     int pivot = 0;
     int j = n -1;
     int i = 1;

     while(i <= j)
     {
        if(array[pivot] < array[i])
          swap(array[i], array[j--])
        else
          i++;

     }
     sortfirstk_r(array, i, k < i ? k : i);
     if(i < k)
       sortfirstk_r(array +i, N -i, k - i); 

  }

(Untested and there might be bugs in the slightly tricky sort logic).

However we've naively used the first element as the pivot. If we're sorting a large data set, and it has a normal distribution, and we want the top 1%, the z-score is 2.326. Take a bit more to allow us some sampling error, and we make a first pass with a pivot set at say 2.3 standard deviations above the mean. Then we split the distribution into two sets, the top 1% plus a bit, and the rest. We don't need to further process the rest, and just sort the top group.

like image 148
Malcolm McLean Avatar answered Sep 27 '22 23:09

Malcolm McLean


For your specific problem the quickest method is to do something similar to below since you only want three elements: (It may be quicker to use a priority queue or a different data structure, but the speed will not be very noticeable)

#include"stdio.h"
void moveThreeMaxToFront(double * arr, int length);
void moveMaxToFront(double*arr, int length);
int main() {
  int i;
  double meh[]={ 5,3,1,7,2,9,11};
  moveThreeMaxToFront(meh, 7);
  for(i=0; i<7; i++)
    printf("%f \n", meh[i]);
}
void moveThreeMaxToFront(double * arr, int length) {
  for(int i=0; i<3; i++)
    moveMaxToFront(arr++, length-i);
}
void moveMaxToFront(double* arr, int length) {
  int i;
  for(i=1; i<length; i++) {
    if(arr[i]>arr[0]) {
      double tmp=arr[i];
      arr[i]=arr[0];
      arr[0]=tmp;
    }
  }
}

However, it is potentially faster if k becomes substantially larger to either implement Quickselect or use the partial_sort method which I believe implements quickselect. However, the quickselect algorithm for the given case has an average constant of approximately 3.4-4.4 which is slightly larger than the constant above(3). Please also note that quickselect has an average run time of O(n). This run time can be guaranteed using median of 3, but this is not advised as it significantly increases the average constant. Intro-select properly handles this to prevent the worst case of quickselect while retaining its average case.

like image 39
Edward Jezisek Avatar answered Sep 28 '22 00:09

Edward Jezisek