Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dividing the elements of an array in 3 groups

I have to divide the elements of an array into 3 groups. This needs to be done without sorting the array. Consider the example

we have 120 unsorted values thus the smallest 40 values need to be in the first group and next 40 in the second and the largest 40 in the third group

I was thinking of the median of median approach but not able to apply it to my problem, kindly suggest an algorithm.

like image 620
nishant mehta Avatar asked Oct 11 '13 20:10

nishant mehta


2 Answers

You can call quickselect twice on your array to do this in-place and in average case linear time. The worst case runtime can also be improved to O(n) by using the linear time median of medians algorithm to choose an optimal pivot for quickselect.

For both calls to quickselect, use k = n / 3. On your first call, use quickselect on the entire array, and on your second call, use it from arr[k..n-1] (for a 0-indexed array).

Wikipedia explanation of quickselect:

Quickselect uses the same overall approach as quicksort, choosing one element as a pivot and partitioning the data in two based on the pivot, accordingly as less than or greater than the pivot. However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side – the side with the element it is searching for. This reduces the average complexity from O(n log n) (in quicksort) to O(n) (in quickselect).

As with quicksort, quickselect is generally implemented as an in-place algorithm, and beyond selecting the kth element, it also partially sorts the data. See selection algorithm for further discussion of the connection with sorting.

To divide the elements of the array into 3 groups, use the following algorithm written in Python in combination with quickselect:

k = n / 3

# First group smallest elements in array
quickselect(L, 0, n - 1, k) # Call quickselect on your entire array

# Then group middle elements in array
quickselect(L, k, n - 1, k) # Call quickselect on subarray

# Largest elements in array are already grouped so
# there is no need to call quickselect again

The key point of all this is that quickselect uses a subroutine called partition. Partition arranges an array into two parts, those greater than a given element and those less than a given element. Thus it partially sorts an array around this element and returns its new sorted position. Thus by using quickselect, you actually partially sort the array around the kth element (note that this is different from actually sorting the entire array) in-place and in average-case linear time.

Time Complexity for quickselect:

  1. Worst case performance O(n2)
  2. Best case performance O(n)
  3. Average case performance O(n)

The runtime of quickselect is almost always linear and not quadratic, but this depends on the fact that for most arrays, simply choosing a random pivot point will almost always yield linear runtime. However, if you want to improve the worst case performance for your quickselect, you can choose to use the median of medians algorithm before each call to approximate an optimal pivot to be used for quickselect. In doing so, you will improve the worst case performance of your quickselect algorithm to O(n). This overhead probably isn't necessary but if you are dealing with large lists of randomized integers it can prevent some abnormal quadratic runtimes of your algorithm.

Here is a complete example in Python which implements quickselect and applies it twice to a reverse-sorted list of 120 integers and prints out the three resulting sublists.

from random import randint


def partition(L, left, right, pivotIndex):
    '''partition L so it's ordered around L[pivotIndex]
       also return its new sorted position in array'''
    pivotValue = L[pivotIndex]
    L[pivotIndex], L[right] = L[right], L[pivotIndex]
    storeIndex = left
    for i in xrange(left, right):
        if L[i] < pivotValue:
            L[storeIndex], L[i] = L[i], L[storeIndex]
            storeIndex = storeIndex + 1
    L[right], L[storeIndex] = L[storeIndex], L[right]
    return storeIndex


def quickselect(L, left, right, k):
    '''retrieve kth smallest element of L[left..right] inclusive
       additionally partition L so that it's ordered around kth 
       smallest element'''
    if left == right:
        return L[left]
    # Randomly choose pivot and partition around it
    pivotIndex = randint(left, right)
    pivotNewIndex = partition(L, left, right, pivotIndex)
    pivotDist = pivotNewIndex - left + 1
    if pivotDist == k:
        return L[pivotNewIndex]
    elif k < pivotDist:
        return quickselect(L, left, pivotNewIndex - 1, k)
    else:
        return quickselect(L, pivotNewIndex + 1, right, k - pivotDist)


def main():
    # Setup array of 120 elements [120..1]
    n = 120
    L = range(n, 0, -1)

    k = n / 3

    # First group smallest elements in array
    quickselect(L, 0, n - 1, k) # Call quickselect on your entire array

    # Then group middle elements in array
    quickselect(L, k, n - 1, k) # Call quickselect on subarray

    # Largest elements in array are already grouped so 
    # there is no need to call quickselect again

    print L[:k], '\n'
    print L[k:k*2], '\n'
    print L[k*2:]


if __name__ == '__main__':
    main()
like image 64
17 revs Avatar answered Sep 30 '22 17:09

17 revs


I would take a look at order statistics. The kth order statistic of a statistical sample is equal to its kth-smallest value. The problem of computing the kth smallest (or largest) element of a list is called the selection problem and is solved by a selection algorithm.

It is right to think the median of the medians way. However, instead of finding the median, you might want to find both 20th and 40th smallest elements from the array. Just like finding the median, it takes only linear time to find both of them using a selection algorithm. Finally you go over the array and partition the elements according to these two elements, which is linear time as well.

PS. If this is your exercise in an algorithm class, this might help you :)

like image 39
Terry Li Avatar answered Sep 30 '22 18:09

Terry Li