Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can an array be grouped more efficiently than sorted?

While working on example code for an algorithm question, I came across the situation where I was sorting an input array, even though I only needed to have identical elements grouped together, but not in any particular order, e.g.:

{1,2,4,1,4,3,2} → {1,1,2,2,4,4,3} or {1,1,2,2,3,4,4} or {3,1,1,2,2,4,4} or ...

Which made me wonder: is it possible to group identical elements in an array together more efficiently than by sorting the array?

On the one hand, the fact that elements don't need to be moved to a particular location means more freedom to find an order which requires fewer swaps. On the other hand, keeping track of where every element in a group is located, and what the optimal final location is, may need more calculations than simply sorting the array.

A logical candidate would be a type of counting sort, but what if the array length and/or value range were impractically large?

For the sake of argument, let's say the array is large (e.g. a million elements), contains 32-bit integers, and the number of identical elements per value could be anything from 1 to a million.


UPDATE: For languages with support for dictionaries, Salvador Dali's answer is obviously the way to go. I'd still be interested in hearing about old-fashioned compare-and-swap methods, or methods which use less space, if there are any.

like image 636

2 Answers

Since you asked about comparison-based methods, I'm going to make the usual assumptions that (1) elements can be compared but not hashed (2) the only resource of interest is three-way operations.

In an absolute sense, it's easier to group than to sort. Here's a grouping algorithm for three elements that uses one comparison (sorting requires three). Given an input x, y, z, if x = y, then return x, y, z. Otherwise, return x, z, y.

Asymptotically, however, both grouping and sorting require Omega(n log n) comparisons. The lower bound technique is information-theoretic: we prove that, for every grouping algorithm expressed as a decision tree, there are 3^Omega(n log n) leaves, which implies that the height of the tree (and hence the worst-case running time of the algorithm) is Omega(n log n).

Fix an arbitrary leaf of the decision tree where no input elements are found to be equal. The input positions are partially ordered by the inequalities found.

Suppose to the contrary that i, j, k are pairwise incomparable input positions. Letting x = input[i], y = input[j], z = input[k], the possibilities x = y < z and y = z < x and z = x < y are all consistent with what the algorithm has observed. This cannot be, since it is impossible for the one order chosen by the leaf to put x next to y next to z next to x. We conclude that the partial order has no antichain of cardinality three.

By Dilworth's theorem, the partial order has two chains that cover the whole input. By considering all possible ways to merge these chains into a total order, there are at most n choose m ≤ 2^n permutations that map to each leaf. The number of leaves is thus at least n!/2^n = 3^Omega(n log n).

like image 76
David Eisenstat Avatar answered Oct 18 '22 16:10

David Eisenstat


Yes, all you need to do is to create a dictionary and count how many elements of each time you have. After that just iterate over keys in that dictionary and output this key the same number of time as the value of that key.

Quick python implementation:

from collections import Counter
arr = [1,2,4,1,4,3,2]
cnt, grouped = Counter(arr), []  # counter create a dictionary which counts the number of each element
for k, v in cnt.iteritems():
    grouped += [k] * v # [k] * v create an array of length v, which has all elements equal to k

print grouped

This will group all the elements in O(n) time using potentially O(n) additional space. Which is more efficiently (in terms of time complexity) than a sorting which will achieve this in O(n logn) time and can be done inplace.

like image 45
Salvador Dali Avatar answered Oct 18 '22 17:10

Salvador Dali