Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently find order statistics of unsorted list prefixes?

A is an array of the integers from 1 to n in random order.

I need random access to the ith largest element of the first j elements in at least log time.

What I've come up with so far is an n x n matrix M, where the element in the (i, j) position is the ith largest of the first j. This gives me constant-time random access, but requires n^2 storage.

By construction, M is sorted by row and column. Further, each column differs from its neighbors by a single value.

Can anyone suggest a way to compress M down to n log(n) space or better, with log(n) or better random access time?

like image 644
Dave Avatar asked Mar 12 '13 01:03

Dave


2 Answers

I believe you can perform the access in O(log(N)) time, given O(N log(N)) preprocessing time and O(N log(N)) extra space. Here's how.

You can augment a red-black tree to support a select(i) operation which retrieves the element at rank i in O(log(N)) time. For example, see this PDF or the appropriate chapter of Introduction to Algorithms.

You can implement a red-black tree (even one augmented to support select(i)) in a functional manner, such that the insert operation returns a new tree which shares all but O(log(N)) nodes with the old tree. See for example Purely Functional Data Structures by Chris Okasaki.

We will build an array T of purely functional augmented red-black trees, such that the tree T[j] stores the indexes 0 ... j-1 of the first j elements of A sorted largest to smallest.

Base case: At T[0] create an augmented red-black tree with just one node, whose data is the number 0, which is the index of the 0th largest element in the first 1 elements of your array A.

Inductive step: For each j from 1 to N-1, at T[j] create an augmented red-black tree by purely functionally inserting a new node with index j into the tree T[j-1]. This creates at most O(log(j)) new nodes; the remaining nodes are shared with T[j-1]. This takes O(log(j)) time.

The total time to construct the array T is O(N log(N)) and the total space used is also O(N log(N)).

Once T[j-1] is created, you can access the ith largest element of the first j elements of A by performing T[j-1].select(i). This takes O(log(j)) time. Note that you can create T[j-1] lazily the first time it is needed. If A is very large and j is always relatively small, this will save a lot of time and space.

like image 125
rob mayoff Avatar answered Nov 13 '22 08:11

rob mayoff


Unless I misunderstand, you are just finding the k-th order statistic of an array which is the prefix of another array.

This can be done using an algorithm that I think is called 'quickselect' or something along those lines. Basically, it's like quicksort:

  1. Take a random pivot
  2. Swap around array elements so all the smaller ones are on one side
  3. You know this is the p+1th largest element where p is the number of smaller array elements
  4. If p+1 = k, it's the solution! If p+1 > k, repeat on the 'smaller' subarray. If p+1 < k, repeat on the larger 'subarray'.

There's a (much) better description here under the Quickselect and Quicker Select headings, and also just generally on the internet if you search for k-th order quicksort solutions.

Although the worst-case time for this algorithm is O(n2) like quicksort, its expected case is much better (also like quicksort) if you properly select your random pivots. I think the space complexity would just be O(n); you can just make one copy of your prefix to muck up the ordering for.

like image 29
chm Avatar answered Nov 13 '22 08:11

chm