 

Re-sort a vector after a small number of elements have been modified

If we have a vector of size N that was previously sorted, and then up to M elements are replaced with arbitrary values (where M is much smaller than N), is there an easy way to re-sort it at lower cost than a full sort (i.e. to generate a sorting network of reduced depth)?

For example if N=10 and M=2 the input might be

10 20 30 40 999 60 70 80 90 -1

Note: the indices of the modified elements are not known (until we compare them with the surrounding elements).


Here is an example where I know the solution because the input size is small and I was able to find it with a brute-force search:

If N = 5 and M = 1, these would be valid inputs:

0 0 0 0 0     0 0 1 0 0     0 1 0 0 0     0 1 1 1 0     1 0 0 1 1     1 1 1 1 0

0 0 0 0 1     0 0 1 0 1     0 1 0 0 1     0 1 1 1 1     1 0 1 1 1     1 1 1 1 1

0 0 0 1 0     0 0 1 1 0     0 1 0 1 1     1 0 0 0 0     1 1 0 1 1

0 0 0 1 1     0 0 1 1 1     0 1 1 0 1     1 0 0 0 1     1 1 1 0 1

For example, the input may be 0 1 1 0 1 if the previously sorted vector was 0 1 1 1 1 and the 4th element was modified, but 0 1 0 1 0 can never be a valid input, because it differs in at least 2 elements from every sorted vector.
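To make "valid input" concrete, here is a small brute-force check (the function name is my own). A 0/1 vector of length n is valid iff it differs in at most m positions from some sorted 0/1 vector, i.e. from some pattern of k zeros followed by n-k ones:

#include <stdbool.h>

/* Illustrative helper: true if v (a 0/1 vector of length n) is within
 * Hamming distance m of some sorted 0/1 vector 0^k 1^(n-k). */
static bool is_valid_input(const int *v, int n, int m)
{
    for (int k = 0; k <= n; ++k) {       /* candidate: k zeros, n-k ones */
        int diff = 0;
        for (int i = 0; i < n; ++i)
            diff += (v[i] != (i >= k));  /* element i should be 1 iff i >= k */
        if (diff <= m)
            return true;
    }
    return false;
}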

This would be a valid sorting network for re-sorting these inputs:

>--*---*-----*-------->
   |   |     | 
>--*---|-----|-*---*-->
       |     | |   |
>--*---|-*---*-|---*-->
   |   | |     |
>--*---*-|-----*---*-->
         |         |
>--------*---------*-->

We do not care that this network fails to sort some invalid inputs (e.g. 0 1 0 1 0).

And this network has depth 4, a saving of 1 compared with the general case (a depth of 5 is generally necessary to sort a 5-element vector).
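Reading the diagram column by column, here is my transcription of the network into straight-line compare-exchange code (worth double-checking against the drawing). Comparators grouped on one line touch disjoint wires, so each line forms one layer and the total depth is 4:

/* compare-exchange: put the smaller value on the lower index */
#define CSWAP(a, i, j) do { \
        if ((a)[i] > (a)[j]) { int t = (a)[i]; (a)[i] = (a)[j]; (a)[j] = t; } \
    } while (0)

/* Re-sorts any valid input for N = 5, M = 1; may fail on invalid inputs
 * such as 0 1 0 1 0, which is fine for this purpose. */
static void resort5(int a[5])
{
    CSWAP(a, 0, 1); CSWAP(a, 2, 3);   /* layer 1 */
    CSWAP(a, 0, 3); CSWAP(a, 2, 4);   /* layer 2 */
    CSWAP(a, 0, 2); CSWAP(a, 1, 3);   /* layer 3 */
    CSWAP(a, 1, 2); CSWAP(a, 3, 4);   /* layer 4 */
}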

Unfortunately the brute-force approach is not feasible for larger input sizes.

Is there a known method for constructing a network to re-sort a larger vector?

My N values will be on the order of a few hundred, with M not much more than √N.

asked Sep 15 '14 by finnw
1 Answer

OK, I'm posting this as an answer, since the length restriction on comments drives me nuts :)

You should try this out:

  • implement a simple sequential sort working on local memory (insertion sort or something similar); see the kernel sketch after this list. If you don't know how, I can help with that.
  • have only a single work-item perform the sorting on each chunk of N elements
  • calculate the maximum amount of local memory per work-group (call clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE) and derive the maximum number of work-items per work-group from it, because with this approach the number of work-items will most likely be limited by the amount of local memory.
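A minimal sketch of what this could look like as an OpenCL C kernel (all names are my own, and I assume the chunks are packed back to back in one global buffer):

/* Hypothetical kernel: each work-item copies its own n-element chunk into
 * its slice of local memory, insertion sorts it there, and writes it back.
 * The __local buffer is sized on the host as (work-group size) * n ints. */
__kernel void sort_chunks(__global int *data,    /* all chunks, back to back */
                          __local  int *scratch, /* n ints per work-item     */
                          const int n)           /* elements per chunk       */
{
    const int gid = get_global_id(0);
    __local int *buf = scratch + get_local_id(0) * n; /* this item's slice */

    /* copy the chunk into low-latency local memory */
    for (int i = 0; i < n; ++i)
        buf[i] = data[gid * n + i];

    /* insertion sort: cheap on nearly sorted data, because each displaced
     * element only moves a short distance */
    for (int i = 1; i < n; ++i) {
        int key = buf[i];
        int j = i - 1;
        while (j >= 0 && buf[j] > key) {
            buf[j + 1] = buf[j];
            --j;
        }
        buf[j + 1] = key;
    }

    /* write the sorted chunk back */
    for (int i = 0; i < n; ++i)
        data[gid * n + i] = buf[i];
}

And the local-memory limit from the third point could be queried on the host like this:

cl_ulong local_mem_size = 0;
clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                sizeof(local_mem_size), &local_mem_size, NULL);
/* each work-item needs n ints of scratch space */
size_t max_items_per_group = local_mem_size / (n * sizeof(cl_int));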

I suspect this will work rather well, because:

  • a simple sort may be perfectly fine, especially since the array is already sorted to a large degree
  • parallelizing for such a small number of items is not worth the trouble (using local memory however is!)
  • since you're processing billions of such small arrays, you will achieve good occupancy even though each array is handled by a single work-item

Let me know if you have problems with my ideas.

EDIT 1:

I just realized my proposal may be confusing: I am not suggesting local memory for synchronization, or for having multiple work-items cooperate on a single input vector/array. I simply use it to get low read/write memory latency. Since each work-item needs a rather large chunk of memory, I worry that keeping it in private memory could cause silent spilling to slow global memory. This also means you have to allocate local memory for each work-item; each work-item then accesses its own chunk of local memory exclusively and uses it for sorting. I'm not sure how good this idea is, but I've read that using too much private memory can cause spilling to global memory, and that the only way to notice is by watching the performance (I'm not certain about this).
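For reference, this is how the per-work-item local scratch space from the kernel sketch above would be reserved on the host; passing NULL as the argument value tells OpenCL to allocate local memory of the given size:

/* argument index 1 corresponds to the __local parameter of the kernel
 * sketch above; wg_size is the chosen work-group size (hypothetical) */
clSetKernelArg(kernel, 1, wg_size * n * sizeof(cl_int), NULL);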

answered Sep 22 '22 by Baiz