Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linear sort on GPU. Does parallel processing change Big-O?

If a GPU is really able to calculate code in parallel. This sorting algorithm must be correct.

  1. Create a 2D comparison matrix

O(n)

values = [ 3, 1, 2 ]

                     # 3  1  2
comparisonMatrix = [ [ 0, 1, 1 ], # 3
                     [ 0, 0, 0 ], # 1
                     [ 0, 1, 0 ]] # 2

# Done on GPU
comparisonMatrix[rowIdx][columnIdx] = values[rowIdx] > values[columnIdx]
  1. Calculate row sums

O(n)

rowSums = [[ 1 ], # 3
           [ 0 ], # 1
           [ 2 ]] # 2

# Done on GPU
rowSums[rowIds] = comparisonMatrix[rowsIds][all]
  1. Mapping the initial values to a sortedArray using the rowSums array as an index

O(1)

sortedValues = [ 1, 2, 3 ]

# Done on GPU
sortedValues[rowIdx] = values[rowSums[rowIdx]]

Total: O(n + n + 1) = O(n)

Counter argument:

GPU's have a finite number of cores so the big-O of iterating over an array is O(n/NUM_CORES) instead of O(1). But since hardware should not be included in mathmatics we should either assume NUM_CORES is 1 or infinate. Being infinate would cause this algorithm to be OK, while assuming 1 would cause GPU's to have no mathmatical impact on complexity.


Notes:

This is not a reasonable algorithm to run since memory is O(n^2) it is more of a proof.

Values are all different from eachother, otherwise this would cause two rowSums to be equivalent.

Although there are ways of doing these substeps faster, I stuck with the most strait forward ones.

like image 546
Ben Lirio Avatar asked Jun 28 '26 12:06

Ben Lirio


1 Answers

The answer depends on whether you regard the number of processor's as a relevant parameter of your complexity analysis. If yes, then you have to introduce an extra parameter, say p, for the number of processors.

If your algorithm is scalable, this means that the time complexity scales inversely linearly with the number of processors, so instead of O(n) you would get O(n/p) in the ideal case. But that's really the ideal case, and it's called perfect linear speedup. (See here for more details.)

But it's definitely wrong to say that an O(n^2) algorithm would run in O(n) on a parallel machine since it's unreasonable to assume that the number of processors automatically grows with the size of the input.

If you regard the number of processors as a constant, then nothing changes.

like image 78
Mo B. Avatar answered Jun 30 '26 15:06

Mo B.



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!