I am writing a fragment shader in order to median 9 images together.
I have never worked with GLSL before, but it seemed like the right tool for the job, as OpenCL isn't available on iOS and medianing on the CPU is inefficient. Here's what I have so far:
uniform sampler2D frames[9];
uniform vec2 wh;
void main(void)
{
vec4 sortedFrameValues[9];
float sortedGrayScaleValues[9];
for (int i = 0; i < 9; i++)
{
sortedFrameValues[i] = texture2D(frames[i], -gl_FragCoord.xy / wh);
sortedGrayScaleValues[i] = dot(sortedFrameValues[i].xyz, vec3(0.299, 0.587, 0.114));
}
// TODO: Sort sortedGrayScaleValues
float gray = sortedGrayScaleValues[4];
gl_FragColor = vec4(gray, gray, gray, 0);
}
A little late, but the fastest way I've found is insertion sort. Reducing shader complexity and divergence is key. Bitonic and bubble work pretty well too for small numbers. Once you get up around 100, switch to merge sort.
Since you know the number of things to sort (9) your best bet is a sort network. You could use this handy tool to generate it...
There are 27 comparators in this network,
grouped into 11 parallel operations.
[[0,1],[2,3],[4,5],[7,8]]
[[0,2],[1,3],[6,8]]
[[1,2],[6,7],[5,8]]
[[4,7],[3,8]]
[[4,6],[5,7]]
[[5,6],[2,7]]
[[0,5],[1,6],[3,7]]
[[0,4],[1,5],[3,6]]
[[1,4],[2,5]]
[[2,4],[3,5]]
[[3,4]]
A handy way to use this is declare a compare-and-swap macro...
#define CMP(a, b) ...
#define SWAP(a, b) ...
#define CSWAP(a, b) if (CMP(a, b)) {SWAP(a, b);}
CSWAP(0, 1); CSWAP(2, 3); ...
Combining both approaches, a sort network to quickly sort small blocks of data and then merge sort if you have many blocks works very well, as described in Fast Sorting for Exact OIT of Complex Scenes (disclaimer: I'm an author).
Unrolling loops (essentially creating a sort network) can be particularly beneficial, allowing sorting in registers. Dynamically indexed arrays are placed in local memory which is slow. To force the compiler not to do this, you could manually declare vec4 array0, array1 .... Macros can concatenate text which is useful here #define CMP(a, b) (array##a < array##b). A rather ugly but fast example is here.
Well, I ended up implementing a bubble sort and using the middle value.
This is what my solution looks like:
uniform sampler2D frames[9];
uniform vec2 wh;
vec4 frameValues[9];
float arr[9];
void bubbleSort()
{
bool swapped = true;
int j = 0;
float tmp;
for (int c = 0; c < 3; c--)
{
if (!swapped)
break;
swapped = false;
j++;
for (int i = 0; i < 3; i++)
{
if (i >= 3 - j)
break;
if (arr[i] > arr[i + 1])
{
tmp = arr[i];
arr[i] = arr[i + 1];
arr[i + 1] = tmp;
swapped = true;
}
}
}
}
void main(void)
{
for (int i = 0; i < 9; i++)
{
frameValues[i] = texture2D(frames[i], -gl_FragCoord.xy / wh);
arr[i] = dot(frameValues[i].xyz, vec3(0.299, 0.587, 0.114));
}
bubbleSort();
float gray = arr[4];
gl_FragColor =vec4(gray, gray, gray, 0);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With