I have an application where the user should be able to modify an image with sliders for hue, saturation and lightness. All image processing is done on the GPU using GLSL fragment shaders.
My problem is that RGB -> HSL -> RGB conversions are rather expensive on the gpu due to the extensive branching.
My question is whether I can convert the users "color adjustments" to some other color space which can more efficiently compute the adjusted image on the GPU.
It's a mistake to assume that branching in the GPU and branching in code are the same thing.
For simple conditionals there's never any branching at all. GPUs have conditional move instructions that directly translate to ternary expressions and simple if-else statements.
Where things get problematic is when you have nested conditionals or multiple conditionally-dependent operations. Then you have to consider whether the GLSL compiler is smart enough to translate it all into cmoves. Whenever possible the compiler will emit code that executes all branches and recombine the result with conditional moves, but it can't always do that.
You've got to know when to help it. Never guess when you can measure - use AMD's GPU Shader Analyzer or Nvidia's GCG to view the assembly output. The instruction set of a GPU is very limited and simplistic so don't be scared of the word 'assembly.'
Here's a pair of RGB/HSL conversion functions which I've changed around so they play nicely with AMD's GLSL compiler, along with the assembly output. Credit goes to Paul Bourke for the original C conversion code.
// HSL range 0:1
vec4 convertRGBtoHSL( vec4 col )
{
float red = col.r;
float green = col.g;
float blue = col.b;
float minc = min3( col.r, col.g, col.b );
float maxc = max3( col.r, col.g, col.b );
float delta = maxc - minc;
float lum = (minc + maxc) * 0.5;
float sat = 0.0;
float hue = 0.0;
if (lum > 0.0 && lum < 1.0) {
float mul = (lum < 0.5) ? (lum) : (1.0-lum);
sat = delta / (mul * 2.0);
}
vec3 masks = vec3(
(maxc == red && maxc != green) ? 1.0 : 0.0,
(maxc == green && maxc != blue) ? 1.0 : 0.0,
(maxc == blue && maxc != red) ? 1.0 : 0.0
);
vec3 adds = vec3(
((green - blue ) / delta),
2.0 + ((blue - red ) / delta),
4.0 + ((red - green) / delta)
);
float deltaGtz = (delta > 0.0) ? 1.0 : 0.0;
hue += dot( adds, masks );
hue *= deltaGtz;
hue /= 6.0;
if (hue < 0.0)
hue += 1.0;
return vec4( hue, sat, lum, col.a );
}
Assembly output for this function:
1 x: MIN ____, R0.y, R0.z
y: ADD R127.y, -R0.x, R0.z
z: MAX ____, R0.y, R0.z
w: ADD R127.w, R0.x, -R0.y
t: ADD R127.x, R0.y, -R0.z
2 y: MAX R126.y, R0.x, PV1.z
w: MIN R126.w, R0.x, PV1.x
t: MOV R1.w, R0.w
3 x: ADD R125.x, -PV2.w, PV2.y
y: SETE_DX10 ____, R0.x, PV2.y
z: SETNE_DX10 ____, R0.y, PV2.y
w: SETE_DX10 ____, R0.y, PV2.y
t: SETNE_DX10 ____, R0.z, PV2.y
4 x: CNDE_INT R123.x, PV3.y, 0.0f, PV3.z
y: CNDE_INT R125.y, PV3.w, 0.0f, PS3
z: SETNE_DX10 ____, R0.x, R126.y
w: SETE_DX10 ____, R0.z, R126.y
t: RCP_e R125.w, PV3.x
5 x: MUL_e ____, PS4, R127.y
y: CNDE_INT R123.y, PV4.w, 0.0f, PV4.z
z: ADD/2 R127.z, R126.w, R126.y VEC_021
w: MUL_e ____, PS4, R127.w
t: CNDE_INT R126.x, PV4.x, 0.0f, 1065353216
6 x: MUL_e ____, R127.x, R125.w
y: CNDE_INT R123.y, R125.y, 0.0f, 1065353216
z: CNDE_INT R123.z, PV5.y, 0.0f, 1065353216
w: ADD ____, PV5.x, (0x40000000, 2.0f).y
t: ADD ____, PV5.w, (0x40800000, 4.0f).z
7 x: DOT4 ____, R126.x, PV6.x
y: DOT4 ____, PV6.y, PV6.w
z: DOT4 ____, PV6.z, PS6
w: DOT4 ____, (0x80000000, -0.0f).x, 0.0f
t: SETGT_DX10 R125.w, 0.5, R127.z
8 x: ADD R126.x, PV7.x, 0.0f
y: SETGT_DX10 ____, R127.z, 0.0f
z: ADD ____, -R127.z, 1.0f
w: SETGT_DX10 ____, R125.x, 0.0f
t: SETGT_DX10 ____, 1.0f, R127.z
9 x: CNDE_INT R127.x, PV8.y, 0.0f, PS8
y: CNDE_INT R123.y, R125.w, PV8.z, R127.z
z: CNDE_INT R123.z, PV8.w, 0.0f, 1065353216
t: MOV R1.z, R127.z
10 x: MOV*2 ____, PV9.y
w: MUL ____, PV9.z, R126.x
11 z: MUL_e R127.z, PV10.w, (0x3E2AAAAB, 0.1666666716f).x
t: RCP_e ____, PV10.x
12 x: ADD ____, PV11.z, 1.0f
y: SETGT_DX10 ____, 0.0f, PV11.z
z: MUL_e ____, R125.x, PS11
13 x: CNDE_INT R1.x, PV12.y, R127.z, PV12.x
y: CNDE_INT R1.y, R127.x, 0.0f, PV12.z
Notice that there are no branching instructions. It's conditional moves all the way, pretty much exactly as I wrote them.
The hardware needed for a conditional move is just a binary comparator (5 gates per bit) and a bunch of traces. Very fast.
Another fun thing to notice is that there's no divides. Instead the compiler used an approximate reciprocal and a multiply instruction. It does this for sqrt operations as well a lot of the time. You can pull the same tricks on a CPU with (for example) the SSE rcpps and rsqrtps instructions.
Now the reverse operation:
// HSL [0:1] to RGB [0:1]
vec4 convertHSLtoRGB( vec4 col )
{
const float onethird = 1.0 / 3.0;
const float twothird = 2.0 / 3.0;
const float rcpsixth = 6.0;
float hue = col.x;
float sat = col.y;
float lum = col.z;
vec3 xt = vec3(
rcpsixth * (hue - twothird),
0.0,
rcpsixth * (1.0 - hue)
);
if (hue < twothird) {
xt.r = 0.0;
xt.g = rcpsixth * (twothird - hue);
xt.b = rcpsixth * (hue - onethird);
}
if (hue < onethird) {
xt.r = rcpsixth * (onethird - hue);
xt.g = rcpsixth * hue;
xt.b = 0.0;
}
xt = min( xt, 1.0 );
float sat2 = 2.0 * sat;
float satinv = 1.0 - sat;
float luminv = 1.0 - lum;
float lum2m1 = (2.0 * lum) - 1.0;
vec3 ct = (sat2 * xt) + satinv;
vec3 rgb;
if (lum >= 0.5)
rgb = (luminv * ct) + lum2m1;
else rgb = lum * ct;
return vec4( rgb, col.a );
}
(edited 05/July/2013: I made a mistake when translating this function orignally. The assembly has also been updated).
Assembly output:
1 x: ADD ____, -R2.x, 1.0f
y: ADD ____, R2.x, (0xBF2AAAAB, -0.6666666865f).x
z: ADD R0.z, -R2.x, (0x3F2AAAAB, 0.6666666865f).y
w: ADD R0.w, R2.x, (0xBEAAAAAB, -0.3333333433f).z
2 x: SETGT_DX10 R0.x, (0x3F2AAAAB, 0.6666666865f).x, R2.x
y: MUL R0.y, PV2.x, (0x40C00000, 6.0f).y
z: MOV R1.z, 0.0f
w: MUL R1.w, PV2.y, (0x40C00000, 6.0f).y
3 x: MUL ____, R0.w, (0x40C00000, 6.0f).x
y: MUL ____, R0.z, (0x40C00000, 6.0f).x
z: ADD R0.z, -R2.x, (0x3EAAAAAB, 0.3333333433f).y
w: MOV ____, 0.0f
4 x: CNDE_INT R0.x, R0.x, R0.y, PV4.x
y: CNDE_INT R0.y, R0.x, R1.z, PV4.y
z: CNDE_INT R1.z, R0.x, R1.w, PV4.w
w: SETGT_DX10 R1.w, (0x3EAAAAAB, 0.3333333433f).x, R2.x
5 x: MUL ____, R2.x, (0x40C00000, 6.0f).x
y: MUL ____, R0.z, (0x40C00000, 6.0f).x
z: ADD R0.z, -R2.y, 1.0f
w: MOV ____, 0.0f
6 x: CNDE_INT R127.x, R1.w, R0.x, PV6.w
y: CNDE_INT R127.y, R1.w, R0.y, PV6.x
z: CNDE_INT R127.z, R1.w, R1.z, PV6.y
w: ADD R1.w, -R2.z, 1.0f
7 x: MULADD R0.x, R2.z, (0x40000000, 2.0f).x, -1.0f
y: MIN*2 ____, PV7.x, 1.0f
z: MIN*2 ____, PV7.y, 1.0f
w: MIN*2 ____, PV7.z, 1.0f
8 x: MULADD R1.x, PV8.z, R2.y, R0.z
y: MULADD R127.y, PV8.w, R2.y, R0.z
z: SETGE_DX10 R1.z, R2.z, 0.5
w: MULADD R0.w, PV8.y, R2.y, R0.z
9 x: MULADD R0.x, R1.w, PV9.x, R0.x
y: MULADD R0.y, R1.w, PV9.y, R0.x
z: MUL R0.z, R2.z, PV9.y
w: MULADD R1.w, R1.w, PV9.w, R0.x
10 x: MUL ____, R2.z, R0.w
y: MUL ____, R2.z, R1.x
w: MOV R2.w, R2.w
11 x: CNDE_INT R2.x, R1.z, R0.z, R0.y
y: CNDE_INT R2.y, R1.z, PV11.y, R0.x
z: CNDE_INT R2.z, R1.z, PV11.x, R1.w
Again no branches. Yum!
For lightness and saturation you can use YUV (actually YCbCr). It's easy to convert from RGB and back. No branching needed. Saturation is controlled by increasing or decreasing both Cr and Cb. Lightness is Y.
You get something similar to HSL hue modification by rotating Cb and Cr components (it's practically a 3D vector), but of course it depends on your application if that's enough.
Edit: A color component (Cb,Cr) is a point in a color plane like above. If you take any random point and rotate it around the center, result is hue changing. But as mechanism is a bit different than in HSL, results are not precisely same.
Image is public domain from Wikipedia.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With