Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HSL Image Adjustements on GPU

I have an application where the user should be able to modify an image with sliders for hue, saturation and lightness. All image processing is done on the GPU using GLSL fragment shaders.

My problem is that RGB -> HSL -> RGB conversions are rather expensive on the gpu due to the extensive branching.

My question is whether I can convert the users "color adjustments" to some other color space which can more efficiently compute the adjusted image on the GPU.

like image 236
ronag Avatar asked Jan 18 '11 20:01

ronag


2 Answers

It's a mistake to assume that branching in the GPU and branching in code are the same thing.

For simple conditionals there's never any branching at all. GPUs have conditional move instructions that directly translate to ternary expressions and simple if-else statements.

Where things get problematic is when you have nested conditionals or multiple conditionally-dependent operations. Then you have to consider whether the GLSL compiler is smart enough to translate it all into cmoves. Whenever possible the compiler will emit code that executes all branches and recombine the result with conditional moves, but it can't always do that.

You've got to know when to help it. Never guess when you can measure - use AMD's GPU Shader Analyzer or Nvidia's GCG to view the assembly output. The instruction set of a GPU is very limited and simplistic so don't be scared of the word 'assembly.'

Here's a pair of RGB/HSL conversion functions which I've changed around so they play nicely with AMD's GLSL compiler, along with the assembly output. Credit goes to Paul Bourke for the original C conversion code.

// HSL range 0:1
vec4 convertRGBtoHSL( vec4 col )
{
    float red   = col.r;
    float green = col.g;
    float blue  = col.b;

    float minc  = min3( col.r, col.g, col.b );
    float maxc  = max3( col.r, col.g, col.b );
    float delta = maxc - minc;

    float lum = (minc + maxc) * 0.5;
    float sat = 0.0;
    float hue = 0.0;

    if (lum > 0.0 && lum < 1.0) {
        float mul = (lum < 0.5)  ?  (lum)  :  (1.0-lum);
        sat = delta / (mul * 2.0);
    }

    vec3 masks = vec3(
        (maxc == red   && maxc != green) ? 1.0 : 0.0,
        (maxc == green && maxc != blue)  ? 1.0 : 0.0,
        (maxc == blue  && maxc != red)   ? 1.0 : 0.0
    );

    vec3 adds = vec3(
              ((green - blue ) / delta),
        2.0 + ((blue  - red  ) / delta),
        4.0 + ((red   - green) / delta)
    );

    float deltaGtz = (delta > 0.0) ? 1.0 : 0.0;

    hue += dot( adds, masks );
    hue *= deltaGtz;
    hue /= 6.0;

    if (hue < 0.0)
        hue += 1.0;

    return vec4( hue, sat, lum, col.a );
}

Assembly output for this function:

 1  x: MIN         ____,    R0.y,   R0.z      
    y: ADD         R127.y, -R0.x,   R0.z      
    z: MAX         ____,    R0.y,   R0.z      
    w: ADD         R127.w,  R0.x,  -R0.y      
    t: ADD         R127.x,  R0.y,  -R0.z      
 2  y: MAX         R126.y,  R0.x,   PV1.z      
    w: MIN         R126.w,  R0.x,   PV1.x      
    t: MOV         R1.w,    R0.w      
 3  x: ADD         R125.x, -PV2.w,  PV2.y      
    y: SETE_DX10   ____,    R0.x,   PV2.y      
    z: SETNE_DX10  ____,    R0.y,   PV2.y      
    w: SETE_DX10   ____,    R0.y,   PV2.y      
    t: SETNE_DX10  ____,    R0.z,   PV2.y      
 4  x: CNDE_INT    R123.x,  PV3.y,  0.0f,   PV3.z      
    y: CNDE_INT    R125.y,  PV3.w,  0.0f,   PS3      
    z: SETNE_DX10  ____,    R0.x,   R126.y      
    w: SETE_DX10   ____,    R0.z,   R126.y      
    t: RCP_e       R125.w,  PV3.x      
 5  x: MUL_e       ____,    PS4,     R127.y      
    y: CNDE_INT    R123.y,  PV4.w,   0.0f,  PV4.z      
    z: ADD/2       R127.z,  R126.w,  R126.y      VEC_021 
    w: MUL_e       ____,    PS4,     R127.w      
    t: CNDE_INT    R126.x,  PV4.x,   0.0f,  1065353216      
 6  x: MUL_e       ____,    R127.x,  R125.w      
    y: CNDE_INT    R123.y,  R125.y,  0.0f,  1065353216      
    z: CNDE_INT    R123.z,  PV5.y,   0.0f,  1065353216      
    w: ADD         ____,    PV5.x,   (0x40000000, 2.0f).y      
    t: ADD         ____,    PV5.w,   (0x40800000, 4.0f).z      
 7  x: DOT4        ____,    R126.x,  PV6.x      
    y: DOT4        ____,    PV6.y,   PV6.w      
    z: DOT4        ____,    PV6.z,   PS6      
    w: DOT4        ____,    (0x80000000, -0.0f).x,  0.0f      
    t: SETGT_DX10  R125.w,  0.5,     R127.z      
 8  x: ADD         R126.x,  PV7.x,   0.0f      
    y: SETGT_DX10  ____,    R127.z,  0.0f      
    z: ADD         ____,   -R127.z,  1.0f      
    w: SETGT_DX10  ____,    R125.x,  0.0f      
    t: SETGT_DX10  ____,    1.0f,    R127.z      
 9  x: CNDE_INT    R127.x,  PV8.y,   0.0f,   PS8      
    y: CNDE_INT    R123.y,  R125.w,  PV8.z,  R127.z      
    z: CNDE_INT    R123.z,  PV8.w,   0.0f,   1065353216      
    t: MOV         R1.z,    R127.z      
10  x: MOV*2       ____,    PV9.y      
    w: MUL         ____,    PV9.z,   R126.x      
11  z: MUL_e       R127.z,  PV10.w,  (0x3E2AAAAB, 0.1666666716f).x      
    t: RCP_e       ____,    PV10.x      
12  x: ADD         ____,    PV11.z,  1.0f      
    y: SETGT_DX10  ____,    0.0f,    PV11.z      
    z: MUL_e       ____,    R125.x,  PS11      
13  x: CNDE_INT    R1.x,    PV12.y,  R127.z,  PV12.x      
    y: CNDE_INT    R1.y,    R127.x,  0.0f,    PV12.z  

Notice that there are no branching instructions. It's conditional moves all the way, pretty much exactly as I wrote them.

The hardware needed for a conditional move is just a binary comparator (5 gates per bit) and a bunch of traces. Very fast.

Another fun thing to notice is that there's no divides. Instead the compiler used an approximate reciprocal and a multiply instruction. It does this for sqrt operations as well a lot of the time. You can pull the same tricks on a CPU with (for example) the SSE rcpps and rsqrtps instructions.

Now the reverse operation:

// HSL [0:1] to RGB [0:1]
vec4 convertHSLtoRGB( vec4 col )
{
    const float onethird = 1.0 / 3.0;
    const float twothird = 2.0 / 3.0;
    const float rcpsixth = 6.0;

    float hue = col.x;
    float sat = col.y;
    float lum = col.z;

    vec3 xt = vec3(
        rcpsixth * (hue - twothird),
        0.0,
        rcpsixth * (1.0 - hue)
    );

    if (hue < twothird) {
        xt.r = 0.0;
        xt.g = rcpsixth * (twothird - hue);
        xt.b = rcpsixth * (hue      - onethird);
    } 

    if (hue < onethird) {
        xt.r = rcpsixth * (onethird - hue);
        xt.g = rcpsixth * hue;
        xt.b = 0.0;
    }

    xt = min( xt, 1.0 );

    float sat2   =  2.0 * sat;
    float satinv =  1.0 - sat;
    float luminv =  1.0 - lum;
    float lum2m1 = (2.0 * lum) - 1.0;
    vec3  ct     = (sat2 * xt) + satinv;

    vec3 rgb;
    if (lum >= 0.5)
         rgb = (luminv * ct) + lum2m1;
    else rgb =  lum    * ct;

    return vec4( rgb, col.a );
}

(edited 05/July/2013: I made a mistake when translating this function orignally. The assembly has also been updated).

Assembly output:

1   x: ADD         ____,   -R2.x,  1.0f      
    y: ADD         ____,    R2.x,  (0xBF2AAAAB, -0.6666666865f).x      
    z: ADD         R0.z,   -R2.x,  (0x3F2AAAAB, 0.6666666865f).y      
    w: ADD         R0.w,    R2.x,  (0xBEAAAAAB, -0.3333333433f).z      
2   x: SETGT_DX10  R0.x,    (0x3F2AAAAB, 0.6666666865f).x,  R2.x      
    y: MUL         R0.y,    PV2.x,  (0x40C00000, 6.0f).y      
    z: MOV         R1.z,    0.0f      
    w: MUL         R1.w,    PV2.y,  (0x40C00000, 6.0f).y      
3   x: MUL         ____,    R0.w,  (0x40C00000, 6.0f).x      
    y: MUL         ____,    R0.z,  (0x40C00000, 6.0f).x      
    z: ADD         R0.z,   -R2.x,  (0x3EAAAAAB, 0.3333333433f).y      
    w: MOV         ____,    0.0f      
4   x: CNDE_INT    R0.x,    R0.x,   R0.y,  PV4.x      
    y: CNDE_INT    R0.y,    R0.x,   R1.z,  PV4.y      
    z: CNDE_INT    R1.z,    R0.x,   R1.w,  PV4.w      
    w: SETGT_DX10  R1.w,    (0x3EAAAAAB, 0.3333333433f).x,  R2.x      
5   x: MUL         ____,    R2.x,   (0x40C00000, 6.0f).x      
    y: MUL         ____,    R0.z,   (0x40C00000, 6.0f).x      
    z: ADD         R0.z,   -R2.y,   1.0f      
    w: MOV         ____,    0.0f      
6   x: CNDE_INT    R127.x,  R1.w,   R0.x,  PV6.w      
    y: CNDE_INT    R127.y,  R1.w,   R0.y,  PV6.x      
    z: CNDE_INT    R127.z,  R1.w,   R1.z,  PV6.y      
    w: ADD         R1.w,   -R2.z,   1.0f      
7   x: MULADD      R0.x,    R2.z,   (0x40000000, 2.0f).x, -1.0f      
    y: MIN*2       ____,    PV7.x,  1.0f      
    z: MIN*2       ____,    PV7.y,  1.0f      
    w: MIN*2       ____,    PV7.z,  1.0f      
8   x: MULADD      R1.x,    PV8.z,  R2.y,    R0.z      
    y: MULADD      R127.y,  PV8.w,  R2.y,    R0.z      
    z: SETGE_DX10  R1.z,    R2.z,            0.5      
    w: MULADD      R0.w,    PV8.y,  R2.y,    R0.z      
9   x: MULADD      R0.x,    R1.w,   PV9.x,   R0.x      
    y: MULADD      R0.y,    R1.w,   PV9.y,   R0.x      
    z: MUL         R0.z,    R2.z,   PV9.y      
    w: MULADD      R1.w,    R1.w,   PV9.w,   R0.x      
10  x: MUL         ____,    R2.z,   R0.w      
    y: MUL         ____,    R2.z,   R1.x      
    w: MOV         R2.w,    R2.w       
11  x: CNDE_INT    R2.x,    R1.z,   R0.z,    R0.y      
    y: CNDE_INT    R2.y,    R1.z,   PV11.y,  R0.x      
    z: CNDE_INT    R2.z,    R1.z,   PV11.x,  R1.w  

Again no branches. Yum!

like image 94
Spatial Avatar answered Oct 06 '22 01:10

Spatial


For lightness and saturation you can use YUV (actually YCbCr). It's easy to convert from RGB and back. No branching needed. Saturation is controlled by increasing or decreasing both Cr and Cb. Lightness is Y.

You get something similar to HSL hue modification by rotating Cb and Cr components (it's practically a 3D vector), but of course it depends on your application if that's enough.

alt text

Edit: A color component (Cb,Cr) is a point in a color plane like above. If you take any random point and rotate it around the center, result is hue changing. But as mechanism is a bit different than in HSL, results are not precisely same.

Image is public domain from Wikipedia.

like image 39
Virne Avatar answered Oct 06 '22 00:10

Virne