Optimization of HLSL shader

The pixel shader (HLSL) below compiles to 68 instructions, even with the optimizations suggested below already applied. I would like to use it with shader model 2, where I can only use up to 64 instructions. Does anyone see further optimizations that do not change the result of the shader?

The shader transforms a roughly circular region of the screen (with sine-shaped borders) from RGB to a white -> red -> black gradient, with some additional modifications to brightness etc.

The shader code is:

// Normalized timefactor (1 = fully enabled)
float timeFactor;

// Center of "light"
float x;
float y;

// Size of "light"
float viewsizeQ;
float fadesizeQ;

// Rotational shift
float angleShift;

// Resolution
float screenResolutionWidth;
float screenResolutionHeight;
float screenZoomQTimesX;

// Texture sampler
sampler TextureSampler : register(s0);

float4 method(float2 texCoord : TEXCOORD0) : COLOR0
{
// New color after transformation
float4 newColor;

// Look up the texture color.
float4 color = tex2D(TextureSampler, texCoord);

// Calculate distance
float2 delta = (float2(x, y) - texCoord.xy)
             * float2(screenResolutionWidth, screenResolutionHeight);

// Get angle from center
float distQ = dot(delta, delta) - sin((atan2(delta.x, delta.y) + angleShift) * 13) * screenZoomQTimesX;

// Within fadeSize
if (distQ < fadesizeQ)
{
   // Make greyscale
   float grey = dot(color.rgb, float3(0.3, 0.59, 0.11));

   // Increase contrast by applying a color transformation based on a quasi-sigmoid gamma curve
   grey = 1 / (1 + pow(1.25-grey/2, 16) );

   // Transform Black/White color range to Black/Red/White color range
   // 1 -> 0.5f ... White -> Red
   if (grey >= 0.75)
   {
      newColor.r = 0.7 + 0.3 * color.r;
      grey = (grey - 0.75) * 4;
      newColor.gb = 0.7 * grey + 0.3 * color.gb;
   }
   else // 0.5f -> 0 ... Red -> Black
   {
      newColor.r = 1.5 * 0.7 * grey + 0.3 * color.r;
      newColor.gb = 0.3 * color.gb;
   }

   // Within viewSize (Full transformation, only blend with timefactor)
   if (distQ < viewsizeQ)
   {
      color.rgb = lerp(color.rgb, newColor.rgb, timeFactor);
   }
   // Outside viewSize but still in fadeSize (Spatial fade-out but also with timefactor)
   else
   {
      float factor = timeFactor * (1 - (distQ - viewsizeQ) / (fadesizeQ - viewsizeQ));
      color.rgb = lerp(color.rgb, newColor.rgb, factor);
   }
}

return color;
}
ares_games asked Oct 29 '13 17:10




2 Answers

A few bits and pieces: you have separate x and y floats for the light center, plus separate floats for the screen width and height.

Replace them with:

float2 light;
float2 screenResolution;

Then in your code:

float2 delta = (light - texCoord.xy) * screenResolution;

This should remove 2 more instructions.

Next is the use of atan2, which is likely to be the most expensive call.

You can declare another float2 (float2 vecshift), where x = cos(angleShift) and y = sin(angleShift). Just precompute this one on the CPU.

Then you can do the following (basically use a cross product to extract the angle instead of using atan2):

float2 dn = normalize(delta);
float cr = dn.x * vecshift.y - dn.y * vecshift.x;
float distQ = dot(delta, delta) - sin(asin(cr) * 13) * screenZoomQTimesX;

Please note that I'm not too keen on taking the sin of an asin, but a polynomial form would not fit your use case. I'm sure there's a much cleaner way to modulate than sin(asin(...)), though.
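The substitution can be sanity-checked on the CPU. The following Python sketch is not part of the original answer, and its function names are made up for illustration. It compares the original atan2 term with the asin-based one. It relies on the multiplier 13 being odd, in which case sin(13 * asin(sin(t))) equals sin(13 * t) for every t. Under the conventions of the snippets above, the cross-product form appears to evaluate to -cos(13 * (angle + angleShift)), which is the same pattern rotated by pi/26. Feeding angleShift + pi/26 into the vecshift precomputation, or using the dot product dn.x * vecshift.x + dn.y * vecshift.y instead, reproduces the original term.

```python
import math

N_LOBES = 13  # multiplier from the shader's sin(... * 13); must be odd for the identity


def precompute_vecshift(angle_shift):
    """CPU-side precomputation: vecshift = (cos(angleShift), sin(angleShift))."""
    return (math.cos(angle_shift), math.sin(angle_shift))


def original_term(dx, dy, angle_shift):
    """sin((atan2(delta.x, delta.y) + angleShift) * 13) as in the question."""
    return math.sin((math.atan2(dx, dy) + angle_shift) * N_LOBES)


def _normalize(dx, dy):
    # delta must be non-zero, just like normalize() in the shader
    length = math.hypot(dx, dy)
    return dx / length, dy / length


def _clamp_unit(value):
    # guards asin against values a few ulps outside [-1, 1]
    return max(-1.0, min(1.0, value))


def cross_term(dx, dy, vecshift):
    """sin(asin(cr) * 13) with cr = dn.x * vecshift.y - dn.y * vecshift.x."""
    nx, ny = _normalize(dx, dy)
    cr = nx * vecshift[1] - ny * vecshift[0]
    return math.sin(math.asin(_clamp_unit(cr)) * N_LOBES)


def dot_term(dx, dy, vecshift):
    """Hypothetical variant: same as cross_term but with dot(dn, vecshift)."""
    nx, ny = _normalize(dx, dy)
    d = nx * vecshift[0] + ny * vecshift[1]
    return math.sin(math.asin(_clamp_unit(d)) * N_LOBES)
```

Whether the rotated pattern matters depends on how angleShift is driven by the application; if it is animated anyway, the constant offset can be folded into it on the CPU.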

Using the ternary ?: operator instead of if/else can also (sometimes) reduce your instruction count.

color.rgb = lerp(color.rgb, newColor.rgb, distQ < viewsizeQ ? timeFactor : timeFactor * (1 - (distQ - viewsizeQ) / (fadesizeQ - viewsizeQ)));

This saves 2 more instructions.
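As a quick check that the rewrite leaves the blend factor unchanged, here is a small Python sketch (the function names are hypothetical). The first two functions mirror the if/else and ternary forms. The third is a variant that does not come from the thread: inside the fadesizeQ region, and assuming fadesizeQ > viewsizeQ, the factor can also be written as timeFactor * saturate((fadesizeQ - distQ) / (fadesizeQ - viewsizeQ)). Whether that form actually saves instructions would have to be verified with the compiler.

```python
def blend_factor_branching(dist_q, viewsize_q, fadesize_q, time_factor):
    """Mirrors the if/else form from the question."""
    if dist_q < viewsize_q:
        return time_factor
    return time_factor * (1 - (dist_q - viewsize_q) / (fadesize_q - viewsize_q))


def blend_factor_select(dist_q, viewsize_q, fadesize_q, time_factor):
    """Mirrors the ternary form: both sides are computed, then one is selected."""
    faded = time_factor * (1 - (dist_q - viewsize_q) / (fadesize_q - viewsize_q))
    return time_factor if dist_q < viewsize_q else faded


def saturate(value):
    """Clamp to [0, 1], like the HLSL intrinsic of the same name."""
    return max(0.0, min(1.0, value))


def blend_factor_saturate(dist_q, viewsize_q, fadesize_q, time_factor):
    """Hypothetical branch-free variant; valid for dist_q < fadesize_q and fadesize_q > viewsize_q."""
    return time_factor * saturate((fadesize_q - dist_q) / (fadesize_q - viewsize_q))
```

For example, with viewsizeQ = 100, fadesizeQ = 400 and timeFactor = 0.8, all three functions agree up to floating-point rounding for any distQ below 400.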

Here is the full version, which comes to 60 instructions.

// Normalized timefactor (1 = fully enabled)
float timeFactor;

float2 light;

float viewsizeQ;
float fadesizeQ;

float2 screenResolution;
float screenZoomQTimesX;

float2 vecshift;

// Texture sampler
sampler TextureSampler : register(s0);

float4 method(float2 texCoord : TEXCOORD0) : COLOR0
{
// New color after transformation
float4 newColor;

// Look up the texture color.
float4 color = tex2D(TextureSampler, texCoord);

// Calculate distance
float2 delta = (light - texCoord.xy) * screenResolution;

float2 dn = normalize(delta);
float cr = dn.x * vecshift.y - dn.y * vecshift.x;

float distQ = dot(delta, delta) - sin(asin(cr) * 13) * screenZoomQTimesX;

if (distQ < fadesizeQ)
{
   // Make greyscale
   float grey = dot(color.rgb, float3(0.3, 0.59, 0.11));

   // Increase contrast by applying a color transformation based on a quasi-sigmoid gamma curve
   grey = 1 / (1 + pow(1.25-grey/2, 16) );

   // Transform Black/White color range to Black/Red/White color range
   // 1 -> 0.5f ... White -> Red
   if (grey >= 0.75)
   {
       newColor.r = 0.7 + 0.3 * color.r;
       grey = (grey - 0.75) * 4;
       newColor.gb = 0.7 * grey + 0.3 * color.gb;
   }
   else // 0.5f -> 0 ... Red -> Black
   {
       newColor.r = 1.5 * 0.7 * grey + 0.3 * color.r;
       newColor.gb = 0.3 * color.gb ;
   }

   color.rgb = lerp(color.rgb, newColor.rgb, distQ < viewsizeQ ? timeFactor : timeFactor * (1 - (distQ - viewsizeQ) / (fadesizeQ - viewsizeQ)));
}
return color;

}
mrvux answered Oct 19 '22 11:10



A couple of suggestions:

  • You could use a 1D texture (as a lookup table) for your quasi-sigmoid. If its input (grey) goes from 0 to 1, create a 256 x 1 texture (or whatever horizontal size preserves your function best) and simply look up the value for the current input using tex1D. You will need to evaluate this function on the CPU to fill in the texture, but that only has to be done once at load time.
  • You could use the lerp function instead of spelling out the blend. Instead of color.rgb = /*0.7 */ factor * newColor.rgb + /*0.3 **/ (1 - factor) * color.rgb; use color.rgb = lerp(color.rgb, newColor.rgb, factor); since lerp(x, y, s) evaluates to x * (1 - s) + y * s. lerp generally compiles down to a single assembly instruction on most GPUs, saving you instructions.
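A minimal CPU-side sketch of the lookup-table idea from the first bullet, written in Python for brevity (the thread does not say which language the host application uses). The function names, the table size of 256, and the nearest-texel lookup are assumptions for illustration. A real implementation would upload the table as a texture and sample it with tex1D, and the sampler's filtering and texel addressing would affect the exact error.

```python
def quasi_sigmoid(grey):
    """The contrast curve from the shader: 1 / (1 + pow(1.25 - grey / 2, 16))."""
    return 1.0 / (1.0 + (1.25 - grey / 2.0) ** 16)


def build_sigmoid_lut(size=256):
    """Evaluate the curve once at load time; entry i corresponds to grey = i / (size - 1)."""
    return [quasi_sigmoid(i / (size - 1)) for i in range(size)]


def lookup_nearest(lut, grey):
    """Simplified stand-in for a point-sampled tex1D fetch with clamped addressing."""
    clamped = max(0.0, min(1.0, grey))
    index = int(clamped * (len(lut) - 1) + 0.5)
    return lut[index]


def max_lut_error(lut, samples=1000):
    """Largest absolute difference between the table and the exact curve on [0, 1]."""
    worst = 0.0
    for k in range(samples + 1):
        grey = k / samples
        worst = max(worst, abs(lookup_nearest(lut, grey) - quasi_sigmoid(grey)))
    return worst
```

Calling max_lut_error(build_sigmoid_lut()) gives a rough idea of whether 256 entries are enough before any quantization to the texture format; a wider table or linear filtering would reduce the error further.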
Ani answered Oct 19 '22 11:10
