In the last couple of years, I've been doing a lot of SIMD programming and most of the time I've been relying on compiler intrinsic functions (such as the ones for SSE programming) or on programming assembly to get to the really nifty stuff. However, up until now I've hardly been able to find any programming language with built-in support for SIMD.
Now obviously there are the shader languages such as HLSL, Cg and GLSL that have native support for this kind of stuff. However, I'm looking for something that can at least compile to SSE without relying on autovectorization, but with built-in support for vector operations. Does such a language exist?
This is an example of (part of) a Cg shader that does a spotlight, and in terms of syntax this is probably the closest to what I'm looking for.
float4 pixelfunction(
    output_vs IN,
    uniform sampler2D texture : TEX0,
    uniform sampler2D normals : TEX1,
    uniform float3 light,
    uniform float3 eye ) : COLOR
{
    float4 color = tex2D( texture, IN.uv );
    float4 normal = tex2D( normals, IN.uv ) * 2 - 1;
    float3 T = normalize(IN.T);
    float3 B = normalize(IN.B);
    float3 N =
        normal.b * normalize(IN.normal) +
        normal.r * T +
        normal.g * B;
    float3 V = normalize(eye - IN.pos.xyz);
    float3 L = normalize(light - IN.pos);
    float3 H = normalize(L + V);
    float4 diffuse = color * saturate( dot(N, L) );
    float4 specular = color * pow(saturate(dot(N, H)), 15);
    float falloff = dot(L, normalize(light));
    return pow(falloff, 5) * (diffuse + specular);
}
Stuff that would be a real must in this language is:
One approach to leveraging vector hardware is SIMD intrinsics, which are available in all modern C and C++ compilers. SIMD stands for “single instruction, multiple data”. SIMD instructions are available on many platforms; there's a good chance your smartphone supports them too, through the ARM NEON architecture extension.
SIMD is short for Single Instruction/Multiple Data, and the term SIMD operations refers to a computing method that processes multiple data elements with a single instruction. In contrast, the conventional sequential approach, which uses one instruction to process each individual data element, is called scalar operations.
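To make the scalar/SIMD contrast concrete, here is a minimal sketch in C. It assumes GCC or Clang, because the `vector_size` attribute is a compiler extension and not standard C. The names `v4f`, `add_scalar` and `add_vec4` are made up for this illustration. On targets with SIMD hardware the compiler will typically lower the vector `+` to a single instruction, but that is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: GCC/Clang vector extension. Four floats treated as one value. */
typedef float v4f __attribute__((vector_size(16)));

/* Scalar operations: one add per data element. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* SIMD-style operations: one add per group of four elements.
   This sketch requires n to be a multiple of 4. */
void add_vec4(const float *a, const float *b, float *out, size_t n) {
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4f va, vb, vc;
        memcpy(&va, a + i, sizeof va);   /* load without alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                    /* element-wise add across all four lanes */
        memcpy(out + i, &vc, sizeof vc); /* store the four results */
    }
}
```

Both functions produce the same results; the difference is only in how many elements each add expression covers.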
Wireless MMX Technology: the Wireless MMX unit is an example of a SIMD coprocessor. It is a 64-bit architecture that is an extension of the XScale microarchitecture programming model. Wireless MMX technology defines three packed data types (8-bit byte, 16-bit half word, and 32-bit word) and the 64-bit double word.
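The packed data types described above can be pictured as different views of the same 64 bits. The following is only a plain C model of that idea, not Wireless MMX code: the `packed64` union and `packed_add_u8` function are hypothetical names, and the wrap-around add is just one possible lane behaviour chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit value viewed as 8 bytes, 4 half words, 2 words, or 1 double word. */
typedef union {
    uint8_t  b[8];
    uint16_t h[4];
    uint32_t w[2];
    uint64_t d;
} packed64;

static_assert(sizeof(packed64) == 8, "all views share the same 64 bits");

/* Lane-wise add over the eight 8-bit lanes.
   Each lane wraps around on its own; nothing carries into its neighbour. */
packed64 packed_add_u8(packed64 x, packed64 y) {
    packed64 r;
    for (int i = 0; i < 8; ++i)
        r.b[i] = (uint8_t)(x.b[i] + y.b[i]);
    return r;
}
```

A real SIMD unit would perform all eight lane additions with one instruction; the loop here only spells out what happens per lane.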
simd provides types and functions for small vector and matrix computations. The types include integer and floating-point vectors and matrices, and the functions provide basic arithmetic operations, element-wise mathematical operations, and geometric and linear algebra operations.
Your best bet is probably OpenCL. I know it has mostly been hyped as a way to run code on GPUs, but OpenCL kernels can also be compiled and run on CPUs. OpenCL is basically C with a few restrictions:
and a bunch of additions. In particular vector types:
float4 x = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
float4 y = (float4)(10.0f, 10.0f, 10.0f, 10.0f);
float4 z = y + x.s3210; // add the vector y to a swizzle of x that reverses the element order
One big caveat is that the code has to be cleanly separable: OpenCL can't call out to arbitrary libraries, etc. But if your compute kernels are reasonably independent, then you basically get a vector-enhanced C where you don't need to use intrinsics.
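An OpenCL kernel needs a host program to build and launch it, so as a rough stand-in, here is what the same "vector types with ordinary operators" style looks like in host-side C. This is not OpenCL: it assumes the GCC/Clang `vector_size` extension, the `float4` typedef is declared locally, and `reverse4` and `add_reversed` are hypothetical helpers that mimic the `x.s3210` swizzle by indexing.

```c
#include <assert.h>

/* Assumption: GCC/Clang vector extension, used here to imitate OpenCL's float4. */
typedef float float4 __attribute__((vector_size(16)));

static_assert(sizeof(float4) == 16, "four 32-bit floats");

/* Stand-in for the OpenCL swizzle x.s3210: reverse the element order. */
float4 reverse4(float4 v) {
    float4 r = { v[3], v[2], v[1], v[0] };
    return r;
}

/* z = y + x.s3210, written with a plain '+' and no intrinsics. */
float4 add_reversed(float4 x, float4 y) {
    return y + reverse4(x);
}
```

With x = (1, 2, 3, 4) and y = (10, 10, 10, 10), `add_reversed(x, y)` yields (14, 13, 12, 11), matching what the OpenCL snippet computes for z.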
Here is a quick reference/cheatsheet with all of the extensions.