In OpenCL, I want to store a vector (3D) using a "Shared Exponent" representation for compact storage. Typically, if you store a 3D floating point vector, you simply store 3 separate float values (or 4 when aligned properly). This requires 12 (16) bytes storage for single precision and if you don't require this accuracy you can use the "half" precision float and shrink it down to 6 (8) bytes.
When using half precision and 3 separate values, the memory looks like this (no alignment considered):
I'd like to shrink this down to 4 bytes by using a shared exponent, as OpenGL uses this in one of its internal texture formats ("RGB9_E5"). This means, the absolutely largest component decides what the exponent of the whole number is. This exponent is then used for each component implicitly. Tricks such as "normalized" storage with an implicit "1." in front of the mantissa don't work in this case. Such a representation works like this (we could tweak the acutal parameters, so this is an example):
I'd like to store this in an OpenCL uint
type (32 bits) or something equivalent (e.g. uchar4
). The question now is:
How can I convert from and into this representation to and from float3
as fast as possible?
My idea is like this, but I'm sure there is some "bit hacking" trick which uses the bit representation of IEEE floats to circumvent the floating point ALU:
uchar4
as the representative type. Store x, y, z mantisssa in x, y, z components of this uchar4
. The w component is split up into 5 less significant bits (w & 0x1F)
for the shared exponent and the three more significant bits (w >> 5) & 1
, (w >> 6) & 1
and (w >> 7) & 1
are the signs for x, y and z, respectively."Unpacking" this representation into a float3
could be done using this code:
float3 unpackCompactVector(uchar4 packed) {
float exp = (float)(packed.w & 0x1F) - 16.0;
float factor = exp2(exp) / 256.0;
float x = (float)(packed.x) * factor * (packed.w & 0x20 ? -1.0 : 1.0);
float y = (float)(packed.y) * factor * (packed.w & 0x40 ? -1.0 : 1.0);
float z = (float)(packed.z) * factor * (packed.w & 0x80 ? -1.0 : 1.0);
float3 result = { x, y, z };
return result;
}
"Packing" a float3
into this representation could be done using this code:
uchar4 packCompactVector(float3 vec) {
float xAbs = abs(vec.x); uchar xSign = vec.x < 0.0 ? 0x20 : 0;
float yAbs = abs(vec.y); uchar ySign = vec.y < 0.0 ? 0x40 : 0;
float zAbs = abs(vec.z); uchar zSign = vec.z < 0.0 ? 0x80 : 0;
float maxAbs = max(max(xAbs, yAbs), zAbs);
int exp = floor(log2(maxAbs)) + 1;
float factor = exp2(exp);
uchar xMant = floor(xAbs / factor * 256);
uchar yMant = floor(yAbs / factor * 256);
uchar zMant = floor(zAbs / factor * 256);
uchar w = ((exp + 16) & 0x1F) + xSign + ySign + zSign;
uchar4 result = { xMant, yMant, zMant, w };
return result;
}
I've put an equivalent implementation in C++ online on ideone. The test cases shows the transition from exp = 3
to exp 4
(with the bias of 16 this is encoded as 19 and 20, respectively) by encoding numbers around 8.0
.
This implementation seems to work on the first sight. But:
log2
because they are slow.Can you suggest a better way to achieve my goal?
Note that I only need an OpenCL "device code" for this, I don't need to convert between the representations in the host program. But I added the C
tag since a solution is most probably independent of the OpenCL language features (OpenCL is almost C and it also uses IEEE 754 floats, bit manipulation works the same, etc.).
If you used CL/GL interop and stored your data in an OpenGL texture in RGB9_E5 format and if you could create an OpenCL image from that texture, you could leverage the hardware texture unit to do the conversion into a float4 upon reading from the image. It might be worth trying.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With