I want to multiply the data stored in one XMM register by a single float value and save the result in an XMM register. I made a little graphic to explain it a bit better.
As you can see, I have an xmm0 register with my data in it. For example, it contains:
xmm0 = |4.0|2.5|3.5|2.0|
Each float is stored in 4 bytes, and my xmm0 register is 128 bits (16 bytes) long, so it holds four floats.
That works pretty well. Now I want to store 0.5 in another XMM register, e.g. xmm1, and multiply it with xmm0 so that each value stored in xmm0 is multiplied by 0.5.
I have absolutely no idea how to store 0.5 in an XMM register. Any suggestions?
Btw: It's Inline Assembler in C++.
void filter(image* src_image, image* dst_image)
{
    float* src = src_image->data;
    float* dst = dst_image->data;
    __asm__ __volatile__ (
        "movaps (%%esi), %%xmm0\n"
        // Multiply %xmm0 with a float, e.g. 0.5
        "movaps %%xmm0, (%%edi)\n"
        :
        : "S"(src), "D"(dst)
        : "xmm0", "memory"  // xmm0 is overwritten and memory at dst is written
    );
}
This is a quite simple version of what I want to do. I have some image data stored in a float array. The pointers to these arrays are passed to the assembly. movaps takes the first 4 float values of the array and stores these 16 bytes in the xmm0 register. After this, xmm0 should be multiplied by e.g. 0.5. Then the "new" values should be stored in the array pointed to by edi.
As people noted in comments, for this sort of very simple operation, it's essentially always better to use intrinsics:
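#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_mul_ps, ...

void filter(image* src_image, image* dst_image)
{
    // _mm_load_ps and _mm_store_ps require 16-byte aligned pointers, like movaps.
    const __m128 data = _mm_load_ps(src_image->data);
    // _mm_set1_ps broadcasts 0.5f into all four lanes.
    const __m128 scaled = _mm_mul_ps(data, _mm_set1_ps(0.5f));
    _mm_store_ps(dst_image->data, scaled);
}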
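The function above only touches the first four floats. As a rough sketch of how the same intrinsics could cover a whole buffer: the helper below is hypothetical (the name `scale_buffer` and the `count` and `factor` parameters are assumptions, since the layout of `image` is not shown in the question). It uses unaligned loads and stores so it does not rely on 16-byte alignment, and handles any leftover elements with scalar code.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper (not from the question): writes src[i] * factor
// into dst[i] for i in [0, count).
void scale_buffer(const float* src, float* dst, std::size_t count, float factor)
{
    const __m128 f = _mm_set1_ps(factor);  // broadcast factor to all 4 lanes
    std::size_t i = 0;

    // Main loop: four floats per iteration. The "u" variants tolerate
    // pointers that are not 16-byte aligned.
    for (; i + 4 <= count; i += 4) {
        const __m128 v = _mm_loadu_ps(src + i);
        _mm_storeu_ps(dst + i, _mm_mul_ps(v, f));
    }

    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < count; ++i) {
        dst[i] = src[i] * factor;
    }
}
```

If the data is known to be 16-byte aligned and `count` is a multiple of 4, the aligned `_mm_load_ps` / `_mm_store_ps` from the answer can be used instead and the tail loop is never entered.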
You should only resort to inline ASM if the compiler is generating bad code (and only after filing a bug with the compiler vendor).
If you really want to stay in assembly, there are many ways to accomplish this task. You could define a scale vector outside of the ASM block:
const __m128 half = _mm_set1_ps(0.5f);
and then use it inside the ASM just like you use other operands.
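A minimal sketch of what that could look like, assuming GCC or Clang targeting x86 with SSE enabled. The function name `scale4` and the use of generic `"r"` and `"x"` constraints (rather than the fixed esi/edi registers from the question) are choices made for this illustration; letting the compiler pick the registers keeps the block valid for both 32-bit and 64-bit builds.

```cpp
#include <xmmintrin.h>  // __m128, _mm_set1_ps

// Illustrative helper (name is an assumption): multiplies the four floats
// at src by 0.5 and stores them at dst. Both pointers must be 16-byte
// aligned because movaps is used.
void scale4(const float* src, float* dst)
{
    const __m128 half = _mm_set1_ps(0.5f);  // |0.5|0.5|0.5|0.5|

    __asm__ __volatile__ (
        "movaps (%0), %%xmm0\n\t"   // load 4 floats from src
        "mulps  %2, %%xmm0\n\t"     // xmm0 *= half (lane by lane)
        "movaps %%xmm0, (%1)\n\t"   // store 4 floats to dst
        :                                     // no outputs
        : "r"(src), "r"(dst), "x"(half)       // "x" = any SSE register
        : "xmm0", "memory"                    // xmm0 and *dst are modified
    );
}
```

With the values from the question, an aligned input of {4.0, 2.5, 3.5, 2.0} yields {2.0, 1.25, 1.75, 1.0}. Because xmm0 is listed as a clobber, the compiler will not place `half` in xmm0.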
You can do it without any loads, if you really want to:
"mov $0x3f000000, %%eax\n" // encoding of 0.5
"movd %%eax, %%xmm1\n" // move to xmm1
"shufps $0, %%xmm1, %%xmm1\n" // splat across all lanes of xmm1
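To show the fragment above in context, here is a sketch of a complete function built around it, again assuming GCC or Clang on x86. The name `scale4_noload` and the register-constraint style are assumptions for this example. Note that `movd` between a general-purpose register and an XMM register requires SSE2, and that 0x3f000000 is the IEEE 754 single-precision bit pattern of 0.5 (sign 0, biased exponent 126, zero mantissa).

```cpp
// Illustrative helper (name is an assumption): multiplies the four floats
// at src by 0.5 without loading the constant from memory. Both pointers
// must be 16-byte aligned because movaps is used.
void scale4_noload(const float* src, float* dst)
{
    __asm__ __volatile__ (
        "movaps (%0), %%xmm0\n\t"          // load 4 floats from src
        "mov    $0x3f000000, %%eax\n\t"    // bit pattern of 0.5f
        "movd   %%eax, %%xmm1\n\t"         // low lane of xmm1 = 0.5
        "shufps $0, %%xmm1, %%xmm1\n\t"    // splat lane 0 across all lanes
        "mulps  %%xmm1, %%xmm0\n\t"        // xmm0 *= xmm1
        "movaps %%xmm0, (%1)\n\t"          // store 4 floats to dst
        :                                  // no outputs
        : "r"(src), "r"(dst)
        : "eax", "xmm0", "xmm1", "memory"  // everything the block modifies
    );
}
```

Listing eax, xmm0, and xmm1 as clobbers tells the compiler not to keep live values (including the two pointer operands) in those registers across the block.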
Those are just two approaches. There are lots of other ways. You might spend some quality time with the Intel Instruction Set Reference.