Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can a movss instruction be used to replace integer data?

With the constraint that I can use only SSE and SSE2 instructions, I have a need to replace the least significant (0) element of a 4 element __m128i vector with the 0 element from another vector.

For floating point vectors, the task is simple - one can use the _mm_move_ss() intrinsic to cause the element to be replaced with the 0 element from another vector. It generates one movss instruction, so is quite efficient.

Using two casting intrinsics, it's possible to also convince the compiler to use a single SSE movss instruction to move integer data. The source code ends up looking like this:

__m128i NewVector = _mm_castps_si128(_mm_move_ss(_mm_castsi128_ps(Take3FromThisVector),
                                                 _mm_castsi128_ps(Take1FromThisVector)));

It looks a bit messy, but with a suitable amount of commenting it can be acceptable, especially since it generates a minimum of instructions. In its typical use everything's optimized to be in xmm registers.

My question is this:

Since it's a movss instruction, where the "ss" implies single precision floating point, is it okay to have it move integer data that could potentially contain some "special" or "illegal" (for floating point) combo of bits in any of the vector positions?

The obvious alternative - which I also implemented and tested - is to AND the first vector with a mask, then OR in a second vector that contains just one value in the least significant element, with all the others being zero. As you can imagine, this generates more instructions.

I've tested the casting approach I showed above and it doesn't seem to cause any problems, but I note in particular that there's no intrinsic provided that does this same operation for integer data. It seems as though Intel would have provided one if it was just as good for integer data - e.g., _mm_move_epi32 or similar. And so I'm skeptical whether this is a good idea.

I did some searches, e.g., "can a movss instruction cause a floating point exception", but did not find any information that would answer my question.

Thanks in advance for knowledge you're willing to share.

-Noel

like image 221
NoelC Avatar asked May 23 '16 01:05

NoelC


2 Answers

Yes, it's fine to use FP shuffles like movss xmm, xmm on integer data. The insn reference manual tells you that it can't raise FP numeric exceptions; only actual FP math instructions do that. So go ahead and cast.

There isn't even a bypass delay for using FP shuffles on integer data in most uarches (but there is extra latency for using integer shuffles between FP math instructions).

Agner Fog's "optimizing assembly" guide has a great section on what instructions are useful for different kinds of data movement (broadcasts, merging, etc.) See also the x86 tag wiki for more good links.


The reason there's no integer intrinsic is that the SSE2 movd integer instruction zeros the upper bytes of the destination, like movss used as a load, but unlike movss between registers.

Intel's vector instruction set known for its inconsistency and non-orthogonality, esp. the earliest versions (like SSE1). SSE4.1 filled many gaps, but there are still obvious missing pieces.

like image 158
Peter Cordes Avatar answered Oct 09 '22 23:10

Peter Cordes


The types __m128 and __m128i are interchangeable. The main reason for the cast is to make your intentions clearer (and keep your compiler happy). The cast itself does not generate any extra assembly.

The _mm_move_ss operation is described directly in terms of which bits end up in your result.

If you end up with an invalid bit combination for single-precision floats, this will only be a problem if you try to use the resulting value in floating-point calculations.

like image 39
paddy Avatar answered Oct 10 '22 00:10

paddy