Flipping sign on packed SSE floats

Question

I'm looking for the most efficient method of flipping the sign on all four floats packed in an SSE register.

I have not found an intrinsic for doing this in the Intel Architecture software dev manual. Below are the things I've already tried.

For each case I ran the code in a loop 10 billion times and got the wall-clock time indicated. I'm trying to at least match the 4 seconds taken by my non-SIMD approach, which just uses the unary minus operator.

[48 sec]
_mm_sub_ps( _mm_setzero_ps(), vec );

[32 sec]
_mm_mul_ps( _mm_set1_ps( -1.0f ), vec );

[9 sec]

union NegativeMask {
    int   intRep;
    float fltRep;
} negMask;
negMask.intRep = 0x80000000;

_mm_xor_ps( _mm_set1_ps( negMask.fltRep ), vec );

The compiler is gcc 4.2 with -O3. The CPU is an Intel Core 2 Duo.

LiraNuna · Accepted Answer

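The union isn't really needed. The float literal -0.f already has only the sign bit set (the same 0x80000000 bit pattern), so XORing with it flips the sign of every lane. This gives the best of all worlds (readability, speed, and portability):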

_mm_xor_ps(vec, _mm_set1_ps(-0.f))
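
For context, here is a minimal sketch of how that one-liner might be wrapped and applied to a buffer. The names negate_ps and negate_array are hypothetical helpers invented for illustration, not part of any library, and the sketch assumes a compiler that provides <xmmintrin.h>. It has not been benchmarked, so whether it matches the scalar timing on a given machine is something to measure rather than assume.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_xor_ps, _mm_set1_ps, ... */

/* Flip the sign of all four packed floats by XORing each lane's sign bit.
   -0.0f has the bit pattern 0x80000000, i.e. only the sign bit is set. */
static inline __m128 negate_ps(__m128 v)
{
    return _mm_xor_ps(v, _mm_set1_ps(-0.0f));
}

/* Hypothetical helper: negate n floats in place, four at a time.
   Unaligned loads/stores are used so the buffer needs no special alignment. */
static void negate_array(float *data, size_t n)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);  /* built once, before the loop */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);
        _mm_storeu_ps(data + i, _mm_xor_ps(v, sign_mask));
    }
    for (; i < n; ++i)  /* scalar tail for the remaining 0 to 3 elements */
        data[i] = -data[i];
}
```

Calling negate_array(buf, count) flips the sign of every element of buf. Because XOR only touches the sign bit, zeros become negative zeros and NaN payloads are left unchanged, which is the same result unary minus gives.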

Tags: performance, c, optimization, simd, sse

nsanders
