How do you populate an x86 XMM register with 4 identical floats from another XMM register entry?

I'm trying to implement some inline assembler (in C/C++ code) to take advantage of SSE. I'd like to copy and duplicate values (from an XMM register, or from memory) to another XMM register. For example, suppose I have some values {1, 2, 3, 4} in memory. I'd like to copy these values such that xmm1 is populated with {1, 1, 1, 1}, xmm2 with {2, 2, 2, 2}, and so on and so forth.

Looking through the Intel reference manuals, I couldn't find an instruction to do this. Do I just need to use a combination of repeated MOVSS and rotates (via PSHUFD?)?

How many XMM registers are there?

There are eight XMM registers available in non-64-bit modes and 16 XMM registers in long (64-bit) mode. Each register is 128 bits wide, so a single instruction can operate on 16 bytes at once.

What is XMM x86?

XMM refers to the registers of x86 microprocessors with Streaming SIMD Extensions (SSE). (The abbreviation is also used for unrelated things, such as the Extended Memory Manager of the Extended Memory Specification and Intel's XMM mobile modems.)

What is an XMM register?

XMM registers are a completely separate register set, introduced with SSE and still widely used today. They are 128 bits wide, and instructions can treat them as arrays of 64- or 32-bit (integer or floating-point) values, or 16- or 8-bit (integer-only) values. There are 8 of them in 32-bit mode and 16 in 64-bit mode.
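To make the "arrays of values" view concrete, here is a minimal sketch using the SSE2 intrinsics from `<emmintrin.h>`. The struct `xmm_views` and the function `view_xmm` are hypothetical names chosen for illustration. The same 128 bits are stored out as floats, doubles, 32-bit integers and bytes; the cast intrinsics only change the C type and emit no instructions.

```c
#include <emmintrin.h> /* SSE2: _mm_castps_si128, _mm_castps_pd, stores */
#include <stdint.h>

/* Hypothetical container: four views of the same 128-bit register. */
typedef struct {
    float    f32[4];  /* 4 x 32-bit floating point */
    double   f64[2];  /* 2 x 64-bit floating point */
    uint32_t u32[4];  /* 4 x 32-bit integer */
    uint8_t  u8[16];  /* 16 x 8-bit integer */
} xmm_views;

/* Store one XMM value under each interpretation. */
static xmm_views view_xmm(__m128 v)
{
    xmm_views out;
    _mm_storeu_ps(out.f32, v);
    /* Casts reinterpret the bits; no conversion happens. */
    _mm_storeu_pd(out.f64, _mm_castps_pd(v));
    _mm_storeu_si128((__m128i *)out.u32, _mm_castps_si128(v));
    _mm_storeu_si128((__m128i *)out.u8, _mm_castps_si128(v));
    return out;
}
```

For example, `view_xmm(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f))` yields `u32[0] == 0x3F800000`, the IEEE-754 bit pattern of 1.0f, and because x86 is little-endian that pattern shows up as bytes `00 00 80 3F` in `u8[0..3]`.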

If your values are 16-byte aligned in memory:

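movdqa    (mem),    %xmm1              # xmm1 = {m0, m1, m2, m3}
pshufd    $0xff,    %xmm1,    %xmm4    # xmm4 = {m3, m3, m3, m3}
pshufd    $0xaa,    %xmm1,    %xmm3    # xmm3 = {m2, m2, m2, m2}
pshufd    $0x55,    %xmm1,    %xmm2    # xmm2 = {m1, m1, m1, m1}
pshufd    $0x00,    %xmm1,    %xmm1    # xmm1 = {m0, m0, m0, m0} (last, since it overwrites the source)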

If not, you can use an unaligned load (movdqu or movups instead of movdqa), or four scalar loads, each followed by a broadcast. On newer processors the unaligned load should be faster; on older ones the scalar loads may win.
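For code written with intrinsics rather than inline assembly, a minimal sketch of the same approach might look like this. `_mm_shuffle_epi32` is the intrinsic for pshufd, `_mm_loadu_ps` handles the unaligned case, and `_mm_load1_ps` covers the "scalar load plus broadcast" alternative. The function names `splat4_pshufd` and `splat4_scalar` are hypothetical, and the exact instructions emitted for `_mm_load1_ps` are up to the compiler.

```c
#include <emmintrin.h> /* SSE2: _mm_shuffle_epi32 (pshufd), cast intrinsics */

/* Load four floats once, then broadcast each lane with pshufd.
 * _mm_loadu_ps tolerates unaligned memory; _mm_load_ps (movaps)
 * could be used instead when mem is known to be 16-byte aligned. */
static void splat4_pshufd(const float *mem, __m128 out[4])
{
    __m128i v = _mm_castps_si128(_mm_loadu_ps(mem));
    out[0] = _mm_castsi128_ps(_mm_shuffle_epi32(v, 0x00)); /* {m0, m0, m0, m0} */
    out[1] = _mm_castsi128_ps(_mm_shuffle_epi32(v, 0x55)); /* {m1, m1, m1, m1} */
    out[2] = _mm_castsi128_ps(_mm_shuffle_epi32(v, 0xAA)); /* {m2, m2, m2, m2} */
    out[3] = _mm_castsi128_ps(_mm_shuffle_epi32(v, 0xFF)); /* {m3, m3, m3, m3} */
}

/* Alternative: one scalar load per value, each broadcast to all four lanes. */
static void splat4_scalar(const float *mem, __m128 out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = _mm_load1_ps(&mem[i]);
}
```

As a general design note rather than a measurement: pshufd writes to a separate destination register, so unlike non-AVX shufps it needs no extra copy to preserve its source. It is, however, an integer-domain instruction, and on some microarchitectures feeding its result into floating-point instructions costs an extra bypass delay.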

As others have noted, you can also use shufps.

There are two ways:

Use shufps exclusively:

__m128 first = ...;
__m128 xxxx = _mm_shuffle_ps(first, first, 0x00); // _MM_SHUFFLE(0, 0, 0, 0)
__m128 yyyy = _mm_shuffle_ps(first, first, 0x55); // _MM_SHUFFLE(1, 1, 1, 1)
__m128 zzzz = _mm_shuffle_ps(first, first, 0xAA); // _MM_SHUFFLE(2, 2, 2, 2)
__m128 wwww = _mm_shuffle_ps(first, first, 0xFF); // _MM_SHUFFLE(3, 3, 3, 3)
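The snippet above leaves `first` unspecified. A self-contained sketch of the same shufps approach, written with the `_MM_SHUFFLE` macro from `<xmmintrin.h>` instead of hex literals, could look like this; `splat4_shufps` is a hypothetical name. `_MM_SHUFFLE(d, c, b, a)` expands to `(d << 6) | (c << 4) | (b << 2) | a`, so `_MM_SHUFFLE(k, k, k, k)` selects lane k for every output position.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE */

/* Broadcast each lane of `first` into its own register with shufps.
 * Both source operands are `first`, so all four selectors pick from it. */
static void splat4_shufps(__m128 first, __m128 out[4])
{
    out[0] = _mm_shuffle_ps(first, first, _MM_SHUFFLE(0, 0, 0, 0)); /* xxxx */
    out[1] = _mm_shuffle_ps(first, first, _MM_SHUFFLE(1, 1, 1, 1)); /* yyyy */
    out[2] = _mm_shuffle_ps(first, first, _MM_SHUFFLE(2, 2, 2, 2)); /* zzzz */
    out[3] = _mm_shuffle_ps(first, first, _MM_SHUFFLE(3, 3, 3, 3)); /* wwww */
}
```

Called with `_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)`, this fills `out[0]` with four 1.0f values, `out[1]` with four 2.0f values, and so on, matching the {1, 1, 1, 1}, {2, 2, 2, 2} layout asked for in the question.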

Let the compiler choose the best way using _mm_set1_ps and _mm_cvtss_f32:

__m128 first = ...;
__m128 xxxx = _mm_set1_ps(_mm_cvtss_f32(first));

Note that the second method can produce poor code on MSVC, as discussed here, and it only produces 'xxxx' (lane 0 broadcast), whereas the first option can produce all four results.
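One practical consequence of using shufps is that its shuffle control must be a compile-time constant, so broadcasting a lane chosen at run time needs some dispatch. A minimal sketch, assuming a switch is acceptable; `splat_lane0` and `splat_lane` are hypothetical helper names, and `splat_lane0` is simply the `_mm_set1_ps(_mm_cvtss_f32(...))` form from above wrapped in a function.

```c
#include <xmmintrin.h> /* SSE: _mm_shuffle_ps, _mm_set1_ps, _mm_cvtss_f32 */

/* Second method: extract lane 0 as a scalar, then broadcast it.
 * Only lane 0 is reachable this way. */
static __m128 splat_lane0(__m128 v)
{
    return _mm_set1_ps(_mm_cvtss_f32(v));
}

/* First method with a run-time lane index. The shuffle immediate must
 * be a constant, so each lane gets its own case.
 * Only the low two bits of `lane` are used. */
static __m128 splat_lane(__m128 v, int lane)
{
    switch (lane & 3) {
    case 0:  return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    case 1:  return _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    case 2:  return _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    default: return _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));
    }
}
```

When the lane is known at compile time, calling `_mm_shuffle_ps` directly with a constant is simpler; the switch only earns its keep when the index is a genuine run-time value.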

I'm trying to implement some inline assembler (in C/C++ code) to take advantage of SSE

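This is highly unportable: GCC/Clang and MSVC use different inline assembly syntaxes, and MSVC does not support inline assembly at all when targeting x64. Use intrinsics.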

Tags:

c++

c

x86

sse

inline-assembly

jbl

2 Answers

Stephen Canon

LiraNuna
