What's the fastest way to move only the higher or lower 64 bits from one integer SSE register to another? With SSE4.1 it can be done with a single pblendw instruction (_mm_blend_epi16). But what about older SSE versions? Shift and unpack? AND and OR? movsd, despite the bypass delay?
Closely related question: Best way to shuffle 64-bit portions of two __m128i's
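For reference, here is a minimal sketch of the SSE4.1 baseline the question mentions. In _mm_blend_epi16, each immediate bit selects one 16-bit word: 0 keeps the word from the first operand, 1 takes it from the second. The masks 0x0F and 0xF0 therefore cover the low and high 64 bits. The following are assumptions made only for illustration: the function names and the lane64 helper are hypothetical, the per-function __attribute__((target("sse4.1"))) is a GCC/Clang feature used so the file compiles without -msse4.1, and the CPU running it must support SSE4.1.

```c
#include <assert.h>
#include <stdint.h>
#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_blend_epi16 (pblendw) */

/* Hypothetical helper: store a vector and return 64-bit lane i (0 = low, 1 = high). */
static uint64_t lane64(__m128i v, int i)
{
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}

/* Imm bits 0..3 select 16-bit words 0..3 (the low 64 bits) from src; the rest stay from dst. */
__attribute__((target("sse4.1")))
__m128i move_low64_sse41(__m128i dst, __m128i src)
{
    return _mm_blend_epi16(dst, src, 0x0F);
}

/* Imm bits 4..7 select 16-bit words 4..7 (the high 64 bits) from src; the rest stay from dst. */
__attribute__((target("sse4.1")))
__m128i move_high64_sse41(__m128i dst, __m128i src)
{
    return _mm_blend_epi16(dst, src, 0xF0);
}

/* Sanity check with labelled lanes so the origin of each half is obvious. */
void self_check_sse41(void)
{
    __m128i dst = _mm_set_epi64x(0xD1D1, 0xD0D0); /* arguments are (high, low) */
    __m128i src = _mm_set_epi64x(0x5151, 0x5050);

    __m128i lo = move_low64_sse41(dst, src);
    assert(lane64(lo, 0) == 0x5050 && lane64(lo, 1) == 0xD1D1);

    __m128i hi = move_high64_sse41(dst, src);
    assert(lane64(hi, 0) == 0xD0D0 && lane64(hi, 1) == 0x5151);
}
```

pblendw stays in the integer domain, so it avoids the bypass delay that the question asks about for the pre-SSE4.1 alternatives.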
To move the low 64 bits from src to dst, preserving the high 64 bits of dst:
movsd dst, src
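With intrinsics, the register-to-register movsd above can be reached through _mm_move_sd, which returns {b[0], a[1]}. The casts between __m128i and __m128d only reinterpret bits and generate no instructions. This is a sketch using plain SSE2. The function name and the lane64 helper are hypothetical, and the exact instruction emitted is up to the compiler (typically movsd without AVX).

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: store a vector and return 64-bit lane i (0 = low, 1 = high). */
static uint64_t lane64(__m128i v, int i)
{
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}

/* Low 64 bits from src, high 64 bits preserved from dst (movsd dst, src). */
__m128i move_low64_sse2(__m128i dst, __m128i src)
{
    __m128d r = _mm_move_sd(_mm_castsi128_pd(dst), _mm_castsi128_pd(src));
    return _mm_castpd_si128(r);
}

/* Sanity check with labelled lanes. */
void self_check_movsd(void)
{
    __m128i dst = _mm_set_epi64x(0xD1D1, 0xD0D0); /* arguments are (high, low) */
    __m128i src = _mm_set_epi64x(0x5151, 0x5050);

    __m128i r = move_low64_sse2(dst, src);
    assert(lane64(r, 0) == 0x5050 && lane64(r, 1) == 0xD1D1);
}
```

A register-to-register movsd is a pure bit copy, so integer data passes through unchanged even though the instruction belongs to the floating-point domain.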
To move the high 64 bits from src to dst, preserving the low 64 bits of dst:
shufps dst, src, E4h
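The shufps form maps to _mm_shuffle_ps. The two low 32-bit results come from the first operand and the two high ones from the second. The immediate 0xE4 (_MM_SHUFFLE(3, 2, 1, 0)) selects elements 0 and 1 of dst and elements 2 and 3 of src, so the low half of dst is kept and the high half of src is copied. This is a sketch assuming SSE2 for the casts. The function name and lane64 helper are hypothetical, and the compiler may pick an equivalent instruction other than shufps.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in SSE's _mm_shuffle_ps */

/* Hypothetical helper: store a vector and return 64-bit lane i (0 = low, 1 = high). */
static uint64_t lane64(__m128i v, int i)
{
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}

/* High 64 bits from src, low 64 bits preserved from dst (shufps dst, src, E4h).
   Result in 32-bit elements: {dst[0], dst[1], src[2], src[3]}. */
__m128i move_high64_sse2(__m128i dst, __m128i src)
{
    __m128 r = _mm_shuffle_ps(_mm_castsi128_ps(dst), _mm_castsi128_ps(src), 0xE4);
    return _mm_castps_si128(r);
}

/* Sanity check with labelled lanes. */
void self_check_shufps(void)
{
    __m128i dst = _mm_set_epi64x(0xD1D1, 0xD0D0); /* arguments are (high, low) */
    __m128i src = _mm_set_epi64x(0x5151, 0x5050);

    __m128i r = move_high64_sse2(dst, src);
    assert(lane64(r, 0) == 0xD0D0 && lane64(r, 1) == 0x5151);
}
```

Like movsd, shufps only moves bits around, so integer values survive intact; the cost is at most a bypass delay when the result feeds integer instructions.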
Bypass delays generally only add latency; they don't consume extra dispatch, execution or retirement resources. So they are usually a concern only when comparing otherwise equivalent sequences (i.e. if there were a single-instruction equivalent that stayed in the integer domain, you'd prefer it for integer arithmetic).