I'm trying to learn to code with intrinsics. Below is a program that does an addition.
Compiler used: icc
#include<stdio.h>
#include<emmintrin.h>
int main()
{
    __m128i a = _mm_set_epi32(1,2,3,4);
    __m128i b = _mm_set_epi32(1,2,3,4);
    __m128i c;
    c = _mm_add_epi32(a,b);
    printf("%d\n",c[2]);
    return 0;
}
I get the below error:
test.c(9): error: expression must have pointer-to-object type
    printf("%d\n",c[2]);
How do I print the values in the variable c, which is of type __m128i?
Use this function to print them:
#include <emmintrin.h>  /* __m128i */
#include <stdint.h>
#include <stdio.h>      /* printf */
#include <string.h>     /* memcpy */
void print128_num(__m128i var)
{
    uint16_t val[8];
    memcpy(val, &var, sizeof(val));
    printf("Numerical: %i %i %i %i %i %i %i %i \n",
           val[0], val[1], val[2], val[3], val[4], val[5], val[6], val[7]);
}
You split the 128 bits into 16-bit (or 32-bit) chunks before printing them.
If 64-bit integers are available, you can split the vector into two 64-bit halves and print those instead:
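#include <inttypes.h>
void print128_num(__m128i var)
{
    uint64_t v64val[2];
    memcpy(v64val, &var, sizeof(v64val));
    /* PRIx64 matches uint64_t on every ABI; %llx is wrong where uint64_t is plain long */
    printf("%.16" PRIx64 " %.16" PRIx64 "\n", v64val[1], v64val[0]);
}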
Note: casting &var directly to an int* or uint16_t* would also work with MSVC, but it violates strict aliasing and is undefined behaviour. Using memcpy is the standard-compliant way to do the same thing, and with minimal optimization the compiler generates exactly the same binary code.
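
Applied to the 32-bit elements from the question, the memcpy approach might look like the sketch below. The names `lane_epi32` and `print128_epi32` are made up for this example and are not part of any intrinsics header; only SSE2 is assumed.

```c
#include <emmintrin.h>  /* SSE2: __m128i */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return one 32-bit element, counting from the
   lowest element (index 0). memcpy avoids any aliasing problem. */
int32_t lane_epi32(__m128i var, int i)
{
    int32_t val[4];
    memcpy(val, &var, sizeof(val));
    return val[i & 3];  /* mask keeps the index inside the array */
}

/* Hypothetical helper: print all four elements, lowest element first. */
void print128_epi32(__m128i var)
{
    int32_t val[4];
    memcpy(val, &var, sizeof(val));
    printf("%d %d %d %d\n", (int)val[0], (int)val[1], (int)val[2], (int)val[3]);
}
```

In the question's program, `printf("%d\n", lane_epi32(c, 2));` would replace the failing `c[2]`. Because `_mm_set_epi32(1,2,3,4)` puts 4 in the lowest element, `print128_epi32(c)` prints `8 6 4 2` for the sum.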
The functions below print elements in memory order, least significant element first (like _mm_setr_epiX). Reverse the array indices if you prefer printing in the same order Intel's manuals use, where the most significant element is on the left (like _mm_set_epiX). Related: Convention for displaying vector registers
Using a __m128i* to load from an array of int is safe because the __m128 types are defined to allow aliasing, just like ISO C unsigned char*. (E.g. in gcc's headers, the definition includes __attribute__((may_alias)).)
The reverse isn't safe (pointing an int* onto part of a __m128i object). MSVC guarantees that's safe, but GCC/clang don't (-fstrict-aliasing is on by default). It sometimes works with GCC/clang, but why risk it? It sometimes even interferes with optimization; see this Q&A. See also Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?
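
To make the safe direction concrete, here is a minimal sketch that only ever casts an integer pointer to `__m128i*` and hands it to a load or store intrinsic, never the other way round. The function name `add4_epi32` is hypothetical; the unaligned load/store intrinsics are used so the arrays need no special alignment.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: sum[i] = x[i] + y[i] for four 32-bit elements. */
void add4_epi32(const int32_t *x, const int32_t *y, int32_t *sum)
{
    /* int32_t* -> __m128i* for a load is fine: __m128i may alias anything. */
    __m128i a = _mm_loadu_si128((const __m128i *)x);
    __m128i b = _mm_loadu_si128((const __m128i *)y);
    __m128i c = _mm_add_epi32(a, b);

    /* Get the result out with a store intrinsic instead of
       pointing an int32_t* at the __m128i object c. */
    _mm_storeu_si128((__m128i *)sum, c);
}
```

After the call, `sum` is an ordinary array and can be printed with a normal loop and `printf`.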
(uint32_t*) &my_vector violates the C and C++ aliasing rules, and is not guaranteed to work the way you'd expect. Storing to a local array and then accessing it is guaranteed to be safe. It even optimizes away with most compilers, so you get movq / pextrq directly from xmm to integer registers instead of an actual store/reload, for example.
Source + asm output on the Godbolt compiler explorer: proof it compiles with MSVC and so on.
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

#ifndef __cplusplus
#include <stdalign.h>   // C11 defines _Alignas().  This header defines alignas()
#endif

void p128_hex_u8(__m128i in) {
    alignas(16) uint8_t v[16];
    _mm_store_si128((__m128i*)v, in);
    printf("v16_u8: %x %x %x %x | %x %x %x %x | %x %x %x %x | %x %x %x %x\n",
           v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7],
           v[8], v[9], v[10], v[11], v[12], v[13], v[14], v[15]);
}

void p128_hex_u16(__m128i in) {
    alignas(16) uint16_t v[8];
    _mm_store_si128((__m128i*)v, in);
    printf("v8_u16: %x %x %x %x, %x %x %x %x\n",
           v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]);
}

void p128_hex_u32(__m128i in) {
    alignas(16) uint32_t v[4];
    _mm_store_si128((__m128i*)v, in);
    printf("v4_u32: %x %x %x %x\n", v[0], v[1], v[2], v[3]);
}

void p128_hex_u64(__m128i in) {
    // uint64_t might give format-string warnings with %llx; it's just long in some ABIs
    alignas(16) unsigned long long v[2];
    _mm_store_si128((__m128i*)v, in);
    printf("v2_u64: %llx %llx\n", v[0], v[1]);
}
If you need portability to C99 or C++03 or earlier (i.e. without C11 / C++11), remove the alignas() and use storeu instead of store. Or use __attribute__((aligned(16))) or __declspec( align(16) ) instead.
(If you're writing code with intrinsics, you should be using a recent compiler version. Newer compilers usually make better asm than older compilers, including for SSE/AVX intrinsics. But maybe you want to use gcc-6.3 with -std=gnu++03 (C++03 mode) for a codebase that isn't ready for C++11 or something.)
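
As a rough sketch of the storeu variant described above, the 32-bit printer could be written without alignas like this. The names `store128_u32` and `p128_hex_u32_c99` are invented for the example; everything else is plain SSE2.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copy the four 32-bit elements into a plain array.
   No alignas, so the array is only guaranteed 4-byte alignment and the
   unaligned store intrinsic is required. */
void store128_u32(__m128i in, uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, in);
}

/* C99 / C++03 friendly counterpart of p128_hex_u32. */
void p128_hex_u32_c99(__m128i in)
{
    uint32_t v[4];
    store128_u32(in, v);
    printf("v4_u32: %x %x %x %x\n",
           (unsigned)v[0], (unsigned)v[1], (unsigned)v[2], (unsigned)v[3]);
}
```

On current x86 CPUs an unaligned store to an address that happens to be aligned generally costs the same as an aligned store, so little is lost by dropping alignas in a debug-print helper.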
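Sample output from calling all 4 functions on the vector below. Note that the 0x prefixes (as produced by %#x) and the | separator in the v8_u16 line don't match the format strings above exactly; the element values are what those functions print.
// source used:
__m128i vec = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
// output:
v2_u64: 0x807060504030201 0x100f0e0d0c0b0a09
v4_u32: 0x4030201 0x8070605 0xc0b0a09 0x100f0e0d
v8_u16: 0x201 0x403 0x605 0x807 | 0xa09 0xc0b 0xe0d 0x100f
v16_u8: 0x1 0x2 0x3 0x4 | 0x5 0x6 0x7 0x8 | 0x9 0xa 0xb 0xc | 0xd 0xe 0xf 0x10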
Adjust the format strings if you want to pad with leading zeros for consistent output width. See printf(3).
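
For instance, a zero-padded variant that also reverses the indices, so the most significant element comes first, could be sketched as follows. The function name `fmt128_hex_u32_padded` is hypothetical, and it formats into a caller-supplied buffer with snprintf so the result can be checked or printed as needed.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write four 32-bit elements as fixed-width hex,
   most significant element first (the argument order of _mm_set_epi32).
   Returns whatever snprintf returns. */
int fmt128_hex_u32_padded(char *buf, size_t len, __m128i in)
{
    uint32_t v[4];
    memcpy(v, &in, sizeof(v));
    /* %08x pads each element to 8 hex digits with leading zeros */
    return snprintf(buf, len, "%08x %08x %08x %08x",
                    (unsigned)v[3], (unsigned)v[2], (unsigned)v[1], (unsigned)v[0]);
}
```

With `char buf[64];`, calling `fmt128_hex_u32_padded(buf, sizeof buf, _mm_set_epi32(1, 2, 3, 0xabc));` followed by `puts(buf);` prints `00000001 00000002 00000003 00000abc`.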