
Is casting to simd-type undefined behaviour in C++? [duplicate]

In a SIMD tutorial I found the following code snippet.

void simd(float* a, int N)
{
    // We assume N % 4 == 0.
    int nb_iters = N / 4;
    __m128* ptr = reinterpret_cast<__m128*>(a); // (*)

    for (int i = 0; i < nb_iters; ++i, ++ptr, a += 4)
        _mm_store_ps(a, _mm_sqrt_ps(*ptr));
}

Now my question is: is the line marked (*) undefined behaviour, given the following rule from cppreference (https://en.cppreference.com/w/cpp/language/reinterpret_cast)?

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

  • AliasedType and DynamicType are similar.
  • AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
  • AliasedType is std::byte (since C++17), char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

How could someone prevent undefined behaviour in this case? I'm aware that I could use std::memcpy, but the performance penalty would make the SIMD pointless. Or am I wrong about this?

user1235183 asked Nov 18 '19 08:11



2 Answers

Edit: Please look at the answer in the duplicate (and/or Peter's answer here). What I write below is technically correct but not really relevant in practice.


Yes, that would be undefined behavior based on the C++ standard. Your compiler might still handle it correctly as an extension (seeing as SIMD types and intrinsics are not part of the C++ standard in the first place).

To do this safely and correctly without compromising speed, you would use the intrinsic that loads 4 floats directly from memory into a 128-bit register:

__m128 reg = _mm_load_ps(a);

See the Intel Intrinsics Guide for the important alignment constraint:

__m128 _mm_load_ps (float const* mem_addr)

Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
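As a minimal sketch of how the loop from the question might look with the load intrinsic instead of the pointer cast (the function name simd_sqrt_aligned and the debug assertions are assumptions added for illustration, not part of the original tutorial):

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_sqrt_ps, _mm_store_ps

// Same loop as in the question, but the load goes through _mm_load_ps
// instead of dereferencing a reinterpret_cast'ed pointer.
// Assumes N % 4 == 0 and that `a` is 16-byte aligned.
void simd_sqrt_aligned(float* a, int N)
{
    assert(N % 4 == 0);
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);

    for (int i = 0; i < N; i += 4) {
        __m128 v = _mm_load_ps(a + i);        // aligned load of 4 floats
        _mm_store_ps(a + i, _mm_sqrt_ps(v));  // aligned store of 4 square roots
    }
}
```

A caller would need to supply suitably aligned storage, for example an array declared with alignas(16).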

Max Langhof answered Sep 21 '22 17:09



Intel's intrinsics API does define the behaviour of casting to __m128* and dereferencing: it's identical to _mm_load_ps on the same pointer.

For float* and double*, the load/store intrinsics basically exist to wrap this reinterpret cast and communicate alignment info to the compiler.

If _mm_load_ps() is supported, the implementation must also define the behaviour of the code in the question.


I don't know if this is actually documented anywhere (maybe in an Intel tutorial or whitepaper), but it's the agreed-upon behaviour of all compilers. I think most people would agree that a compiler that didn't define this behaviour wouldn't fully support Intel's intrinsics API.

__m128 types are defined as may_alias (see footnote 1), so like char* you can point a __m128* at anything, including int[] or an arbitrary struct, and load or store through it without violating strict-aliasing. (As long as it's aligned by 16; otherwise you do need _mm_loadu_ps, or a custom vector type declared with something like GNU C's aligned(1) attribute.)


Footnote 1: __attribute__((vector_size(16), may_alias)) in GNU C; MSVC doesn't do type-based alias analysis, so the question doesn't arise there.
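For the unaligned case mentioned above, a rough sketch using _mm_loadu_ps and its store counterpart _mm_storeu_ps could look like this (the function name simd_sqrt_unaligned is hypothetical, and N % 4 == 0 is still assumed, as in the question):

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_sqrt_ps, _mm_storeu_ps

// Hypothetical variant for pointers that may NOT be 16-byte aligned.
// Assumes N % 4 == 0.
void simd_sqrt_unaligned(float* a, int N)
{
    assert(N % 4 == 0);

    for (int i = 0; i < N; i += 4) {
        __m128 v = _mm_loadu_ps(a + i);        // unaligned load of 4 floats
        _mm_storeu_ps(a + i, _mm_sqrt_ps(v));  // unaligned store of 4 square roots
    }
}
```

This version accepts any float pointer, such as one pointing into the middle of a larger buffer, where dereferencing a __m128* or calling _mm_load_ps could fault.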

Peter Cordes answered Sep 23 '22 17:09
