Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unaligned access through reinterpret_cast

I'm in the middle of a discussion trying to figure out whether unaligned access is allowable in C++ through reinterpret_cast. I think not, but I'm having trouble finding the right part(s) of the standard which confirm or refute that. I have been looking at C++11, but I would be okay with another version if it is more clear.

Unaligned access is undefined in C11. The relevant part of the C11 standard (§ 6.3.2.3, paragraph 7):

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

Since the behavior of an unaligned access is undefined, some compilers (at least GCC) take that to mean that it is okay to generate instructions which require aligned data. Most of the time the code still works for unaligned data because most x86 and ARM instructions these days work with unaligned data, but some don't. In particular, some vector instructions don't, which means that as the compiler gets better at generating optimized instructions code which worked with older versions of the compiler may not work with newer versions. And, of course, some architectures (like MIPS) don't do as well with unaligned data.

C++11 is, of course, more complicated. § 5.2.10, paragraph 7 says:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.

Note that the last word is "unspecified", not "undefined". § 1.3.25 defines "unspecified behavior" as:

behavior, for a well-formed program construct and correct data, that depends on the implementation

[Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this International Standard. — end note]

Unless I'm missing something, the standard doesn't actually delineate the range of possible behaviors in this case, which seems to indicate to me that one very reasonable behavior is that which is implemented for C (at least by GCC): not supporting them. That would mean the compiler is free to assume unaligned accesses do not occur and emit instructions which may not work with unaligned memory, just like it does for C.

The person I'm discussing this with, however, has a different interpretation. They cite § 1.9, paragraph 5:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

Since there is no undefined behavior, they argue that the C++ compiler has no right to assume unaligned access don't occur.

So, are unaligned accesses through reinterpret_cast safe in C++? Where in the specification (any version) does it say?

Edit: By "access", I mean actually loading and storing. Something like

void unaligned_cp(void* a, void* b) {
  *reinterpret_cast<volatile uint32_t*>(a) =
    *reinterpret_cast<volatile uint32_t*>(b);
}

How the memory is allocated is actually outside my scope (it is for a library which can be called with data from anywhere), but malloc and an array on the stack are both likely candidates. I don't want to place any restrictions on how the memory is allocated.

Edit 2: Please cite sources (i.e., the C++ standard, section and paragraph) in answers.

like image 790
nemequ Avatar asked Sep 15 '15 14:09

nemequ


People also ask

Is it safe to use Reinterpret_cast?

reinterpret_cast is a very special and dangerous type of casting operator. And is suggested to use it using proper data type i.e., (pointer data type should be same as original data type).

Can Reinterpret_cast throw exception?

No. It is a purely compile-time construct. It is very dangerous, because it lets you get away with very wrong conversions.

Is Reinterpret_cast portable?

Anyway, the consequence of this is, that reinterpret_cast<> is portable as long as you do not rely on the byte order in any way. Your example code does not rely on byte order, it treats all bytes the same (setting them to zero), so that code is portable.


Video Answer


2 Answers

Looking at 3.11/1:

Object types have alignment requirements (3.9.1, 3.9.2) which place restrictions on the addresses at which an object of that type may be allocated.

There's some debate in comments about exactly what constitutes allocating an object of a type. However I believe the following argument works regardless of how that discussion is resolved:

Take *reinterpret_cast<uint32_t*>(a) for example. If this expression does not cause UB, then (according to the strict aliasing rule) there must be an object of type uint32_t (or int32_t) at the given location after this statement. Whether the object was already there, or this write created it, does not matter.

According to the above Standard quote, objects with alignment requirement can only exist in a correctly aligned state.

Therefore any attempt to create or write an object that is not correctly aligned causes UB.

like image 160
M.M Avatar answered Oct 12 '22 00:10

M.M


All of your quotes are about the pointer value, not the act of dereferencing.

5.2.10, paragraph 7 says that, assuming int has a stricter alignment than char, then the round trip of char* to int* to char* generates an unspecified value for the resulting char*.

On the other hand, if you convert int* to char* to int*, you are guaranteed to get back the exact same pointer as you started with.

It doesn't talk about what you get when you dereference said pointer. It simply states that in one case, you must be able to round trip. It washes its hands of the other way around.


Suppose you have some ints, and alignof(int) > 1:

int some_ints[3] ={0};

then you have an int pointer that is offset:

int* some_ptr = (int*)(((char*)&some_ints[0])+1);

We'll presume that copying this misaligned pointer doesn't cause undefined behavior for now.

The value of some_ptr is not specified by the standard. We'll be generous and presume it actually points to some chunk of bytes within some_bytes.

Now we have a int* that points to somewhere an int cannot be allocated (3.11/1). Under (3.8) the use of a pointer to an int is restricted in a number of ways. Usual use is restricted to a pointer to an T whose lifetime has begun allocated properly (/3). Some limited use is permitted on a pointer to a T which has been allocated properly, but whose lifetime has not begun (/5 and /6).

There is no way to create an int object that does not obey the alignment restrictions of int in the standard.

So the theoretical int* which claims to point to a misaligned int does not point to an int. No restrictions are placed on the behavior of said pointer when dereferenced; usual dereferencing rules provide behavior of a valid pointer to an object (including an int) and how it behaves.


And now our other assumptions. No restrictions on the value of some_ptr here are made by the standard: int* some_ptr = (int*)(((char*)&some_ints[0])+1);.

It is not a pointer to an int, much like (int*)nullptr is not a pointer to an int. Round tripping it back to a char* results in a pointer with unspecified value (it could be 0xbaadf00d or nullptr) explicitly in the standard.

The standard defines what you must do. There are (nearly? I guess evaluating it in a boolean context must return a bool) no requirements placed on the behavior of some_ptr by the standard, other than converting it back to char* results in an unspecified value (of the pointer).

like image 44
Yakk - Adam Nevraumont Avatar answered Oct 11 '22 23:10

Yakk - Adam Nevraumont