This question is an extension of what I have asked before. However, after a period of time, I find that some of my concepts about Conversion Behavior between Two Pointers are still ambiguous.
To facilitate the discussion, I first make the following assumptions about the host implementation:
sizeof(int): 4, _Alignof(int): 4
sizeof(double): 8, _Alignof(double): 8
Question One:
void *ptr = malloc(4096); // (A)
*(int *) ptr = 10; // (B)
/*
* Does the following line have undefined behavior
* or violate strict aliasing rules?
*/
*(((double *) ptr) + 2) = 1.618; // (C)
// now, can still read integer value with (*(int *) ptr)
In my current understanding, the answer is No.
According to [6.3.2.3 #7] of C11:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. ...
and [6.5 #7] of C11:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- ...
Therefore, to my knowledge, since ptr is correctly aligned for the double type, the pointer cast and the pointer arithmetic are legal. Because line (C) does not access the first 4 bytes, whose effective type is int, it does not break the 6.5 #7 rule.
Do I have any misunderstandings about what I have mentioned above?
Question Two:
void *ptr = malloc(4096); // (A)
*(int *) ptr = 10; // (B)
/*
* Does the following line have undefined behavior
* or violate strict aliasing rules?
*/
*(double *) ptr = 1.618; // (C)
// now, shall not read value with (*(int *) ptr)
In my current understanding, the answer is also No.
According to [6.5 #6] of C11:
If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.
So, to my knowledge, line (C) is a subsequent access that modifies the stored value and updates the effective type of the first 8 bytes to double. Do I have any misunderstandings about what I have mentioned above?
My main confusion is that I am not sure whether this violates the [6.5 #7] rule:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- ...
Sometimes we want to circumvent the type system and interpret an object as a different type. This is called type punning: reinterpreting a segment of memory as another type. The methods commonly used for it often violate the strict aliasing rules.
Strict aliasing: compilers such as GCC assume that pointers to different types never point to the same memory location, i.e., that they are not aliases of each other. The strict aliasing rule helps the compiler optimize the code.
Pointer aliasing is a hidden kind of data dependency that can occur in C, C++, or any other language that uses pointers for array addresses in arithmetic operations. Array data identified by pointers in C can overlap, because the C language puts very few restrictions on pointers.
The strict aliasing rule was introduced to give compiler vendors some leeway regarding optimizations. By default, the compiler assumes that pointers to (loosely speaking) incompatible types never alias. As a consequence, you, the programmer, have to make sure that this rule is obeyed.
First of all, we have to clarify what “aliasing” of pointers really means. Take a look at this example:
int* p1 = &value; // p1 points to 'value'.
int* p2 = &value; // p2 as well...
Here, ‘p1’ and ‘p2’ are aliased to the same object ‘value’; that is, they point to the same object, so an update to ‘value’ made through ‘p1’ is also visible through ‘p2’.
Note: both C and C++ allow casting between pointer types, which can create aliases and thus violate the compiler’s assumption.
Aliasing is permitted when the pointed-at types are different but the access is made through a pointer to a character type:
unsigned char a1 = p[0]; // First byte of 'f'.
unsigned char a4 = p[3]; // Last byte of 'f'.
Conversely, aliased pointer access is not defined if the pointed-at types are fundamentally different.
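The character-type exception can be sketched as follows. This is only an illustration: the helper names float_bytes and float_from_bytes are hypothetical and do not come from any of the quoted sources. The first helper reads the object representation of a float through unsigned char, which the aliasing rules allow; the second rebuilds the float with memcpy instead of a pointer cast.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of *f into out[]
 * by reading it through an unsigned char pointer (always permitted). */
void float_bytes(const float *f, unsigned char out[sizeof(float)])
{
    assert(f != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)f;
    for (size_t i = 0; i < sizeof(float); i++)
        out[i] = p[i];
}

/* Hypothetical helper: rebuild a float from its bytes with memcpy,
 * avoiding any access through an incompatible pointer type. */
float float_from_bytes(const unsigned char in[sizeof(float)])
{
    float f;
    assert(in != NULL);
    memcpy(&f, in, sizeof f);
    return f;
}
```

Round-tripping a value through both helpers should give back the same float, since only the bytes are copied and never reinterpreted in place.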
To facilitate the discussion, I first make the following assumptions about the host implementation [...]
These assumptions are almost completely irrelevant. The only constraint that matters for the particular questions posed is that sizeof(int) <= 2 * sizeof(double).
In particular, malloc() is guaranteed to allocate a block that is suitably aligned for any built-in type.
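If you want to see the alignment guarantee for yourself, a rough check could look like the sketch below. The function name malloc_block_is_aligned is made up for this illustration. It also assumes that uintptr_t is available and that converting a pointer to it yields the numeric address, which is implementation-defined but holds on common platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical check: returns 1 if a malloc'ed block is aligned for both
 * int and double, 0 if it is not, and -1 if the allocation failed.
 * Assumes the pointer-to-uintptr_t conversion yields the address. */
int malloc_block_is_aligned(size_t size)
{
    assert(size > 0);
    void *ptr = malloc(size);
    if (ptr == NULL)
        return -1;

    uintptr_t addr = (uintptr_t)ptr;
    int ok = (addr % _Alignof(int) == 0) && (addr % _Alignof(double) == 0);

    free(ptr);
    return ok;
}
```

A return value of 1 only confirms what the standard already promises; it does not prove anything about effective types.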
Question One:
Your analysis is correct: there is no strict-aliasing violation.
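For illustration, here is a minimal sketch of the Question One scenario wrapped in a function. The name question_one_demo and the out parameter are hypothetical, and the static_assert spells out the one layout constraint mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* The int at offset 0 must end before the double at index 2 begins. */
static_assert(sizeof(int) <= 2 * sizeof(double),
              "int at offset 0 would overlap the double at index 2");

/* Hypothetical demo: stores an int and a double in disjoint parts of one
 * malloc'ed block, then reads both back through their own types.
 * Returns the int read back, or -1 if the allocation failed. */
int question_one_demo(double *out)
{
    void *ptr = malloc(4096);             /* (A) */
    if (ptr == NULL)
        return -1;

    *(int *)ptr = 10;                     /* (B) */
    *((double *)ptr + 2) = 1.618;         /* (C): a different object */

    int i = *(int *)ptr;                  /* still the int stored at (B) */
    *out = *((double *)ptr + 2);          /* the double stored at (C) */

    free(ptr);
    return i;
}
```

The two stores touch non-overlapping objects, so each read goes through an lvalue whose type matches the effective type of what it reads.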
Question Two:
According to [6.5 #6] of C11:
If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.
So, in my knowledge, the line (C) is a subsequent access that modifies the stored value and updates the effective type of the first 8 Bytes to double.
Yes, line (C) modifies the stored value of *(double *) ptr, and although ptr has a declared type, the object designated by *(double *) ptr, being part of a dynamically allocated block, does not. Therefore, by paragraph 6.5/6, the effective type of the object designated by *(double *) ptr becomes the type of the expression *(double *) ptr (that is, double), including for that access itself. The exception at the end of the paragraph serves to avoid a conflict between that and the effect of the access at your (B).
Thus, there is no strict-aliasing violation at (C). The lvalue used for access is *(double *)ptr. Its type is double, and according to 6.5/6, that is also the effective type of the object being accessed, notwithstanding any other effective type that that object or any part of it may have had. This satisfies the first alternative of the SAR.
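Under that reading of 6.5/6, a sketch of the Question Two sequence that stays within the rules might look like this. The function name question_two_demo is hypothetical. The static_assert records the assumption that the double store covers the whole int written at (B).

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: the double stored at (C) fully covers the int from (B). */
static_assert(sizeof(int) <= sizeof(double),
              "the double store is assumed to cover the earlier int");

/* Hypothetical demo: after the store at (C), the storage is only read
 * through a double lvalue, matching its new effective type.
 * Returns the double read back, or 0.0 if the allocation failed. */
double question_two_demo(void)
{
    void *ptr = malloc(4096);        /* (A) */
    if (ptr == NULL)
        return 0.0;

    *(int *)ptr = 10;                /* (B): effective type becomes int */
    *(double *)ptr = 1.618;          /* (C): effective type becomes double */

    double d = *(double *)ptr;       /* read through the effective type */

    free(ptr);
    return d;
}
```

Reading the same storage as int after (C) is exactly the access this sketch avoids.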
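While other answers do a reasonable job describing what the Standard would seem to say, both clang and gcc appear to interpret the phrase "subsequent accesses that do not modify the stored value" as though it said "subsequent accesses that do not change the stored bit pattern in a way which will later be observed". Both compilers are prone to take the sequence:
1. Write a value using an lvalue of type T.
2. Write a value to the same storage using an lvalue of type U.
3. Read the storage using an lvalue of type U.
4. Write a different value using an lvalue of type T.
5. Write the value read in step 3 using an lvalue of type T.
6. Read the storage using an lvalue of type T.
as exemplified by the code (with T being long and U being longish):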
typedef long long longish;
__attribute__((noinline))
long test(long *p, int index, int index2, int index3)
{
if (sizeof (long) != sizeof (longish))
return -1;
p[index] = 1; // Step 1
((longish*)p)[index2] = 2; // Step 2
longish temp2 = ((longish*)p)[index3]; // Step 3
p[index3] = 5; // Step 4
p[index3] = temp2; // Step 5
return p[index]; // Step 6
}
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
long *arr = malloc(sizeof (long));
long temp = test(arr, 0, 0, 0);
printf("%ld should equal %ld\n", temp, arr[0]);
free(arr);
}
and optimize out the write in step #4 (the bit pattern written here will never be observed, since it's overwritten by step #5), as well as the write in step #5 (once the write in step #4 is removed, the write in step #5 will no longer change the bit pattern). Once those writes are removed, the compilers will then assume that, since no lvalue of type T has been used to modify the object after step #1, they may optimize out the read in step #6 and return the value written in step #1 instead. They will do this even if the references should be recognizable as being freshly derived, at each point of use, from a common pointer.
I see nothing in the Standard's terminology that would suggest that such an interpretation is valid or reasonable, but the maintainers of clang and gcc have known for years that they do not handle this corner case and so far as I can tell have made no attempt to accommodate the possibility that step 2 might legitimately overwrite the value written in step 1 if step 3 reads that bit pattern as a U and step 5 writes it as a T.
For question 1, there's no problem since you access a different object with no declared type. In both the int and double case, "the type of the lvalue becomes the effective type of the object for that access".
For question 2, it says:
If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.
Allocated storage has no declared type. You do access it through int, but then later you do a modification through double. *((double *) ptr) = 1.618; isn't some read-modify-write; it's just a write (such concepts aren't even defined by C).
One perfectly sensible interpretation, then, is that "for subsequent accesses that do not modify" does not apply and we should instead regard it as a new lvalue access with a different effective type. Reading it all quite literally, there wouldn't be any strict aliasing violation.
But it's all ambiguous; you may just as well read this as: the compiler should keep track of all effective types internally, and when you do an access through a non-compatible type, or attempt to modify with a non-compatible type after the object with no declared type previously got an effective type, then that's UB.
This part of the standard 6.5/6 and /7 is simply not clear.
Practically, regardless of what the standard says, we can also see that the mainstream compilers do run off into the undefined behavior woods when we try this code with optimizations on:
#include <stdlib.h>
#include <stdio.h>
int main (void)
{
void *ptr = malloc(4096); // (A)
*((int *) ptr) = 10; // (B)
/*
* Does the following line have undefined behavior
* or violate strict aliasing rules?
*/
*((double *) ptr) = 1.618; // (C)
if( *((int *) ptr) == 10 )
puts("Value didn't change.");
}
https://godbolt.org/z/jhxj7WqKW
- [...] -O3 then the behavior changes.
- [...] mov instructions despite optimizations and check the contents, then doesn't print anything.
3 different behaviors from 3 compilers, using the same code and same compiler options... So in practice, we must simply refrain from fishy pointer conversions like this, because 22 years after C99, the compilers are still implementing strict aliasing in broken ways, and I don't blame them, since the standard is so ambiguously written.
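One way to sidestep the whole question is to move the bytes with memcpy instead of dereferencing converted pointers. The sketch below is only an illustration of that idea. The function name value_survives_double_store is hypothetical, and it assumes an IEEE 754 binary64 double, so that no 4-byte part of the representation of 1.618 happens to equal the int 10.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the double written second fully covers the earlier int. */
static_assert(sizeof(int) <= sizeof(double),
              "the double is assumed to cover the earlier int");

/* Hypothetical demo: same storage reuse as the program above, but every
 * access goes through memcpy, so no lvalue of a mismatched type is used.
 * Returns 1 if the int read back is still 10, 0 if it changed,
 * and -1 if the allocation failed. */
int value_survives_double_store(void)
{
    void *ptr = malloc(4096);
    if (ptr == NULL)
        return -1;

    int i = 10;
    double d = 1.618;
    int readback;

    memcpy(ptr, &i, sizeof i);                /* like (B) */
    memcpy(ptr, &d, sizeof d);                /* like (C) */
    memcpy(&readback, ptr, sizeof readback);  /* inspect the leading bytes */

    free(ptr);
    return readback == 10;
}
```

Because memcpy copies the representation as bytes, the compiler has no basis for assuming the earlier int is still there, and under the stated assumption the function reports that the value did change.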