C - Do incompatible pointers used for arithmetic violate strict aliasing?

This question is an extension of one I asked before. After some time, I find that my understanding of conversions between two pointer types is still unclear. To facilitate the discussion, I first make the following assumptions about the host implementation:

malloc: 8-aligned
sizeof(int): 4, _Alignof(int): 4
sizeof(double): 8, _Alignof(double): 8

Question one:

void *ptr = malloc(4096);        // (A)

*(int *) ptr = 10;               // (B)               

/*
 * Does the following line have undefined behavior
 * or violate strict aliasing rules?
 */
*(((double *) ptr) + 2) = 1.618; // (C)

// now, can still read integer value with (*(int *) ptr)

In my current understanding, the answer is No.

According to [6.3.2.3 #7] of C11:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. ...

and [6.5 #7] of C11:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

...

Therefore, as I understand it:

After line (A), I have allocated an object that has no declared type and does not yet have an effective type.
After line (B), the first 4 bytes of the allocated object have the effective type int.
For line (C), ptr is correctly aligned for the double type, so the pointer cast and the pointer arithmetic are legal. Because the store does not access the first 4 bytes, it does not break the 6.5 #7 rule.

Do I have any misunderstandings about what I have mentioned above?

Question two:

void *ptr = malloc(4096);        // (A)

*(int *) ptr = 10;               // (B)

/*
 * Does the following line have undefined behavior
 * or violate strict aliasing rules?
 */
*(double *) ptr = 1.618;        // (C)

// now, shall not read value with (*(int *) ptr)

In my current understanding, the answer is also No.

According to [6.5 #6] of C11:

If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

So, as I understand it, line (C) is a subsequent access that modifies the stored value and updates the effective type of the first 8 bytes to double. Do I have any misunderstandings about what I have mentioned above?

My main confusion is whether this violates the [6.5 #7] rule:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

...


asked Apr 15 '21 14:04

Richard Bryant

3 Answers

To facilitate the discussion, I first make the following assumptions about the host implementation [...]

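These assumptions are almost completely irrelevant. The only constraint that matters for the particular questions posed is that sizeof(int) <= 2 * sizeof(double), so that the int stored at offset 0 does not overlap the double stored at offset 2 * sizeof(double) in Question One.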

In particular, malloc() is guaranteed to allocate a block that is suitably aligned for any object type with a fundamental alignment requirement, which includes every built-in type such as int and double.
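As a rough illustration of the alignment point, the sketch below checks a malloc() result against _Alignof(int) and _Alignof(double). The helper names is_aligned and malloc_block_is_aligned are hypothetical, and the check assumes that converting a pointer to uintptr_t yields its numeric address, which is implementation-defined but holds on mainstream platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if the address of p is a multiple of `align`, else 0.
 * Assumption: the pointer-to-uintptr_t conversion yields the address
 * (implementation-defined, true on mainstream platforms). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Hypothetical helper: allocates `size` bytes and reports whether the
 * block is aligned for both int and double.
 * Returns 1 or 0, or -1 if malloc fails. */
int malloc_block_is_aligned(size_t size)
{
    void *ptr = malloc(size);
    if (ptr == NULL)
        return -1;

    int ok = is_aligned(ptr, _Alignof(int)) &&
             is_aligned(ptr, _Alignof(double));
    free(ptr);
    return ok;
}
```

Calling malloc_block_is_aligned(4096) should report 1 on any conforming implementation, whatever the actual values of _Alignof(int) and _Alignof(double) are.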

Question One:

Your analysis is correct: there is no strict-aliasing violation.
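A minimal sketch of Question One wrapped in a function, to make the non-overlap explicit. The function name int_survives_distant_double_store is hypothetical; the static assertion encodes the only size constraint the argument needs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper mirroring lines (A)-(C) of Question One.
 * Returns the int read back after the double store, or -1 if
 * malloc fails. */
int int_survives_distant_double_store(void)
{
    /* The double store starts at byte offset 2 * sizeof(double); the int
     * occupies bytes [0, sizeof(int)). They are disjoint when this holds. */
    _Static_assert(sizeof(int) <= 2 * sizeof(double),
                   "int object would overlap the double store");

    void *ptr = malloc(4096);           /* (A) no effective type yet */
    if (ptr == NULL)
        return -1;

    *(int *)ptr = 10;                   /* (B) first sizeof(int) bytes become int */
    *(((double *)ptr) + 2) = 1.618;     /* (C) a different, non-overlapping object */

    int value = *(int *)ptr;            /* int lvalue reading an int object */
    free(ptr);
    return value;
}
```

Because the two stores touch disjoint bytes, the final read uses an lvalue of the same type as the effective type of the object it accesses, and the function is expected to return 10.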

Question Two:

According to [6.5 #6] of C11:
If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.
So, in my knowledge, the line (C) is a subsequent access that modifies the stored value and updates the effective type of the first 8 Bytes to double.

Yes, line (C) modifies the stored value of *(double *) ptr, and although ptr has a declared type, the object designated by *(double *) ptr, being part of a dynamically allocated block, does not. Therefore, by paragraph 6.5/6, the effective type of the object designated by *(double *) ptr becomes the type of the expression *(double *) ptr (that is, double) including for that access itself. The exception at the end of the paragraph serves to avoid a conflict between that and the effect of the access at your (B).

Thus, there is no strict-aliasing violation at (C). The lvalue used for access is *(double *)ptr. Its type is double, and according to 6.5/6, that is also the effective type of the object being accessed, notwithstanding any other effective type that that object or any part of it may have had. This satisfies the first alternative of the strict-aliasing rule (6.5/7).
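The reading above can be sketched as follows: after the store at (C), the storage is read back only through a double lvalue, which matches its new effective type. The function name store_int_then_double and its parameters are hypothetical, introduced only for this illustration.

```c
#include <stdlib.h>

/* Hypothetical helper mirroring Question Two. Stores an int, then
 * overwrites the same storage with the double d, and reads it back as
 * a double. Writes the result to *out and returns 0, or returns -1 if
 * malloc fails. */
int store_int_then_double(double d, double *out)
{
    void *ptr = malloc(4096);   /* (A) no declared type, no effective type */
    if (ptr == NULL)
        return -1;

    *(int *)ptr = 10;           /* (B) effective type int for these bytes */
    *(double *)ptr = d;         /* (C) a store: effective type becomes double */

    *out = *(double *)ptr;      /* read through the current effective type */
    free(ptr);
    return 0;
}
```

What this sketch deliberately avoids is reading *(int *)ptr after (C). That read would use an int lvalue on an object whose effective type is now double.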


answered Oct 25 '22 23:10

John Bollinger

While other answers do a reasonable job describing what the Standard would seem to say, both clang and gcc appear to interpret the phrase "subsequent accesses that do not modify the stored value" as though it said "subsequent accesses that do not change the stored bit pattern in a way which will later be observed". Both compilers are prone to take the sequence:

1. Write storage with a T of value X using reference 1
2. Write storage with a U of value Y using reference 2
3. Read storage as type U using reference 3
4. Optionally write storage with a T of some arbitrary value, using reference 3
5. Write storage with a T whose bit pattern matches what was read in step #3, using reference 3
6. Read the storage as type T using reference 1

as exemplified by the code:

typedef long long longish;
__attribute((noinline))
long test(long *p, int index, int index2, int index3)
{
    if (sizeof (long) != sizeof (longish))
        return -1;

    p[index] = 1;                          // Step 1
    ((longish*)p)[index2] = 2;             // Step 2
    longish temp2 = ((longish*)p)[index3]; // Step 3
    p[index3] = 5;                         // Step 4
    p[index3] = temp2;                     // Step 5
    return p[index];                       // Step 6
}
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    long *arr = malloc(sizeof (long));
    long temp = test(arr, 0, 0, 0);
    printf("%ld should equal %ld\n", temp, arr[0]);
    free(arr);
}

and optimize out the write in step #4 (the bit pattern written there will never be observed, since step #5 overwrites it), as well as the write in step #5 (once the write in step #4 is removed, the write in step #5 no longer changes the bit pattern). Once those writes are removed, the compilers then assume that, since no lvalue of type T has been used to modify the storage after step #1, they may optimize out the read in step #6. They will do this even if the references should be recognizable as being freshly derived, at each point of use, from a common pointer.

I see nothing in the Standard's terminology that would suggest that such an interpretation is valid or reasonable, but the maintainers of clang and gcc have known for years that they do not handle this corner case and so far as I can tell have made no attempt to accommodate the possibility that step 2 might legitimately overwrite the value written in step 1 if step 3 reads that bit pattern as a U and step 5 writes it as a T.

answered Oct 25 '22 23:10

supercat

For question 1, there's no problem, since you access a different object with no declared type. In both the int and the double case, "the type of the lvalue becomes the effective type of the object for that access".

For question 2, 6.5/6 says:

If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

Allocated storage has no declared type. You first access it through int, but later you modify it through double. *((double *) ptr) = 1.618; is unlikely to be some read-modify-write; it's just a write (such concepts aren't even defined by C).

One perfectly sensible interpretation, then, is that "for subsequent accesses that do not modify" does not apply, and we should instead regard it as a new lvalue access with a different effective type. Read quite literally, there wouldn't be any strict aliasing violation.

It's all ambiguous, though; you may as well read this as: the compiler should keep track of all effective types internally, and when you access through a non-compatible type, or attempt to modify with a non-compatible type, after the object with no declared type previously got an effective type, then that's UB.

This part of the standard, 6.5/6 and /7, is simply not clear.

Practically, regardless of what the standard says, we can also see that the mainstream compilers do run off into the undefined behavior woods when we try this code with optimizations on:

#include <stdlib.h>
#include <stdio.h>

int main (void)
{
    void *ptr = malloc(4096);        // (A)

    *((int *) ptr) = 10;             // (B)

    /*
     * Does the following line have undefined behavior
     * or violate strict aliasing rules?
     */
    *((double *) ptr) = 1.618;       // (C)

    if (*((int *) ptr) == 10)
        puts("Value didn't change.");
}

https://godbolt.org/z/jhxj7WqKW

gcc x86 prints "Value didn't change." at -O3; when we drop -O3, the behavior changes.
clang x86 generates an essentially empty program, since it assumes the value changed.
icc generates mov instructions despite optimizations, checks the contents, and then doesn't print anything.

3 different behaviors from 3 compilers, using the same code and the same compiler options... So in practice, we must simply refrain from fishy pointer conversions like this, because 22 years after C99, the compilers are still implementing strict aliasing in broken ways, and I don't blame them, since the standard is so ambiguously written.
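One commonly used way to sidestep such conversions is to copy the object representation with memcpy instead of dereferencing a cast pointer. The sketch below is not taken from any of the answers. The names double_to_bits and bits_to_double are hypothetical, and the code assumes double is 64 bits wide, which the static assertion checks.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "this sketch assumes a 64-bit double");

/* Copies the bytes of d into an integer of the same size.
 * No lvalue of an incompatible type is ever dereferenced. */
uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Inverse of double_to_bits: copies the bytes back into a double. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Round-tripping a value, for example bits_to_double(double_to_bits(1.5)), should give back the original double. The actual bit pattern returned by double_to_bits depends on the platform's floating-point format.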

answered Oct 26 '22 00:10

Lundin


Tags:

c

pointers

language-lawyer

strict-aliasing