Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C - Conversion behavior between two pointers

Update 2020-12-11: Thanks @"Some programmer dude" for the suggestion in the comment. My underlying problem is that our team is implementing a dynamic type storage engine. We allocate multiple char array[PAGE_SIZE] buffers with 16-aligned to store dynamic types of data (there is no fixed struct). For efficiency reasons, we cannot perform byte encoding or allocate additional space to use memcpy.

Since the alignment has been determined (i.e., 16), the rest is to use the cast of pointer to access objects of the specified type, for example:

int main() {
    // simulate our 16-aligned malloc
    _Alignas(16) char buf[4096];

    // store some dynamic data:
    *((unsigned long *) buf) = 0xff07;
    *(((double *) buf) + 2) = 1.618;
}

But our team disputes whether this operation is undefined behavior.


I have read many similar questions, such as

  • Why does -Wcast-align not warn about cast from char* to int* on x86?
  • How to cast char array to int at non-aligned position?
  • C undefined behavior. Strict aliasing rule, or incorrect alignment?
  • SEI CERT C C.S EXP36-C

But these are different from my interpretation of the C standard, I want to know if it’s my misunderstanding.

The main confusion is about the section 6.3.2.3 #7 of C11:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned 68) for the referenced type, the behavior is undefined.

68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.

Does the resulting pointer here refer to Pointer Object or Pointer Value?

In my opinion, I think the answer is the Pointer Object, but more answers seem to indicate the Pointer Value.


Interpretation A: Pointer Object

My thoughts are as follows: A pointer itself is an object. According to 6.2.5 #28, different pointer may have different representation and alignment requirements. Therefore, according to 6.3.2.3 #7, as long as two pointers have the same alignment, they can be safely converted without undefined behavior, but there is no guarantee that they can be dereferenced. Express this idea in a program:

#include <stdio.h>

int main() {
    char buf[4096];

    char *pc = buf;
    if (_Alignof(char *) == _Alignof(int *)) {
        // cast safely, because they have the same alignment requirement?
        int *pi = (int *) pc; 
        printf("pi: %p\n", pi);
    } else {
        printf("char * and int * don't have the same alignment.\n");
    }
}

Interpretation B: Pointer Value

However, if the C11 standard is talking about Pointer Value for referenced type rather than Pointer Object. The alignment check of the above code is meaningless. Express this idea in a program:

#include <stdio.h>

int main() {
    char buf[4096];

    char *pc = buf;
    
    /*
     * undefined behavior, because:
     * align of char is 1
     * align of int is 4
     * 
     * and we don't know whether the `value` of pc is 4-aligned.
     */
    int *pi = (int *) pc;
    printf("pi: %p\n", pi);
}

Which interpretation is correct?

like image 607
Richard Bryant Avatar asked Dec 10 '20 18:12

Richard Bryant


2 Answers

Interpretation B is correct. The standard is talking about a pointer to an object, not the object itself. "Resulting pointer" is referring to the result of the cast, and a cast does not produce an lvalue, so it's referring to the pointer value after the cast.

Taking the code in your example, suppose that an int must be aligned on a 4 byte boundary, i.e. it's address must be a multiple of 4. If the address of buf is 0x1001 then converting that address to int * is invalid because the pointer value is not properly aligned. If the address of buf is 0x1000 then converting it to int * is valid.

Update:

The code you added addresses the alignment issue, so it's fine in that regard. It however has a different issue: it violates strict aliasing.

The array you defined contains objects of type char. By casting the address to a different type and subsequently dereferencing the converted type type, you're accessing objects of one type as objects of another type. This is not allowed by the C standard.

Though the term "strict aliasing" is not used in the standard, the concept is described in section 6.5 paragraphs 6 and 7:

6 The effective type of an object for an access to its stored value is the declared type of the object, if any.87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

...

87 ) Allocated objects have no declared type.

88 ) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

In your example, you're writing an unsigned long and a double on top of char objects. Neither of these types satisfies the conditions of paragraph 7.

In addition to that, the pointer arithmetic here is not valid:

 *(((double *) buf) + 2) = 1.618;

As you're treating buf as an array of double when it is not. At the very least, you would need to perform the necessary arithmetic on buf directly and cast the result at the end.

So why is this a problem for a char array and not a buffer returned by malloc? Because memory returned from malloc has no effective type until you store something in it, which is what paragraph 6 and footnote 87 describe.

So from a strict point of view of the standard, what you're doing is undefined behavior. But depending on your compiler you may be able to disable strict aliasing so this will work. If you're using gcc, you'll want to pass the -fno-strict-aliasing flag

like image 171
dbush Avatar answered Oct 28 '22 15:10

dbush


The Standard does not require that implementations consider the possibility that code will ever observe a value in a T* that is not aligned for type T. In clang, for example, when targeting platforms whose "larger" load/store instructions do not support unaligned access, converting a pointer into a type whose alignment it doesn't satisfy and then using memcpy on it may result in the compiler generating code which will fail if the pointer isn't aligned, even though memcpy itself would not otherwise impose any alignment requirements.

When targeting an ARM Cortex-M0 or Cortex-M3, for example, given:

void test1(long long *dest, long long *src)
{
    memcpy(dest, src, sizeof (long long));
}
void test2(char *dest, char *src)
{
    memcpy(dest, src, sizeof (long long));
}
void test3(long long *dest, long long *src)
{
    *dest = *src;
}

clang will generate for both test1 and test3 code which would fail if src or dest were not aligned, but for test2 it will generate code which is bigger and slower, but which will support arbitrary alignment of the source and destination operands.

To be sure, even on clang the act of converting an unaligned pointer into a long long* won't generally cause anything weird to happen by itself, but it is the fact that such a conversion would produce UB that exempts the compiler of any responsibility to handle the unaligned-pointer case in test1.

like image 42
supercat Avatar answered Oct 28 '22 15:10

supercat