Update 2020-12-11: Thanks @"Some programmer dude" for the suggestion in the comment.
My underlying problem is that our team is implementing a dynamic type storage engine. We allocate multiple char array[PAGE_SIZE] buffers with 16-aligned to store dynamic types of data (there is no fixed struct). For efficiency reasons, we cannot perform byte encoding or allocate additional space to use memcpy
.
Since the alignment has been determined (i.e., 16), the rest is to use the cast of pointer to access objects of the specified type, for example:
int main() {
// simulate our 16-aligned malloc
_Alignas(16) char buf[4096];
// store some dynamic data:
*((unsigned long *) buf) = 0xff07;
*(((double *) buf) + 2) = 1.618;
}
But our team disputes whether this operation is undefined behavior.
I have read many similar questions, such as
But these are different from my interpretation of the C standard, I want to know if it’s my misunderstanding.
The main confusion is about the section 6.3.2.3 #7 of C11:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned 68) for the referenced type, the behavior is undefined.
68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
Does the resulting pointer here refer to Pointer Object or Pointer Value?
In my opinion, I think the answer is the Pointer Object, but more answers seem to indicate the Pointer Value.
My thoughts are as follows: A pointer itself is an object. According to 6.2.5 #28, different pointer may have different representation and alignment requirements. Therefore, according to 6.3.2.3 #7, as long as two pointers have the same alignment, they can be safely converted without undefined behavior, but there is no guarantee that they can be dereferenced. Express this idea in a program:
#include <stdio.h>
int main() {
char buf[4096];
char *pc = buf;
if (_Alignof(char *) == _Alignof(int *)) {
// cast safely, because they have the same alignment requirement?
int *pi = (int *) pc;
printf("pi: %p\n", pi);
} else {
printf("char * and int * don't have the same alignment.\n");
}
}
However, if the C11 standard is talking about Pointer Value for referenced type rather than Pointer Object. The alignment check of the above code is meaningless. Express this idea in a program:
#include <stdio.h>
int main() {
char buf[4096];
char *pc = buf;
/*
* undefined behavior, because:
* align of char is 1
* align of int is 4
*
* and we don't know whether the `value` of pc is 4-aligned.
*/
int *pi = (int *) pc;
printf("pi: %p\n", pi);
}
Which interpretation is correct?
Interpretation B is correct. The standard is talking about a pointer to an object, not the object itself. "Resulting pointer" is referring to the result of the cast, and a cast does not produce an lvalue, so it's referring to the pointer value after the cast.
Taking the code in your example, suppose that an int
must be aligned on a 4 byte boundary, i.e. it's address must be a multiple of 4. If the address of buf
is 0x1001
then converting that address to int *
is invalid because the pointer value is not properly aligned. If the address of buf
is 0x1000
then converting it to int *
is valid.
Update:
The code you added addresses the alignment issue, so it's fine in that regard. It however has a different issue: it violates strict aliasing.
The array you defined contains objects of type char
. By casting the address to a different type and subsequently dereferencing the converted type type, you're accessing objects of one type as objects of another type. This is not allowed by the C standard.
Though the term "strict aliasing" is not used in the standard, the concept is described in section 6.5 paragraphs 6 and 7:
6 The effective type of an object for an access to its stored value is the declared type of the object, if any.87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using
memcpy
ormemmove
, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
...
87 ) Allocated objects have no declared type.
88 ) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
In your example, you're writing an unsigned long
and a double
on top of char
objects. Neither of these types satisfies the conditions of paragraph 7.
In addition to that, the pointer arithmetic here is not valid:
*(((double *) buf) + 2) = 1.618;
As you're treating buf
as an array of double
when it is not. At the very least, you would need to perform the necessary arithmetic on buf
directly and cast the result at the end.
So why is this a problem for a char
array and not a buffer returned by malloc
? Because memory returned from malloc
has no effective type until you store something in it, which is what paragraph 6 and footnote 87 describe.
So from a strict point of view of the standard, what you're doing is undefined behavior. But depending on your compiler you may be able to disable strict aliasing so this will work. If you're using gcc, you'll want to pass the -fno-strict-aliasing
flag
The Standard does not require that implementations consider the possibility that code will ever observe a value in a T*
that is not aligned for type T. In clang, for example, when targeting platforms whose "larger" load/store instructions do not support unaligned access, converting a pointer into a type whose alignment it doesn't satisfy and then using memcpy
on it may result in the compiler generating code which will fail if the pointer isn't aligned, even though memcpy
itself would not otherwise impose any alignment requirements.
When targeting an ARM Cortex-M0 or Cortex-M3, for example, given:
void test1(long long *dest, long long *src)
{
memcpy(dest, src, sizeof (long long));
}
void test2(char *dest, char *src)
{
memcpy(dest, src, sizeof (long long));
}
void test3(long long *dest, long long *src)
{
*dest = *src;
}
clang will generate for both test1 and test3 code which would fail if src
or dest
were not aligned, but for test2
it will generate code which is bigger and slower, but which will support arbitrary alignment of the source and destination operands.
To be sure, even on clang the act of converting an unaligned pointer into a long long*
won't generally cause anything weird to happen by itself, but it is the fact that such a conversion would produce UB that exempts the compiler of any responsibility to handle the unaligned-pointer case in test1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With