 

casting then dereferencing pointers in C

Tags: c, pointers

When working with char buffers in C, it is sometimes useful and more efficient to be able to work with int-sized chunks of data at a time. To do this I can cast my char * to an int * and use that pointer instead. However, I'm not entirely confident that this works the way I think it does.

For example, suppose I have char *data, does *(int32_t *)data = -1 always overwrite the bytes data[0], data[1], data[2] and data[3] and no other bytes?

Shum asked Jan 17 '23 15:01


2 Answers

Expanding on my comment.

There are two major issues here:

  • It violates strict-aliasing.
  • You might break alignment.

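Violating strict aliasing is technically undefined behavior. You are allowed to access an object of any type through a char* (or another character-type pointer), but not the other way around: accessing a char buffer through an int32_t* is not covered by that exception.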
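
To illustrate the direction that is allowed, here is a minimal sketch that reads the bytes of an int32_t through an unsigned char pointer. The helper name int32_bytes is hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies the object representation of *value into out[].
   Reading any object through a character-type pointer is permitted by the
   aliasing rules, so this direction is well defined. */
void int32_bytes(const int32_t *value, unsigned char out[sizeof(int32_t)])
{
    assert(value != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)value;
    for (size_t i = 0; i < sizeof *value; i++) {
        out[i] = p[i];
    }
}
```

The order of the bytes in out[] depends on the endianness of the machine; only the access itself is guaranteed to be valid.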

You can get around the aliasing issue with -fno-strict-aliasing on GCC.


The other issue is alignment. The char pointer might not be properly aligned. Accessing misaligned data may lead to performance degradation or even a hardware exception if the hardware doesn't support misaligned access.
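
A rough way to check alignment at run time is to look at the pointer's numeric value. This is only a sketch: the helper name is_aligned_for_int32 is hypothetical, it assumes a C11 compiler (for _Alignof) and an implementation that provides uintptr_t with the usual flat pointer-to-integer mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p is suitably aligned for an int32_t.
   Assumes converting a pointer to uintptr_t yields its address, which is
   implementation-defined but holds on mainstream platforms. */
bool is_aligned_for_int32(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % _Alignof(int32_t)) == 0;
}
```

Passing this check only addresses alignment. Dereferencing the cast pointer still runs into the strict-aliasing issue.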


If performance isn't an issue, the foolproof way is to memcpy() the data between the char buffer and an int (or int array) buffer.
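
A minimal sketch of the memcpy() approach for a single value follows. The names store_int32 and load_int32 are hypothetical; the point is that memcpy() works at any offset in the buffer and sidesteps both the aliasing and the alignment problem.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes value into data[0..3] in host byte order. */
void store_int32(char *data, int32_t value)
{
    assert(data != NULL);
    memcpy(data, &value, sizeof value);
}

/* Hypothetical helper: reads an int32_t back from data[0..3]. */
int32_t load_int32(const char *data)
{
    int32_t value;
    assert(data != NULL);
    memcpy(&value, data, sizeof value);
    return value;
}
```

For example, store_int32(data, -1) overwrites exactly sizeof(int32_t) bytes starting at data and touches nothing else, even if data is not aligned.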

Once these two issues are resolved, your example with:

*(int32_t *)data = -1

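overwriting data[0], data[1], data[2], and data[3] should work as expected if sizeof(int32_t) == 4, which holds whenever a byte is 8 bits wide. Just pay attention to the endianness when storing other values; -1 sets every bit, so all four bytes become 0xFF regardless of byte order.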

Mysticial answered Jan 25 '23 15:01


This is technically undefined behavior and the standard is silent on the results of aliasing pointers like this. A standards pedant would say that invoking undefined behavior in this way could result in anything from corrupted data to a system crash to Ragnarok.

Pragmatically, this depends on your hardware. Most modern systems (e.g. x86, x64, PPC, MIPS, ARM) handle word-sized writes in the way you describe, with the exception that writing to an unaligned address can result in a crash on hardware that does not support unaligned access. Also, this is where endianness comes into play; on a little-endian system:

char foo[4];
*((uint32_t*)(foo)) = 0x01020304;  // uint32_t comes from <stdint.h>
// the following are now true:
foo[0] == 0x04;
foo[1] == 0x03;
foo[2] == 0x02;
foo[3] == 0x01;
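
If the byte layout has to be the same on every machine, one option is to write the bytes individually with shifts instead of relying on the host's byte order. The sketch below produces the same little-endian layout as the example above on any host; the names store_le32 and load_le32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores v into out[0..3], least significant byte first,
   independent of the host's endianness. */
void store_le32(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Hypothetical helper: reads back a value written by store_le32. */
uint32_t load_le32(const unsigned char *in)
{
    assert(in != NULL);
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

This costs a few more instructions than a single word-sized write, but it involves no pointer cast, so neither aliasing nor alignment is a concern.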

The short answer is that this isn't safe unless you know exactly what hardware your program will run on.

If you do control the hardware you compile for, then you can predict what the compiler will do; I've used this trick to speed up packing of byte arrays on embedded systems.

Crashworks answered Jan 25 '23 17:01