I was experimenting with pointer manipulation and decided to try converting an array of numbers into an integer by directly copying from memory using memcpy.
char aux[4] = {1,2,3,4};
int aux2 = 0;
memcpy((char*) &aux2, &aux[0], 4);
printf("%X", aux2);
I expected the result to be 0x1020304 since I'm copying the exact bytes from one to another, but printf gives me the result 0x4030201, which is almost my desired output, only backwards. Why does this happen and is there a way to get the result in the "correct" order?
memmove() is similar to memcpy() as it also copies data from a source to destination.
memcpy is usually naive - certainly not the slowest way to copy memory around, but usually quite easy to beat with some loop unrolling, and you can go even further with assembler.
memcpy copies data byte by byte from the source array to the destination array. This copying of data is threadsafe.
The function memcpy() is used to copy a memory block from one location to another. One is source and another is destination pointed by the pointer. This is declared in “string.
memcpy() in C/C++. memcpy() is used to copy a block of memory from a location to another. It is declared in string.h. // Copies "numBytes" bytes from address "from" to address "to" void * memcpy(void *to, const void *from, size_t numBytes);
So instead of memcpy (temp, a, sizeof (a));, you would use COPY_ARRAY (temp, a, 1); Users just specify source, destination and the number of elements; the size of an element is inferred automatically. It checks if the multiplication of size and element count overflows.
The output may differ depending on the endianness of your computer's architecture. In this example, the GetBytes (Int32) method of the BitConverter class is called to convert an int to an array of bytes.
COPY_ARRAY is safe to use with NULL as source pointer iff 0 elements are to be copied. That convention is used in some cases for initializing arrays. Raw memcpy (3) does not support it -- compilers are allowed to assume that only valid pointers are passed to it and can optimize away NULL checks after such a call.
Your code has at best implementation-defined behavior and in some cases undefined behavior.
Type int may have a size different from 4: on 16-bit systems, int typically has a size of only 2 bytes. You would have undefined behavior on such systems, because memcpy would write 4 bytes into a 2-byte object.
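One way to sidestep the size problem is to copy into a fixed-width type and let sizeof supply the byte count. This is only a minimal sketch, assuming a C11 compiler that provides uint32_t; the helper name copy_bytes_to_u32 is made up for illustration and is not part of the original code:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* uint32_t */
#include <string.h>   /* memcpy */

/* Hypothetical helper: copies exactly sizeof(uint32_t) bytes, so the
   destination object is never smaller than the number of bytes written. */
static uint32_t copy_bytes_to_u32(const unsigned char bytes[4])
{
    uint32_t value;
    /* Refuse to compile if the assumption of a 4-byte type does not hold. */
    static_assert(sizeof value == 4, "uint32_t must be 4 bytes");
    memcpy(&value, bytes, sizeof value);
    return value;   /* the numeric value still depends on host endianness */
}
```

This removes the undefined behavior caused by a size mismatch, but it does not change the byte order: the result is still interpreted according to the host's representation.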
On regular 32-bit systems, int has 4 bytes, but the order in which the 4 bytes are stored in memory is implementation defined, a problem referred to as endianness:
Some systems use big-endian representation, where the first byte is the most significant part of the integer. Bytes 01 02 03 04 represent the value 0x01020304 on big-endian systems, such as older Macs, some mobile phones and embedded systems.
Conversely, most personal computers today use little-endian representation, where the first byte contains the least significant part of the integer. Bytes 01 02 03 04 represent the value 0x04030201 on little-endian systems, such as yours.
The C Standard does not exclude other representations, where bytes would be in some other order. This was the case on some ancient DEC systems such as the PDP-11, where the C language was originally developed (a middle-endian, or mixed-endian, layout).
Although surprising at first, the little-endian order is very logical: the byte at offset n contains the bits representing values between 2^(n*8) and 2^(n*8+7). Endianness is a cultural issue; both choices seem natural to long-time users.
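The relation between offset and bit weight can be written as a shift. The following is a rough sketch, assuming a 32-bit value and 8-bit bytes; le_byte_at is a hypothetical name introduced here for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the byte a little-endian machine would
   store at offset n, computed arithmetically instead of read from memory. */
static unsigned char le_byte_at(uint32_t value, size_t n)
{
    assert(n < 4);   /* a 32-bit value has byte offsets 0 through 3 */
    /* Byte n holds the bits weighted 2^(8n) through 2^(8n+7). */
    return (unsigned char)((value >> (8 * n)) & 0xFFu);
}
```

For the value 0x01020304, offsets 0 through 3 yield 04, 03, 02 and 01, which is the same byte sequence that produced 0x4030201 in the question.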
The same variations are found in other contexts, such as the ordering of date components:
Japan uses big-endian representation: February 17 2021 is written 2021.02.17,
Europe uses little-endian representation: February 17 2021 is written 17/02/2021,
The USA uses a middle-endian representation: February 17 2021 is written 02/17/2021.
21 is pronounced twenty-one in English (big-endian) whereas Germans say einundzwanzig (one and twenty, little-endian, and actually middle-endian for 3-digit numbers). But then 17 is seventeen (little-endian) in English and dix-sept (big-endian) in French.
Western languages write numbers in big-endian format (I am 42 years old), but Semitic scripts use little-endian order: Hebrew (אני בת 42) and Arabic (أنا ٤٢ سنة) are read from right to left, so the least significant digit comes first.
Here is a more portable version to test memory representation:
#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned int aux2 = 0x01020304;
    unsigned char aux[sizeof(unsigned int)];

    /* Copy the object representation of aux2 into the byte array. */
    memcpy(aux, &aux2, sizeof(aux));
    printf("%X is represented in memory as", aux2);
    /* Print the bytes in increasing address order. */
    for (size_t i = 0; i < sizeof(aux); i++)
        printf(" %02X", aux[i]);
    printf("\n");
    return 0;
}
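
To get 0x01020304 from the bytes 1, 2, 3, 4 regardless of how the host stores integers, the value can be assembled arithmetically instead of being copied from memory. Here is a minimal sketch, assuming 8-bit bytes and the availability of uint32_t; the function name u32_from_big_endian is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: treats bytes[0] as the most significant byte and
   bytes[3] as the least significant one, whatever the host byte order. */
static uint32_t u32_from_big_endian(const unsigned char *bytes)
{
    assert(bytes != NULL);
    /* Cast before shifting so the arithmetic is done on 32 unsigned bits. */
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```

Shifts operate on the value rather than on its memory layout, so this gives the same result on little-endian and big-endian machines. The result can be printed with printf("%lX", (unsigned long)value), since unsigned long is at least 32 bits wide.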