
using memcpy to convert from array to int

Tags: c, pointers, memory, byte, memcpy

I was experimenting with pointer manipulation and decided to try converting an array of numbers into an integer by directly copying from memory using memcpy.

char aux[4] = {1,2,3,4}; 
int aux2 = 0;
memcpy((char*) &aux2, &aux[0], 4);
printf("%X", aux2);

I expected the result to be 0x1020304 since I'm copying the exact bytes from one to another, but printf gives me the result 0x4030201, which is almost my desired output, only backwards. Why does this happen and is there a way to get the result in the "correct" order?


asked Feb 09 '21 18:02

Daniel Peruchi Negris

1 Answer

Your code has at best implementation-defined behavior and in some cases undefined behavior.

Type int may have a size different from 4: on 16-bit systems, int typically has a size of only 2 bytes. Copying 4 bytes into a 2-byte int writes past the end of the object, so you would have undefined behavior on such systems.

On most current 32-bit and 64-bit systems, int has 4 bytes, but the order in which the 4 bytes are stored in memory is implementation-defined, a problem referred to as endianness:

Some systems use big-endian representation, where the first byte is the most significant part of the integer. Bytes 01 02 03 04 represent the value 0x01020304 on big-endian systems, such as older Macs, some mobile phones and embedded systems.
Conversely, most personal computers today use little-endian representation, where the first byte contains the least significant part of the integer. Bytes 01 02 03 04 represent the value 0x04030201 on little-endian systems, such as yours.
The C Standard does not exclude other representations, where bytes would be in some other order. This was the case on some ancient DEC systems such as the PDP-11, where the C language was originally developed (middle-endian or mixed-endian).

Although it may seem surprising, the little-endian order is very logical: the byte at offset n contains the bits with weights 2^(8*n) through 2^(8*n+7). Endianness is a cultural issue; both choices seem natural to long-time users.

The same variations are found in other contexts, such as the ordering of date components and the way numbers are spoken and written:

Japan uses big-endian representation: February 17 2021 is written 2021.02.17,
Europe uses little-endian representation: February 17 2021 is written 17/02/2021,
The USA uses a middle-endian representation: February 17 2021 is written 02/17/2021.
21 is pronounced twenty-one in English (big-endian) whereas Germans say einundzwanzig (one and twenty, little-endian and actually middle-endian for 3-digit numbers). But then 17 is seventeen (little-endian) and in French dix-sept (big-endian).
Western languages write numbers in big-endian format (I am 42 years old), but Semitic scripts such as Hebrew (אני בת 42) and Arabic (أنا ٤٢ سنة) are read from right to left, so the same digits are encountered in little-endian order.

Here is a more portable version to test memory representation:

#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned int aux2 = 0x01020304;
    unsigned char aux[sizeof(unsigned int)];
    /* copy the object representation of aux2 into the byte array */
    memcpy(aux, &aux2, sizeof(aux));
    printf("%X is represented in memory as", aux2);
    /* print the bytes in increasing address order */
    for (size_t i = 0; i < sizeof(aux); i++)
        printf(" %02X", aux[i]);
    printf("\n");
    return 0;
}

answered Oct 17 '22 16:10

chqrlie
