What makes a system little-endian or big-endian?

Tags: c++, endianness

I'm confused about the byte order of a system/CPU/program,
so I have a few questions to clear things up.

Question 1

If I only use type char in my C++ program:

void main()
{
    char c = 'A';
    char* s = "XYZ";    
}

Then I compile this program to an executable binary file called a.out.
Can a.out run on both little-endian and big-endian systems?

Question 2

If my Windows XP system is little-endian, can I install a big-endian Linux system in VMWare/VirtualBox? What makes a system little-endian or big-endian?

Question 3

If I want to write a byte-order-independent C++ program, what do I need to take into account?

kev asked Feb 11 '12 02:02



2 Answers

Can a.out run on both little-endian and big-endian systems?

No, because pretty much any two CPUs that are different enough to have different endianness will not run the same instruction set. C++ isn't Java; you don't compile to an intermediate form that is later compiled or interpreted. You compile to machine code for a specific CPU, and endianness is part of the CPU.

But that has little to do with endian issues as such. You can compile that program for different CPUs, and those executables will work fine on their respective CPUs.

What makes a system little-endian or big-endian?

As far as C or C++ is concerned, the CPU. Different processing units in a computer can actually have different endianness (the GPU could be big-endian while the CPU is little-endian), but that's somewhat uncommon.

If I want to write a byte-order independent C++ program, what do I need to take into account?

As long as you play by the rules of C or C++, you don't have to care about endian issues.

Of course, you also won't be able to load files directly into POD structs. Or read a series of bytes, pretend it is a series of unsigned shorts, and then process it as a UTF-16-encoded string. All of those things step into the realm of implementation-defined behavior.

There's a difference between "undefined" and "implementation-defined" behavior. When the C and C++ specs say something is "undefined", it basically means all manner of brokenness can ensue. If you keep doing it (and your program doesn't crash), you could get inconsistent results. When they say that something is defined by the implementation, you will get consistent results for that implementation.

If you compile for x86 in VC2010, what happens when you pretend a byte array is an unsigned short array (i.e., unsigned char *byteArray = ...; unsigned short *usArray = (unsigned short*)byteArray) is defined by the implementation. When compiling for big-endian CPUs, you'll get a different answer than when compiling for little-endian CPUs.

In general, endian issues are things you can localize to input/output systems: networking, file reading, and so on. They should be taken care of at the extremities of your codebase.
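As a rough illustration of handling byte order at the I/O boundary, here is a minimal sketch that decodes 16-bit units (such as UTF-16 code units) from raw bytes using shifts instead of a pointer cast. The names load_be16 and load_le16 are made up for this example and do not come from any library; the sketch also assumes 8-bit bytes.

```cpp
#include <cstdint>

// Hypothetical helpers (not from any library). They combine two bytes with
// arithmetic only, so the result depends on the byte order of the DATA,
// not on the byte order of the CPU running the code.

// Big-endian data: the most significant byte comes first.
std::uint16_t load_be16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Little-endian data: the least significant byte comes first.
std::uint16_t load_le16(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

Given the two bytes 00 41, load_be16 yields 0x0041 and load_le16 yields 0x4100 on any host, because no multi-byte object is ever reinterpreted in memory.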

Nicol Bolas answered Oct 05 '22 14:10


Question 1:

Can a.out run on both little-endian and big-endian systems?

No, because a.out is already compiled for whatever architecture it targets. It will not run on an architecture it is incompatible with.

However, the source code for that simple program has nothing that could possibly break on machines with a different endianness.

So yes, the source will work properly (well... aside from void main(), which should be int main()).

Question 2:

If my WindowsXP system is little-endian, can I install a big-endian Linux system in VMWare/VirtualBox?

Endianness is determined by the hardware, not the OS. So whatever (native) VM you install on it will have the same endianness as the host (since x86 is always little-endian).

What makes a system little-endian or big-endian?

Here's an example of something that will behave differently on little vs. big-endian:

uint64_t a = 0x0123456789abcdefull;
uint32_t b = *(uint32_t*)&a;
printf("b is %x", b);

*Note that this violates strict-aliasing, and is only for demonstration purposes.

Little Endian : b is 89abcdef
Big Endian    : b is 1234567

On little-endian, the lower bits of a are stored at the lowest address. So when you access a as a 32-bit integer, you will read the lower 32 bits of it. On big-endian, you will read the upper 32 bits.
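For anyone who wants to try this without the strict-aliasing violation, here is a minimal sketch of the same experiment using memcpy, which is well-defined. The function names low_address_half and is_little_endian are invented for this example, and the check assumes the machine is either plain little-endian or plain big-endian.

```cpp
#include <cstdint>
#include <cstring>

// Copies the four bytes at the lowest addresses of `value` into a 32-bit
// integer. memcpy replaces the pointer cast, so no aliasing rule is broken.
std::uint32_t low_address_half(std::uint64_t value)
{
    std::uint32_t result = 0;
    std::memcpy(&result, &value, sizeof result);
    return result;
}

// Hypothetical helper: true when the least significant bytes are stored
// at the lowest addresses. Mixed-endian layouts are not considered.
bool is_little_endian()
{
    return low_address_half(0x0123456789abcdefull) == 0x89abcdefu;
}
```

On a little-endian host low_address_half returns 0x89abcdef for that constant, and on a big-endian host it returns 0x01234567, matching the output listed above.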

Question 3:

If I want to write a byte-order independent C++ program, what do I need to take into account?

Just follow the standard C++ rules and don't do anything ugly like the example I've shown above. Avoid undefined behavior, avoid type-punning...
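When a value does have to leave the program (file, socket), one common way to stay within the rules is to pick a byte order for the external format and write the bytes out one at a time. This is only a sketch: store_be32 is a made-up name, big-endian is an arbitrary choice for the example, and 8-bit bytes are assumed.

```cpp
#include <cstdint>

// Hypothetical helper (not from any library): writes `value` into out[0..3],
// most significant byte first. Shifts operate on the value, not on its
// in-memory layout, so the output is the same on every host.
void store_be32(std::uint32_t value, unsigned char* out)
{
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}
```

Calling store_be32(0x01234567, buf) fills buf with 01 23 45 67 regardless of the CPU, so the resulting file or packet can be read back on any machine.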

Mysticial answered Oct 05 '22 15:10