Say I am reading and writing uint32_t values to and from a stream. If I read/write one byte at a time to/from a stream and shift each byte as in the examples below, will the results be consistent regardless of machine endianness?
In the examples here the stream is a buffer in memory called p.
static uint32_t s_read_uint32(uint8_t** p)
{
    uint32_t value;
    value  = (*p)[0];
    value |= (((uint32_t)((*p)[1])) << 8);
    value |= (((uint32_t)((*p)[2])) << 16);
    value |= (((uint32_t)((*p)[3])) << 24);
    *p += 4;
    return value;
}

static void s_write_uint32(uint8_t** p, uint32_t value)
{
    (*p)[0] = value & 0xFF;
    (*p)[1] = (value >> 8 ) & 0xFF;
    (*p)[2] = (value >> 16) & 0xFF;
    (*p)[3] = value >> 24;
    *p += 4;
}
I don't currently have access to a big-endian machine to test this out, but the idea is if each byte is written one at a time each individual byte can be independently written or read from the stream. Then the CPU can handle endianness by hiding these details behind the shifting operations. Is this true, and if not could anyone please explain why not?
Endianness does not matter if you have a single byte. If you have one byte, it's the only data you read, so there's only one way to interpret it (because computers agree on what a byte is). Now suppose we have 4 bytes (W X Y Z) stored the same way on a big-endian and a little-endian machine.
In little-endian format, the least significant byte appears first, followed by the more significant bytes. The letter 'T' has a value of 0x54 and is represented in 16-bit little-endian as 54 00.
Broadly speaking, the endianness in use is determined by the CPU. Because there are a number of options, it is unsurprising that different semiconductor vendors have chosen different endianness for their CPUs.
When a value larger than a byte is stored or serialized into multiple bytes, the choice of the order in which the component bytes are stored is called byte order, or endianness. Historically, there have been three byte orders in use: "big-endian", "little-endian", and "PDP-endian" or "middle-endian".
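As a rough illustration of what "byte order" means on a given host, the following sketch copies the in-memory bytes of a uint32_t into an array so they can be inspected in address order. The names dump_bytes and host_is_little_endian are made up for this example, and the sketch assumes the host is either plain little-endian or plain big-endian.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of `value`
   into out[0..3], lowest address first. */
static void dump_bytes(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: returns 1 if the host stores the least
   significant byte at the lowest address, 0 otherwise. */
static int host_is_little_endian(void)
{
    uint8_t b[4];
    dump_bytes(UINT32_C(0x11223344), b);
    return b[0] == 0x44 && b[1] == 0x33 && b[2] == 0x22 && b[3] == 0x11;
}
```

On a little-endian host dump_bytes(0x11223344, b) leaves 44 33 22 11 in b; on a big-endian host it leaves 11 22 33 44.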
If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?
Yes. Your s_write_uint32() function stores the bytes of the input value in order from least significant to most significant, regardless of their order in the native representation of that value. Your s_read_uint32() correctly reverses this process, regardless of the underlying representation of uint32_t. These work because

- the behavior of the shift operators (<<, >>) is defined in terms of the value of the left operand, not its representation,
- & 0xff masks off all bits of the left operand but those of its least-significant byte, regardless of the value's representation (because 0xff has a matching representation), and
- the |= operations just put the bytes into the result; the positions are selected, appropriately, by the preceding left shift. This might be clearer if += were used instead, but the result would be no different.

Note, however, that to some extent you are reinventing the wheel. POSIX defines a function pair, htonl() and ntohl(), supported also on many non-POSIX systems, for dealing with byte-order issues in four-byte numbers. The idea is that when sending, everyone uses htonl() to convert from host byte order (whatever that is) to network byte order (big endian) and sends the resulting four-byte buffer. On receipt, everyone accepts four bytes into one number, then uses ntohl() to convert from network to host byte order.
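For comparison, here is a minimal sketch of the same stream interface built on htonl() and ntohl(). It assumes a POSIX system that provides <arpa/inet.h>; the names net_write_uint32 and net_read_uint32 are invented for this example. Because network byte order is big-endian, the bytes land in the buffer in the opposite order from the shift-based functions in the question, which write the least significant byte first.

```c
#include <arpa/inet.h>  /* htonl(), ntohl() (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical name: store `value` in network (big-endian) order
   at *p and advance *p by four bytes. */
static void net_write_uint32(uint8_t **p, uint32_t value)
{
    uint32_t wire = htonl(value);      /* host order -> network order */
    memcpy(*p, &wire, sizeof wire);
    *p += sizeof wire;
}

/* Hypothetical name: read four bytes in network order from *p,
   advance *p, and return the value in host order. */
static uint32_t net_read_uint32(uint8_t **p)
{
    uint32_t wire;
    memcpy(&wire, *p, sizeof wire);
    *p += sizeof wire;
    return ntohl(wire);                /* network order -> host order */
}
```

Writing 0x11223344 with net_write_uint32() leaves the bytes 11 22 33 44 in the buffer on any host, and net_read_uint32() recovers the original value.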
It'll work, but a memcpy followed by a conditional byteswap will give you much better codegen for the write function.
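#include <stdint.h>
#include <string.h>

// Nonzero if the host stores the least significant byte first (little endian).
#define LE (((char*)&(uint_least32_t){1})[0])

// Reverses n bytes in place; defined elsewhere. Declared with void* so that
// both &value and *p can be passed without a cast.
void byteswap(void*, size_t);

uint32_t s2_read_uint32(uint8_t** p)
{
    uint32_t value;
    memcpy(&value, *p, sizeof(value));
    if (!LE) byteswap(&value, sizeof(value));
    *p += 4;
    return value;
}

void s2_write_uint32(uint8_t** p, uint32_t value)
{
    memcpy(*p, &value, sizeof(value));
    if (!LE) byteswap(*p, sizeof(value));
    *p += 4;
}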
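The snippet above only declares byteswap() and leaves its definition out. A minimal sketch of what such a helper might look like is shown below; it takes void * so that it can be called on both a uint32_t * and a uint8_t * without casts, which is an assumption about the intended prototype rather than something the answer specifies.

```c
#include <stddef.h>

/* Assumed definition: reverse the n bytes starting at buf, in place. */
void byteswap(void *buf, size_t n)
{
    unsigned char *bytes = buf;
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char tmp = bytes[i];
        bytes[i] = bytes[n - 1 - i];
        bytes[n - 1 - i] = tmp;
    }
}
```

For n == 4 this swaps the outer pair and the inner pair of bytes, turning 11 22 33 44 into 44 33 22 11.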
GCC since version 8 (but not Clang) can eliminate these shifts on little-endian platforms, but you should help it by restrict-qualifying the doubly-indirect pointer to the destination, or else it might think that a write to (*p)[0] can invalidate *p (uint8_t is in practice a character type and is therefore permitted to alias anything).
void s_write_uint32(uint8_t** restrict p, uint32_t value)
{
    (*p)[0] = value & 0xFF;
    (*p)[1] = (value >> 8 ) & 0xFF;
    (*p)[2] = (value >> 16) & 0xFF;
    (*p)[3] = value >> 24;
    *p += 4;
}