Is reading one byte at a time endianness agnostic regardless of value size?

Tags: c, endianness

Say I am reading and writing uint32_t values to and from a stream. If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?

In the examples here the stream is a buffer in memory called p.

static uint32_t s_read_uint32(uint8_t** p)
{
    uint32_t value;
    value  = (*p)[0];
    value |= (((uint32_t)((*p)[1])) << 8);
    value |= (((uint32_t)((*p)[2])) << 16);
    value |= (((uint32_t)((*p)[3])) << 24);
    *p += 4;
    return value;
}

static void s_write_uint32(uint8_t** p, uint32_t value)
{
    (*p)[0] = value & 0xFF;
    (*p)[1] = (value >> 8 ) & 0xFF;
    (*p)[2] = (value >> 16) & 0xFF;
    (*p)[3] = value >> 24;
    *p += 4;
}

I don't currently have access to a big-endian machine to test this, but the idea is that if the value is transferred one byte at a time, each byte can be written to or read from the stream independently, and the shift operations hide the CPU's endianness. Is this true, and if not, could anyone please explain why not?

Cecil asked May 30 '19 20:05


2 Answers

If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?

Yes. Your s_write_uint32() function stores the bytes of the input value in order from least significant to most significant, regardless of their order in the native representation of that value. Your s_read_uint32() correctly reverses this process, regardless of the underlying representation of uint32_t. These work because

  • the behavior of the shift operators (<<, >>) is defined in terms of the value of the left operand, not its representation
  • the & 0xff masks off all bits of the left operand but those of its least-significant byte, regardless of the value's representation (because 0xff has a matching representation), and
  • the |= operations just put the bytes into the result; the positions are selected, appropriately, by the preceding left shift. This might be more clear if += were used instead, but the result would be no different.
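
The three points above can be checked with a small sketch. The helper names (write_u32_le, read_u32_le, check_wire_layout) are made up for illustration and take a plain pointer rather than the question's uint8_t**. Because only values are shifted and masked, the buffer should hold 78 56 34 12 for 0x12345678 on any host:

```c
#include <assert.h>
#include <stdint.h>

/* Same technique as the question: emit the least significant byte first. */
static void write_u32_le(uint8_t *out, uint32_t value)
{
    out[0] = (uint8_t)(value & 0xFF);
    out[1] = (uint8_t)((value >> 8) & 0xFF);
    out[2] = (uint8_t)((value >> 16) & 0xFF);
    out[3] = (uint8_t)(value >> 24);
}

/* Reassemble the value; each byte is widened before it is shifted. */
static uint32_t read_u32_le(const uint8_t *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Checks the byte layout in the buffer and the round trip. */
static void check_wire_layout(void)
{
    uint8_t buf[4];
    write_u32_le(buf, 0x12345678u);
    assert(buf[0] == 0x78 && buf[1] == 0x56);
    assert(buf[2] == 0x34 && buf[3] == 0x12);
    assert(read_u32_le(buf) == 0x12345678u);
}
```

Nothing in these functions inspects the in-memory representation of the uint32_t, which is why the host's byte order never enters into it.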

Note, however, that to some extent, you are reinventing the wheel. POSIX defines a function pair htonl() and ntohl() (supported also on many non-POSIX systems) for dealing with byte-order issues in four-byte numbers. The idea is that when sending, everyone uses htonl() to convert from host byte order (whatever that is) to network byte order (big endian) and sends the resulting four-byte buffer. On receipt, everyone accepts four bytes into one number, then uses ntohl() to convert from network to host byte order.
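
A minimal sketch of that approach, assuming a POSIX system that provides <arpa/inet.h>. The helper names (write_u32_net, read_u32_net, check_net_roundtrip) are invented here and mirror the question's uint8_t** interface:

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store value in network (big-endian) byte order. */
static void write_u32_net(uint8_t **p, uint32_t value)
{
    uint32_t wire = htonl(value);      /* host order -> network order */
    memcpy(*p, &wire, sizeof wire);
    *p += sizeof wire;
}

/* Hypothetical helper: load four bytes and convert back to host order. */
static uint32_t read_u32_net(uint8_t **p)
{
    uint32_t wire;
    memcpy(&wire, *p, sizeof wire);
    *p += sizeof wire;
    return ntohl(wire);                /* network order -> host order */
}

/* Checks the big-endian layout, the round trip, and the pointer advance. */
static void check_net_roundtrip(void)
{
    uint8_t buf[4];
    uint8_t *w = buf, *r = buf;
    write_u32_net(&w, 0x12345678u);
    assert(buf[0] == 0x12 && buf[3] == 0x78);
    assert(read_u32_net(&r) == 0x12345678u);
    assert(w == buf + 4 && r == buf + 4);
}
```

Note that this puts the most significant byte first, whereas the functions in the question put the least significant byte first, so the two formats are not interchangeable on the wire.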

John Bollinger answered Nov 11 '22 07:11


It'll work but a memcpy followed by a conditional byteswap will give you much better codegen for the write function.

#include <stdint.h>
#include <string.h>

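#define LE (((char*)&(uint_least32_t){1})[0]) // nonzero on a little-endian host
void byteswap(void*,size_t); // assumed to reverse n bytes in place (definition not shown)

uint32_t s2_read_uint32(uint8_t** p)
{
    uint32_t value;
    memcpy(&value,*p,sizeof(value));
    if(!LE) byteswap(&value,sizeof(value)); // stream is little-endian; swap on big-endian hosts
    *p+=4;
    return value;
}

void s2_write_uint32(uint8_t** p, uint32_t value)
{
    memcpy(*p,&value,sizeof(value));
    if(!LE) byteswap(*p,sizeof(value)); // swap the bytes just written on big-endian hosts
    *p+=4;
}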
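
byteswap() is only declared above, not defined. One possible definition, as a sketch: it reverses n bytes in place, and taking void* (an assumption on my part) lets both &value and *p be passed without a cast. check_byteswap is a made-up helper for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical definition: reverse the n bytes starting at buf, in place. */
void byteswap(void *buf, size_t n)
{
    unsigned char *b = buf;
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char tmp = b[i];
        b[i] = b[n - 1 - i];
        b[n - 1 - i] = tmp;
    }
}

/* Reversing the four bytes of a uint32_t reverses its byte-wise value
   on both little-endian and big-endian hosts. */
static void check_byteswap(void)
{
    uint32_t v = 0x12345678u;
    byteswap(&v, sizeof v);
    assert(v == 0x78563412u);
}
```

On GCC and Clang the __builtin_bswap32() builtin would be an alternative for the four-byte case, at the cost of portability.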

GCC since the 8 series (but not clang) can eliminate these shifts on little-endian platforms, but you should help it by restrict-qualifying the doubly-indirect pointer to the destination. Otherwise it may assume that a write to (*p)[0] can modify *p itself, because uint8_t is in practice a character type and is therefore permitted to alias anything.

void s_write_uint32(uint8_t** restrict p, uint32_t value)
{
    (*p)[0] = value & 0xFF;
    (*p)[1] = (value >> 8 ) & 0xFF;
    (*p)[2] = (value >> 16) & 0xFF;
    (*p)[3] = value >> 24;
    *p += 4;
}
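
Another way to sidestep the aliasing concern, sketched here as an assumption rather than something measured: read *p into a local pointer once, so that the byte stores cannot be seen as modifying the pointer being indexed. The names write_u32_local and check_write_local are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant: same byte layout as s_write_uint32, without restrict. */
static void write_u32_local(uint8_t **p, uint32_t value)
{
    uint8_t *out = *p;  /* local copy; its address is never taken */
    out[0] = (uint8_t)(value & 0xFF);
    out[1] = (uint8_t)((value >> 8) & 0xFF);
    out[2] = (uint8_t)((value >> 16) & 0xFF);
    out[3] = (uint8_t)(value >> 24);
    *p = out + 4;       /* publish the advanced position once, at the end */
}

/* Checks the byte layout and that the caller's pointer advanced by four. */
static void check_write_local(void)
{
    uint8_t buf[4];
    uint8_t *w = buf;
    write_u32_local(&w, 0x12345678u);
    assert(buf[0] == 0x78 && buf[1] == 0x56);
    assert(buf[2] == 0x34 && buf[3] == 0x12);
    assert(w == buf + 4);
}
```

Whether a particular compiler then merges the four stores into one is not guaranteed by this sketch; inspect the generated assembly to confirm.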

PSkocik answered Nov 11 '22 07:11