How can I read a signed integer from a buffer of uint8_t without invoking un- or implementation-defined behaviour?

Question

Here's a simple function that tries to read a generic two's-complement integer from a big-endian buffer, where we'll assume std::is_signed_v<INT_T>:

template<typename INT_T>
INT_T read_big_endian(uint8_t const *data) {
    INT_T result = 0;
    for (size_t i = 0; i < sizeof(INT_T); i++) {
        result <<= 8;
        result |= *data;
        data++;
    }
    return result;
}

Unfortunately, this is undefined behaviour, as the last <<= shifts into the sign bit.

So now we try the following:

template<typename INT_T>
INT_T read_big_endian(uint8_t const *data) {
    std::make_unsigned_t<INT_T> result = 0;
    for (size_t i = 0; i < sizeof(INT_T); i++) {
        result <<= 8;
        result |= *data;
        data++;
    }
    return static_cast<INT_T>(result);
}

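But now the static_cast invokes implementation-defined behaviour whenever the assembled value is too large for INT_T (i.e. the top bit is set), because converting an out-of-range value from unsigned to signed is implementation-defined.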

How can I do this while staying in the "well-defined" realm?

supercat · Accepted Answer

Start by assembling bytes into an unsigned value. Unless you need to assemble groups of 9 or more octets, a conforming C99 implementation is guaranteed to have such a type that is large enough to hold them all (a C89 implementation would be guaranteed to have an unsigned type large enough to hold at least four).
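For the question's big-endian layout, that first step might look like the following C++ sketch. The helper name `assemble_big_endian` and the choice of `unsigned long long` (guaranteed to hold at least 64 bits, so up to eight octets) are assumptions for illustration, not part of the answer. Every operation is on an unsigned type, where shifts and ORs are fully defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the answer): assemble `count` octets stored
// most-significant-first into an unsigned long long, which is guaranteed to
// hold at least 64 bits. Unsigned arithmetic wraps instead of overflowing,
// so no step here is undefined or implementation-defined.
unsigned long long assemble_big_endian(const std::uint8_t *data, std::size_t count) {
    assert(count <= 8);  // more than 8 octets may not fit
    unsigned long long result = 0;
    for (std::size_t i = 0; i < count; ++i) {
        result = (result << 8) | data[i];  // make room, then append the next octet
    }
    return result;
}
```

For example, the four octets `{0x12, 0x34, 0x56, 0x78}` assemble to `0x12345678`.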

In most cases where you want to convert a sequence of octets to a number, you'll know how many octets to expect. If the data is encoded as 4 bytes, you should use four bytes regardless of the sizes of int and long (a portable function should return type long, since long is guaranteed to be at least 32 bits).

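unsigned long octets_to_unsigned32_little_endian(const unsigned char *p)
{
  /* Cast before shifting so each shift happens in an unsigned type that is
     wide enough: unsigned is at least 16 bits, unsigned long at least 32. */
  return p[0] |
    ((unsigned)p[1]<<8) |
    ((unsigned long)p[2]<<16) |
    ((unsigned long)p[3]<<24);
}
long octets_to_signed32_little_endian(const unsigned char *p)
{
  unsigned long as_unsigned = octets_to_unsigned32_little_endian(p);
  if (as_unsigned < 0x80000000)
    return as_unsigned;   /* fits in long unchanged */
  else
    /* Clearing bit 31 subtracts 2^31; the remaining 2^31 is subtracted as
       two steps of 2^30, because 2^31 itself need not fit in a long. */
    return (long)(as_unsigned^0x80000000UL)-0x40000000L-0x40000000L;
}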

Note that the subtraction is done in two parts, each within the range of a signed long, to allow for systems where LONG_MIN is -2147483647. Attempting to convert the byte sequence {0,0,0,0x80} on such a system may yield undefined behavior (since it would compute the value -2147483648), but the code processes, in fully portable fashion, every value that is within the range of long.
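Applied to the question's generic big-endian template, the same idea might be sketched as follows (C++17, because of `std::is_signed_v`). This is an adaptation of the answer's 32-bit code, not something the answer itself provides, and it assumes 8-bit bytes and no padding bits, which the `static_assert` checks. Instead of casting an out-of-range unsigned value, it clears the top bit and subtracts the remaining 2^(N-1) as two halves of 2^(N-2), so every intermediate value fits in INT_T (or in int, for types narrower than int).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Sketch: well-defined big-endian read of any signed INT_T using the
// "subtract in two halves" trick. Assumes 8-bit bytes and no padding bits.
template <typename INT_T>
INT_T read_big_endian(const std::uint8_t *data) {
    static_assert(std::is_signed_v<INT_T>, "INT_T must be a signed integer type");
    using U = std::make_unsigned_t<INT_T>;
    constexpr int N = std::numeric_limits<U>::digits;  // e.g. 32 for std::int32_t
    static_assert(N == 8 * static_cast<int>(sizeof(INT_T)),
                  "assumes 8-bit bytes and no padding bits");
    assert(data != nullptr);  // caller must supply sizeof(INT_T) readable bytes

    // Step 1: assemble the octets in the unsigned type (wraps, never UB).
    U u = 0;
    for (std::size_t i = 0; i < sizeof(INT_T); ++i) {
        u = static_cast<U>((u << 8) | data[i]);
    }

    // Step 2: values below 2^(N-1) are non-negative and convert unchanged.
    constexpr U sign_bit = static_cast<U>(U{1} << (N - 1));
    if (u < sign_bit) {
        return static_cast<INT_T>(u);
    }

    // Step 3: the intended value is u - 2^N. Clearing the top bit subtracts
    // 2^(N-1) and leaves a value that fits INT_T; the other 2^(N-1) is not
    // representable in INT_T, so subtract it as two steps of 2^(N-2).
    constexpr INT_T quarter = static_cast<INT_T>(U{1} << (N - 2));
    const INT_T low = static_cast<INT_T>(u ^ sign_bit);
    return static_cast<INT_T>(low - quarter - quarter);
}
```

For example, the bytes `{0xFF, 0xFF, 0xFF, 0xFE}` read as `std::int32_t` give `-2`. As a side note, C++20 requires two's complement and defines unsigned-to-signed conversion modulo 2^N, so on a C++20 implementation the question's `static_cast` version is already well-defined and the most negative value cannot run into the LONG_MIN issue described above; a sketch like this one mainly matters for earlier standards.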

Tags: c++, undefined-behavior, implementation-defined-behavior

Eric
