In this answer, zwol made this claim:
The correct way to convert two bytes of data from an external source into a 16-bit signed integer is with helper functions like this:
#include <stdint.h>

int16_t be16_to_cpu_signed(const uint8_t data[static 2]) {
    uint32_t val = (((uint32_t)data[0]) << 8) |
                   (((uint32_t)data[1]) << 0);
    return ((int32_t) val) - 0x10000u;
}

int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    uint32_t val = (((uint32_t)data[0]) << 0) |
                   (((uint32_t)data[1]) << 8);
    return ((int32_t) val) - 0x10000u;
}
Which of the above functions is appropriate depends on whether the array contains a little-endian or a big-endian representation. Endianness is not the issue in question here; I am wondering why zwol subtracts 0x10000u from the uint32_t value converted to int32_t.
Why is this the correct way?
How does it avoid the implementation defined behavior when converting to the return type?
Since you can assume 2's complement representation, how would this simpler cast fail: return (uint16_t)val;
What is wrong with this naive solution:
int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    return (uint16_t)data[0] | ((uint16_t)data[1] << 8);
}
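For concreteness, here is a small test harness of my own (not part of the question) that feeds a few byte pairs through the naive little-endian version above. Note that for the first two inputs it relies on exactly the implementation-defined conversion being asked about; on common 2's complement platforms it prints -1 -32768 4660.

#include <stdint.h>
#include <stdio.h>

/* The naive version quoted above: the implicit conversion of the return
   expression to int16_t is implementation-defined once the value exceeds 32767. */
static int16_t le16_to_cpu_signed(const uint8_t data[static 2]) {
    return (uint16_t)data[0] | ((uint16_t)data[1] << 8);
}

int main(void) {
    const uint8_t a[2] = { 0xFF, 0xFF };  /* pattern 0xFFFF, expected -1      */
    const uint8_t b[2] = { 0x00, 0x80 };  /* pattern 0x8000, expected -32768  */
    const uint8_t c[2] = { 0x34, 0x12 };  /* pattern 0x1234, expected  4660   */
    printf("%d %d %d\n", le16_to_cpu_signed(a), le16_to_cpu_signed(b), le16_to_cpu_signed(c));
    return 0;
}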
As background: a 16-bit integer can store 2^16 (or 65,536) distinct values. In an unsigned representation these are the integers 0 through 65,535, so for an unsigned short all 16 bits contribute to the magnitude and the largest representable number is 2^16 − 1 = 65,535. Using 2's complement, the same 65,536 bit patterns instead cover the signed range −32,768 to +32,767.
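The subtraction seen in the quoted helpers is exactly this reinterpretation written out as arithmetic. As a minimal sketch (the function name is mine), assuming the two bytes have already been assembled into a pattern in 0..0xFFFF:

#include <stdint.h>

/* Map a raw 16-bit pattern (0 .. 0xFFFF) to the value it denotes when read as a
   2's complement signed 16-bit number: patterns below 0x8000 keep their value,
   patterns with the top bit set denote pattern - 0x10000. */
int32_t twos_complement_value(uint32_t pattern) {
    return pattern < 0x8000 ? (int32_t)pattern : (int32_t)pattern - 0x10000;
}
/* e.g. 0xFFFF -> 65535 - 65536 = -1,  0x8000 -> 32768 - 65536 = -32768 */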
Computing the unsigned 16-bit value with (unsigned)data[0] | ((unsigned)data[1] << 8) (for the little-endian version) yields the 16-bit pattern as an unsigned int whose value always fits in 16 bits, and on a little-endian target it typically compiles down to a single 16-bit load.
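A minimal sketch of that composition step (the function name is mine), assuming the array holds a little-endian representation:

#include <stdint.h>

/* Assemble two bytes into the unsigned 16-bit pattern. The expression has type
   unsigned int after the integer promotions, but its value is always in 0 .. 0xFFFF. */
unsigned le16_pattern(const uint8_t data[static 2]) {
    return (unsigned)data[0] | ((unsigned)data[1] << 8);
}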
If int is 16-bit then your version relies on implementation-defined behaviour if the value of the expression in the return statement is out of range for int16_t.
However, the first version also has a similar problem; for example, if int32_t is a typedef for int and the input bytes are both 0xFF, then the result of the subtraction in the return statement is UINT_MAX, which causes implementation-defined behaviour when converted to int16_t.
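To make that failure mode visible, here is a small sketch of my own (assuming a typical platform where int32_t is a typedef for a 32-bit int) that prints the value of the quoted return expression before and after the conversion:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    const uint8_t data[2] = { 0xFF, 0xFF };
    uint32_t val = ((uint32_t)data[0] << 8) | (uint32_t)data[1];   /* 0xFFFF */
    /* Same expression as in the quoted helper: the int32_t operand is converted
       to unsigned int by the usual arithmetic conversions, so the subtraction wraps. */
    unsigned before = (int32_t)val - 0x10000u;
    printf("%u\n", before);                      /* 4294967295, i.e. UINT_MAX */
    int16_t result = (int32_t)val - 0x10000u;    /* implementation-defined conversion */
    printf("%d\n", result);                      /* commonly -1, but not guaranteed */
    return 0;
}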
IMHO the answer you link to has several major issues.
This should be pedantically correct and also work on platforms that use sign-magnitude or 1's complement representations instead of the usual 2's complement. The input bytes are assumed to be in 2's complement.
int le16_to_cpu_signed(const uint8_t data[static 2])
{
    /* Assemble the unsigned 16-bit pattern; value is an unsigned int in 0 .. 0xFFFF. */
    unsigned value = data[0] | ((unsigned)data[1] << 8);
    if (value & 0x8000)
        /* Negative case: invert the low 16 bits (the mask keeps the result <= 0x7FFF,
           so the cast to int cannot go out of range), then negate and subtract 1. */
        return -(int)(~value & 0xFFFFu) - 1;
    else
        return value;
}
Because of the branch, it will be more expensive than other options.
What this accomplishes is that it avoids any assumption about how the int representation relates to the unsigned representation on the platform. The cast to int is required to preserve the arithmetic value for any number that will fit in the target type; because the inversion (masked to the low 16 bits) ensures the top bit of the 16-bit number will be zero, the value will fit. Then the unary - and the subtraction of 1 apply the usual rule for 2's complement negation. Depending on the platform, INT16_MIN could still overflow if it doesn't fit in the int type on the target, in which case long should be used.
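To make the mechanics concrete, here is a small worked check of my own (assuming it is compiled together with the function above):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int le16_to_cpu_signed(const uint8_t data[static 2]);  /* defined above */

int main(void) {
    const uint8_t a[2] = { 0x01, 0x80 };  /* pattern 0x8001 */
    const uint8_t b[2] = { 0xFF, 0xFF };  /* pattern 0xFFFF */
    /* Trace for a: value = 0x8001, bit 15 is set, ~value & 0xFFFF = 0x7FFE = 32766,
       so the result is -(32766) - 1 = -32767, the 2's complement reading of 0x8001. */
    assert(le16_to_cpu_signed(a) == -32767);
    assert(le16_to_cpu_signed(b) == -1);
    puts("ok");
    return 0;
}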
The difference from the original version in the question comes at return time. While the original just always subtracted 0x10000 and let 2's complement wrapping fold the out-of-range result into the int16_t range, this version has an explicit if that avoids relying on that wrapping (which the standard leaves implementation-defined rather than guaranteeing).
Now in practice, almost all platforms in use today use 2's complement representation. In fact, if the platform has a standard-compliant stdint.h that defines int32_t, it must use 2's complement for it. Where this approach sometimes comes in handy is with scripting languages that have no integer data types at all: you can adapt the operations shown above to floats and they will give the correct result.
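As a rough illustration of that last remark (a sketch of my own, using C's double to stand in for a language that only has floating-point numbers), the same arithmetic carries over directly:

#include <stdint.h>

/* Little-endian signed 16-bit conversion done entirely in floating point:
   assemble the pattern 0 .. 65535, then subtract 65536 when bit 15 is set.
   Every intermediate value is a small integer, so a double represents it exactly. */
double le16_to_signed_double(const uint8_t data[static 2]) {
    double value = (double)data[0] + (double)data[1] * 256.0;
    return value >= 32768.0 ? value - 65536.0 : value;
}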