What's the best way to convert a uint8_t to an sint8_t in portable C?
This is the code I came up with:
#include <stdint.h>
sint8_t DESER_SINT8(uint8_t x)
{
    return (sint8_t)((x >= (1u << 8u))
                     ? -(UINT8_MAX - x)
                     : x);
}
Is there a better/simpler way to do it? Maybe a way without using a conditional?
Edit: Thanks guys. To sum up what I've learned so far:
sint8_t is really called int8_t
128 is expressed by 1 << 7, not by 1 << 8
:)
So here is an updated version of my original code:
#include <stdint.h>
int8_t DESER_INT8(uint8_t x)
{
    return ((x >= (1 << 7))
            ? -(UINT8_MAX - x + 1)
            : x);
}
1u << 8u is 0x100u, which is larger than every uint8_t value, so the conditional is never satisfied. Your "conversion" routine is actually just:
return x;
which actually makes some sense.
You need to define more clearly what you want from the conversion. C99 defines conversion from unsigned to signed integer types as follows (§6.3.1.3 "Signed and unsigned integers"):
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged. ...
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Thus, uint8_t values between 0 and 127 are preserved, and the behavior for values larger than 127 is implementation-defined. Many (but not all) implementations will simply interpret the unsigned value as a twos-complement representation of a signed integer. Perhaps what you're really asking is how to guarantee this behavior across platforms?
If so, you can use:
return x < 128 ? x : x - 256;
The value x - 256 is an int, guaranteed to have the value of x interpreted as a twos-complement 8-bit integer. The implicit conversion to int8_t then preserves this value.
This all assumes that sint8_t is meant to be int8_t, as sint8_t isn't a standard type. If it isn't, then all bets are off, because the correctness of the conversion I suggested depends on the guarantee that int8_t has a twos-complement representation (§7.18.1.1 "Exact-width integer types").
If sint8_t is instead some wacky platform-specific type, it might use some other representation like ones-complement, which has a different set of representable values, thus rendering the conversion described above implementation-defined (hence non-portable) for certain inputs.
EDIT
Alf has argued that this is "silly", and that this will never be necessary on any production system. I disagree, but it is admittedly a corner case of a corner case. His argument is not entirely without merit.
His claim that this is "inefficient" and should therefore be avoided, however, is baseless. A reasonable optimizing compiler will optimize this away on platforms where it is unnecessary. Using GCC on x86_64 for example:
#include <stdint.h>
int8_t alf(uint8_t x) {
return x;
}
int8_t steve(uint8_t x) {
return x < 128 ? x : x - 256;
}
int8_t david(uint8_t x) {
return (x ^ 0x80) - 0x80;
}
compiled with -Os -fomit-frame-pointer yields the following:
_alf:
0000000000000000 movsbl %dil,%eax
0000000000000004 ret
_steve:
0000000000000005 movsbl %dil,%eax
0000000000000009 ret
_david:
000000000000000a movsbl %dil,%eax
000000000000000e ret
Note that all three implementations are identical after optimization. Clang/LLVM gives exactly the same result. Similarly, if we build for ARM instead of x86:
_alf:
00000000 b240 sxtb r0, r0
00000002 4770 bx lr
_steve:
00000004 b240 sxtb r0, r0
00000006 4770 bx lr
_david:
00000008 b240 sxtb r0, r0
0000000a 4770 bx lr
Protecting your implementation against corner cases when it has no cost for the "usual" case is never "silly".
To the argument that this adds needless complexity, I say: which is harder -- writing a comment to explain the conversion and why it is there, or your successor's intern trying to debug the problem 10 years from now when a new compiler breaks the lucky happenstance that you've been silently depending on all this time? Is the following really so hard to maintain?
// The C99 standard does not guarantee the behavior of conversion
// from uint8_t to int8_t when the value to be converted is larger
// than 127. This function implements a conversion that is
// guaranteed to wrap as though the unsigned value were simply
// reinterpreted as a twos-complement value. With most compilers
// on most systems, it will be optimized away entirely.
int8_t safeConvert(uint8_t x) {
return x < 128 ? x : x - 256;
}
When all is said and done, I agree that this is vaguely over the top, but I also think we should try to answer the question at face value. A better solution, of course, would be for the C standard to pin down the behavior of conversions from unsigned to signed when the signed type is a twos-complement integer without padding (thus specifying the behavior for all of the intN_t types).
Conversion of uint8_t to int8_t essentially reverses the order of the two half-ranges: "high" numbers become "low". This can be accomplished with XOR:
x ^ 0x80
However, all the numbers are still positive. That's no good. We need to introduce the proper sign and restore the proper magnitude.
return ( x ^ 0x80 ) - 0x80;
There you go!
I don't know if this has any practical value, but here's a different approach that came to mind:
uint8_t input;
int8_t output;
*(uint8_t *)&output = input;
Note that int8_t is required to be twos complement. The only way I can see that this reasoning might fail to be valid is if CHAR_BIT > 8 and the 8-bit integer types are extended integer types with trap bits that somehow flag whether the value is signed or unsigned. However, the following analogous code using char types explicitly could never fail:
unsigned char input;
signed char output;
*(unsigned char *)&output = input;
because char types cannot have padding/trap bits.
A potential variant would be:
return ((union { uint8_t u; int8_t s; }){ input }).s;
or for char types:
return ((union { unsigned char u; signed char s; }){ input }).s;
Edit: As Steve Jessop pointed out in another answer, int8_t and uint8_t are required not to have padding bits if they exist, so their existence implies CHAR_BIT == 8. So I'm confident that this approach is valid. With that said, I would still never use uint8_t and would always explicitly use unsigned char, in case the implementation implements uint8_t as an equal-size extended integer type, because char types have special privileges with respect to aliasing rules and type punning which make them more desirable.