
conversion of uint8_t to sint8_t

What's the best way to convert a uint8_t to an sint8_t in portable C?

This is the code I came up with:

#include <stdint.h>

sint8_t DESER_SINT8(uint8_t x)
{
  return
     (sint8_t)((x >= (1u << 8u))
               ? -(UINT8_MAX - x)
               : x);
}

Is there a better/simpler way to do it? Maybe a way without using a conditional?

Edit: Thanks, guys. To sum up what I've learned so far:

  • sint8_t is really called int8_t
  • 128 is expressed by 1 << 7 and not by 1 << 8
  • two's complement is "negating off by one"

:)

So here is an updated version of my original code:

#include <stdint.h>

int8_t DESER_INT8(uint8_t x)
{
  return ((x >= (1 << 7))
          ? -(UINT8_MAX - x + 1)
          : x);
}
asked Oct 08 '10 by heckenpenner_rot


3 Answers

1u << 8u is 0x100u, which is larger than every uint8_t value, so the conditional is never satisfied. Your "conversion" routine is actually just:

return x;

which actually makes some sense.

You need to define more clearly what you want for conversion. C99 defines conversion from unsigned to signed integer types as follows (§6.3.1.3 "Signed and unsigned integers")

When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Thus, uint8_t values between 0 and 127 are preserved, and the result for values larger than 127 is implementation-defined (not undefined: the quoted passage requires either an implementation-defined result or an implementation-defined signal). Many (but not all) implementations will simply interpret the unsigned values as a twos-complement representation of a signed integer. Perhaps what you're really asking is how to guarantee this behavior across platforms?

If so, you can use:

return x < 128 ? x : x - 256;

The value x - 256 is an int, guaranteed to have the value of x interpreted as a twos-complement 8 bit integer. The implicit conversion to int8_t then preserves this value.

This all assumes that sint8_t is meant to be int8_t, as sint8_t isn't a standard type. If it isn't, then all bets are off, because the correctness of the conversion I suggested depends on the guarantee that int8_t have a twos-complement representation (§7.18.1.1 "Exact-width integer types").

If sint8_t is instead some wacky platform-specific type, it might use some other representation like ones-complement, which has a different set of representable values, thus rendering the conversion described above implementation-defined (hence non-portable) for certain inputs.


EDIT

Alf has argued that this is "silly", and that this will never be necessary on any production system. I disagree, but it is admittedly a corner case of a corner case. His argument is not entirely without merit.

His claim that this is "inefficient" and should therefore be avoided, however, is baseless. A reasonable optimizing compiler will optimize this away on platforms where it is unnecessary. Using GCC on x86_64 for example:

#include <stdint.h>

int8_t alf(uint8_t x) {
    return x;
}

int8_t steve(uint8_t x) {
    return x < 128 ? x : x - 256;
}

int8_t david(uint8_t x) {
    return (x ^ 0x80) - 0x80;
}

compiled with -Os -fomit-frame-pointer yields the following:

_alf:
0000000000000000    movsbl  %dil,%eax
0000000000000004    ret
_steve:
0000000000000005    movsbl  %dil,%eax
0000000000000009    ret
_david:
000000000000000a    movsbl  %dil,%eax
000000000000000e    ret

Note that all three implementations are identical after optimization. Clang/LLVM gives exactly the same result. Similarly, if we build for ARM instead of x86:

_alf:
00000000        b240    sxtb    r0, r0
00000002        4770    bx  lr
_steve:
00000004        b240    sxtb    r0, r0
00000006        4770    bx  lr
_david:
00000008        b240    sxtb    r0, r0
0000000a        4770    bx  lr

Protecting your implementation against corner cases when it has no cost for the "usual" case is never "silly".

To the argument that this adds needless complexity, I say: which is harder -- writing a comment to explain the conversion and why it is there, or your successor's intern trying to debug the problem 10 years from now when a new compiler breaks the lucky happenstance that you've been silently depending on all this time? Is the following really so hard to maintain?

// The C99 standard does not guarantee the behavior of conversion
// from uint8_t to int8_t when the value to be converted is larger
// than 127.  This function implements a conversion that is
// guaranteed to wrap as though the unsigned value were simply
// reinterpreted as a twos-complement value.  With most compilers
// on most systems, it will be optimized away entirely.
int8_t safeConvert(uint8_t x) {
    return x < 128 ? x : x - 256;
}

When all is said and done, I agree that this is vaguely over the top, but I also think we should try to answer the question at face value. A better solution, of course, would be for the C standard to pin down the behavior of conversions from unsigned to signed when the signed type is a twos-complement integer without padding (thus specifying the behavior for all of the intN_t types).

answered Nov 15 '22 by Stephen Canon


Conversion of uint8_t to int8_t essentially reverses the order of the two half-ranges. "High" numbers become "low." This can be accomplished with XOR.

x ^ 0x80

However, all the numbers are still positive. That's no good. We need to introduce the proper sign and restore the proper magnitude.

return ( x ^ 0x80 ) - 0x80;

There you go!

answered Nov 15 '22 by Potatoswatter


I don't know if this has any practical value, but here's a different approach that came to mind:

uint8_t input;
int8_t output;
*(uint8_t *)&output = input;

Note that:

  • int8_t is required to be twos complement.
  • Corresponding signed and unsigned types are required to have the same representation for the overlapping part of their ranges, so that a value that's in the range of both the signed and unsigned type can be accessed through either type of pointer.
  • That leaves only one bit, which must be the twos complement sign bit.

The only way I can see that this reasoning might fail to be valid is if CHAR_BIT>8 and the 8-bit integer types are extended integer types with trap bits that somehow flag whether the value is signed or unsigned. However, the following analogous code using char types explicitly could never fail:

unsigned char input;
signed char output;
*(unsigned char *)&output = input;

because char types cannot have padding/trap bits.

A potential variant would be:

return ((union { uint8_t u; int8_t s; }){ input }).s;

or for char types:

return ((union { unsigned char u; signed char s; }){ input }).s;

Edit: As Steve Jessop pointed out in another answer, int8_t and uint8_t are required not to have padding bits if they exist, so their existence implies CHAR_BIT==8. So I'm confident that this approach is valid. With that said, I would still never use uint8_t and always explicitly use unsigned char, in case the implementation implements uint8_t as an equal-size extended integer type, because char types have special privileges with respect to aliasing rules and type punning which make them more desirable.

answered Nov 15 '22 by R.. GitHub STOP HELPING ICE