Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Worst side effects from chars signedness. (Explanation of signedness effects on chars and casts)

I frequently work with libraries that use char when working with bytes in C++. The alternative is to define a "Byte" as unsigned char but that not the standard they decided to use. I frequently pass bytes from C# into the C++ dlls and cast them to char to work with the library.

When casting ints to chars or chars to other simple types what are some of the side effects that can occur. Specifically, when has this broken code that you have worked on and how did you find out it was because of the char signedness?

Lucky i haven't run into this in my code, used a char signed casting trick back in an embedded systems class in school. I'm looking to better understand the issue since I feel it is relevant to the work I am doing.

like image 344
QueueHammer Avatar asked Feb 03 '10 15:02

QueueHammer


2 Answers

One major risk is if you need to shift the bytes. A signed char keeps the sign-bit when right-shifted, whereas an unsigned char doesn't. Here's a small test program:

#include <stdio.h>

int main (void)
{
    signed char a = -1;
    unsigned char b = 255;

    printf("%d\n%d\n", a >> 1, b >> 1);

    return 0;
}

It should print -1 and 127, even though a and b start out with the same bit pattern (given 8-bit chars, two's-complement and signed values using arithmetic shift).

In short, you can't rely on shift working identically for signed and unsigned chars, so if you need portability, use unsigned char rather than char or signed char.

like image 145
Vatine Avatar answered Nov 15 '22 23:11

Vatine


The most obvious gotchas come when you need to compare the numeric value of a char with a hexadecimal constant when implementing protocols or encoding schemes.

For example, when implementing telnet you might want to do this.

// Check for IAC (hex FF) byte
if (ch == 0xFF)
{
    // ...

Or when testing for UTF-8 multi-byte sequences.

if (ch >= 0x80)
{
    // ...

Fortunately these errors don't usually survive very long as even the most cursory testing on a platform with a signed char should reveal them. They can be fixed by using a character constant, converting the numeric constant to a char or converting the character to an unsigned char before the comparison operator promotes both to an int. Converting the char directly to an unsigned won't work, though.

if (ch == '\xff')               // OK

if ((unsigned char)ch == 0xff)  // OK, so long as char has 8-bits

if (ch == (char)0xff)           // Usually OK, relies on implementation defined behaviour

if ((unsigned)ch == 0xff)       // still wrong
like image 30
CB Bailey Avatar answered Nov 16 '22 00:11

CB Bailey