I frequently work with libraries that use char when working with bytes in C++. The alternative is to define a "Byte" as unsigned char, but that is not the convention they decided to use. I frequently pass bytes from C# into the C++ DLLs and cast them to char to work with the library.
When casting ints to chars, or chars to other simple types, what are some of the side effects that can occur? Specifically, when has this broken code that you have worked on, and how did you find out it was because of char signedness?
Luckily I haven't run into this in my own code; I only used a signed char casting trick back in an embedded systems class in school. I'm looking to better understand the issue since I feel it is relevant to the work I am doing.
One major risk is if you need to shift the bytes. A signed char keeps the sign-bit when right-shifted, whereas an unsigned char doesn't. Here's a small test program:
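#include <stdio.h>

int main(void)
{
    signed char a = -1;     /* bit pattern 0xFF with 8-bit two's complement */
    unsigned char b = 255;  /* same bit pattern, 0xFF */

    /* Both operands are promoted to int before the shift. */
    printf("%d\n%d\n", a >> 1, b >> 1);
    return 0;
}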
It should print -1 and 127, even though a and b start out with the same bit pattern (given 8-bit chars, two's-complement and signed values using arithmetic shift).
In short, you can't rely on shift working identically for signed and unsigned chars, so if you need portability, use unsigned char rather than char or signed char.
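When the library interface forces plain char, the shift can still be made predictable by converting each value to unsigned char right before shifting. The following is only a minimal sketch of that idea; the helper names high_nibble and high_nibble_naive are hypothetical and do not come from any particular library.

```c
/* Hypothetical helper: returns the upper four bits of a byte held in a plain char. */
static unsigned high_nibble(char c)
{
    /* Convert to unsigned char first so the value is in 0..255 before
       integer promotion; the shift then never sees a negative operand. */
    return (unsigned)((unsigned char)c >> 4);
}

/* The same shift without the cast, kept only for comparison. */
static int high_nibble_naive(char c)
{
    /* If char is signed and c is negative, the result is
       implementation-defined (typically an arithmetic shift). */
    return c >> 4;
}
```

For values below 0x80 both helpers agree; they can only differ for bytes with the top bit set on platforms where char is signed.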
The most obvious gotchas come when you need to compare the numeric value of a char with a hexadecimal constant when implementing protocols or encoding schemes.
For example, when implementing telnet you might want to do this.
// Check for IAC (hex FF) byte
if (ch == 0xFF)
{
// ...
Or when testing for UTF-8 multi-byte sequences.
if (ch >= 0x80)
{
// ...
Fortunately these errors don't usually survive very long, as even the most cursory testing on a platform with a signed char should reveal them. They can be fixed by using a character constant, converting the numeric constant to a char, or converting the character to an unsigned char before the comparison operator promotes both to an int. Converting the char directly to an unsigned won't work, though: a negative char converts to a very large unsigned value (0xFFFFFFFF for the byte 0xFF with a 32-bit unsigned), not to 0xFF.
if (ch == '\xff') // OK
if ((unsigned char)ch == 0xff) // OK, so long as char has 8 bits
if (ch == (char)0xff) // Usually OK, relies on implementation-defined behaviour
if ((unsigned)ch == 0xff) // still wrong
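As a rough illustration of the unsigned char fix applied to the two checks above, the comparisons could be wrapped in small helpers. This is a sketch only: the function names is_telnet_iac and is_utf8_non_ascii are made up for this example, and 8-bit chars are assumed.

```c
#include <stdbool.h>

/* Hypothetical helper: true if ch is the telnet IAC byte (hex FF).
   Assumes 8-bit chars. */
static bool is_telnet_iac(char ch)
{
    /* The cast yields a value in 0..255, so the promoted int
       compares equal to 0xFF whether or not char is signed. */
    return (unsigned char)ch == 0xFF;
}

/* Hypothetical helper: true if ch has its top bit set, i.e. it is
   part of a UTF-8 multi-byte sequence rather than plain ASCII. */
static bool is_utf8_non_ascii(char ch)
{
    return (unsigned char)ch >= 0x80;
}
```

Keeping the cast inside one helper per check means the calling code can keep passing plain char without repeating the conversion at every comparison.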