I want to use a function that expects data like this:
void process(char *data_in, int data_len);
So it's just processing some bytes really.
But I'm more comfortable working with "unsigned char" when it comes to raw bytes (it somehow "feels" more right to deal with positive 0 to 255 values only), so my question is:
Can I always safely pass an unsigned char * into this function?
In other words:
Bonus: Is the answer same in C and C++?
signed char and unsigned char are, respectively, a signed and an unsigned integer type. Each is typically smaller than, and is guaranteed not to be bigger than, a short.
On x86 systems plain char is generally signed. On ARM systems it is generally unsigned (Apple iOS is an exception).
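Since the signedness of plain char differs between platforms, it can help to query it rather than assume it. The following is a minimal sketch using only standard headers; the function names plain_char_is_signed, max_byte_value and check_char_limits are made up for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Reports whether plain char is signed on this implementation.
bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Largest value an unsigned char can hold (255 when a byte has 8 bits).
unsigned max_byte_value() {
    return std::numeric_limits<unsigned char>::max();
}

// Sanity check: CHAR_MIN is negative exactly when plain char is signed,
// and the <limits> and <climits> views of unsigned char agree.
void check_char_limits() {
    assert(plain_char_is_signed() == (CHAR_MIN < 0));
    assert(max_byte_value() == UCHAR_MAX);
}
```

Code that has to behave identically on x86 and ARM usually avoids depending on the result and uses unsigned char (or signed char) explicitly.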
An unsigned char is essentially one byte of memory interpreted as an integer, typically in the range 0 to 255. An int is usually 4 bytes, with a range of -2147483648 to 2147483647. Conversion between the two usually happens when a value of one type is assigned to a variable of the other.
unsigned char ch = 'a'; initializes an unsigned char from a character literal. In ASCII the literal 'a' has the value 97, so ch holds 97, which is shown as 'a' when it is treated as a character.
The short answer is yes if you use an explicit cast, but to explain it in detail, there are three aspects to look at:
1) Legality of the conversion
Converting between signed T* and unsigned T* (for some type T) in either direction is generally possible because the source type can first be converted to void * (this is a standard conversion, §4.10), and the void * can be converted to the destination type using an explicit static_cast (§5.2.9/13):
static_cast<unsigned char*>(static_cast<void *>(data_in))
This can be abbreviated (§5.2.10/7) as
reinterpret_cast<unsigned char *>(data_in)
because char is a standard-layout type (§3.9.1/7,8 and §3.9/9) and signedness does not change alignment (§3.9.1/1). It can also be written as a C-style cast:
(unsigned char *)(data_in)
Again, this works both ways, from unsigned* to signed* and back. There is also a guarantee that if you apply this procedure one way and then back, the pointer value (i.e. the address it's pointing to) won't have changed (§5.2.10/7).
All of this applies not only to conversions between signed char * and unsigned char *, but also to char */unsigned char * and char */signed char *, respectively. (char, signed char and unsigned char are formally three distinct types, §3.9.1/1.)
To be clear, it doesn't matter which of the three cast-methods you use, but you must use one. Merely passing the pointer will not work, as the conversion, while legal, is not a standard conversion, so it won't be performed implicitly (the compiler will issue an error if you try).
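As a rough sketch of the three spellings, the code below passes an unsigned char buffer to a function with the signature from the question. The body of process and the helpers process_bytes, round_trip_preserves_address, last_data and last_len are assumptions made only so the calls can be observed; they are not part of the original code.

```cpp
#include <cassert>

// Stand-in for the function from the question (the body is an assumption).
// It only records what it was given so the calls can be inspected.
static char *last_data = nullptr;
static int last_len = 0;

void process(char *data_in, int data_len) {
    last_data = data_in;
    last_len = data_len;
}

// Passes an unsigned char buffer using each of the three cast spellings.
// Calling process(bytes, len) without a cast does not compile in C++.
void process_bytes(unsigned char *bytes, int len) {
    assert(bytes != nullptr);
    process(static_cast<char *>(static_cast<void *>(bytes)), len); // via void *
    process(reinterpret_cast<char *>(bytes), len);                 // abbreviated
    process((char *)bytes, len);                                   // C-style
}

// Casting to char * and back yields the original pointer value.
bool round_trip_preserves_address(unsigned char *bytes) {
    char *as_char = reinterpret_cast<char *>(bytes);
    return reinterpret_cast<unsigned char *>(as_char) == bytes;
}
```

All three calls hand process the same address; which spelling to use is mostly a matter of style, with reinterpret_cast making the intent easiest to search for.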
2) Well-definedness of the access to the values
What happens if, inside the function, you dereference the pointer, i.e. you perform *data_in to retrieve a glvalue for the underlying character; is this well-defined and legal? The relevant rule here is the strict-aliasing rule (§3.10/10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- [...]
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- [...]
- a char or unsigned char type.
Therefore, accessing a signed char (or char) through an unsigned char* (or char*) and vice versa is not disallowed by this rule, so you should be able to do this without problems.
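To illustrate the access itself, here is a small sketch of a function that receives a char buffer and reads it through an unsigned char pointer. The name sum_bytes and the choice of summing the bytes are assumptions picked only to make the access visible.

```cpp
#include <cassert>

// Sums the bytes of a char buffer, reading each one through an
// unsigned char glvalue, which the aliasing rule quoted above permits.
unsigned long sum_bytes(const char *data_in, int data_len) {
    assert(data_in != nullptr && data_len >= 0);
    const unsigned char *bytes =
        reinterpret_cast<const unsigned char *>(data_in);
    unsigned long sum = 0;
    for (int i = 0; i < data_len; ++i) {
        sum += bytes[i]; // every element is in the range 0..UCHAR_MAX
    }
    return sum;
}
```

Reading through unsigned char keeps every addend non-negative, which is the usual reason for converting the pointer inside byte-processing code.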
3) Resulting values
After dereferencing the type-converted pointer, will you be able to work with the value you get? It's important to bear in mind that the conversion and dereferencing of the pointer described above amounts to reinterpreting (not changing!) the bit pattern stored at the address of the character. So what happens when a bit pattern for a signed character is interpreted as that of an unsigned character (or vice versa)?
When going from unsigned to signed, the typical effect (on a two's-complement machine with 8-bit bytes) is that values from 0 to 127 are unchanged, while values from 128 to 255 become negative. Similarly in reverse: when going from signed to unsigned, negative values appear as values of 128 or greater.
But this behaviour isn't actually guaranteed by the Standard. The only thing the Standard guarantees is that for all three types, char, unsigned char and signed char, all bits (not necessarily 8, btw) are used for the value representation. So if you interpret one as the other, make a few copies and then store it back to the original location, you can be sure that there will be no information loss (as you required), but you won't necessarily know what the values actually mean (at least not in a fully portable way).
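The following sketch tries to make both points concrete: the bytes survive a trip through char unchanged, while the number you see in between depends on the platform. The function names are made up for this example, and the value mentioned in the comment assumes 8-bit bytes and two's complement, which is common but, as noted above, not something to rely on in fully portable code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reads a buffer as char, copies the values, stores them back, and checks
// that the unsigned char contents are the same as before.
bool survives_char_round_trip(unsigned char *bytes, std::size_t n) {
    assert(bytes != nullptr || n == 0);
    std::vector<unsigned char> original(bytes, bytes + n);
    char *as_char = reinterpret_cast<char *>(bytes);
    std::vector<char> copy(as_char, as_char + n); // values read as char
    for (std::size_t i = 0; i < n; ++i) {
        as_char[i] = copy[i];                     // stored back in place
    }
    return std::equal(original.begin(), original.end(), bytes);
}

// Reinterprets one byte as signed char and widens it to int.
// With 8-bit bytes and two's complement, 200 reads back as -56.
int reinterpret_as_signed(unsigned char value) {
    return *reinterpret_cast<const signed char *>(&value);
}
```

If the numeric meaning matters, it is usually simplest to do the arithmetic on unsigned char and convert the pointer only at the call boundary.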