Should a buffer of bytes be signed or unsigned char buffer?

Tags: c++, c, char, buffer

Should a buffer of bytes be signed char or unsigned char or simply a char buffer? Any differences between C and C++?

Thanks.

asked Mar 17 '09 by jackhab


2 Answers

If you intend to store arbitrary binary data, you should use unsigned char. It is the only data type guaranteed by the C Standard to have no padding bits. Every other data type may contain padding bits in its object representation (the representation comprising all bits of an object, not only those that determine its value). The padding bits' state is unspecified, and they are not used to store values. So if you read binary data through char, values would be cut down to the value range of char (only the value bits are interpreted), but there may still be bits that are ignored for the value yet are present and copied by memcpy, much like padding bits in real struct objects. Type unsigned char is guaranteed not to contain those. That follows from 5.2.4.2.1/2 (C99 TC2, n1124 here):

If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT − 1

From the last sentence it follows that there is no room left for any padding bits. If you use char as the type of your buffer, you also have an overflow problem: a value that fits into 8 bits, which you might therefore expect to be assignable, may still lie outside the range of char (CHAR_MIN..CHAR_MAX). Such a conversion has implementation-defined results, which may include raising an implementation-defined signal.
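
A minimal sketch of that conversion problem (what it prints, and whether plain char is signed at all, depends on the platform):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned char u = 0xC8;  /* 200 always fits: range is 0..UCHAR_MAX */
        char c = 0xC8;           /* if plain char is signed, 200 > CHAR_MAX, so the
                                    conversion is implementation-defined and may even
                                    raise an implementation-defined signal (C99 6.3.1.3) */

        printf("CHAR_MIN = %d, CHAR_MAX = %d\n", CHAR_MIN, CHAR_MAX);
        printf("unsigned char: %u, plain char: %d\n", (unsigned)u, (int)c);
        return 0;
    }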

Even though the problems above would probably not show up in real implementations (an implementation exhibiting them would be of very poor quality), you are best off using the right type from the beginning, which is unsigned char.
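
For illustration, here is a small sketch of the pattern the answer recommends: inspecting an object's representation through an unsigned char buffer (the byte values printed are of course platform-dependent):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        double d = 1.0;
        unsigned char buf[sizeof d];

        /* memcpy copies the whole object representation; reading it back
           through unsigned char is guaranteed to see every bit faithfully */
        memcpy(buf, &d, sizeof d);

        for (size_t i = 0; i < sizeof d; ++i)
            printf("%02x ", buf[i]);
        putchar('\n');
        return 0;
    }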

For strings, however, the data type of choice is char, which is what the string and print functions expect. Using signed char for these purposes looks like a wrong decision to me.
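
A short sketch of that division of labor (nothing here beyond standard <string.h>):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char text[] = "hello";                        /* text: plain char */
        unsigned char raw[] = { 0xFF, 0x00, 0x7F };   /* bytes: unsigned char */

        printf("%zu\n", strlen(text));  /* <string.h> functions take char * */

        /* strlen((char *)raw) would need a cast and makes no sense anyway:
           raw is not a string, and it even contains an embedded zero byte */
        (void)raw;
        return 0;
    }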

For further information, read this proposal, which contains a fix for the next version of the C Standard that will eventually require signed char not to have any padding bits either. It has already been incorporated into the working paper.

answered Sep 18 '22 by Johannes Schaub - litb


Should a buffer of bytes be signed char or unsigned char or simply a char buffer? Any differences between C and C++?

A minor difference in how the language treats it. A huge difference in how convention treats it.

  • char = ASCII (or UTF-8, but the signedness gets in the way there) textual data
  • unsigned char = byte
  • signed char = rarely used

And there is code that relies on such a distinction. Just a week or two ago I encountered a bug where JPEG data was getting corrupted because it was being passed to the char* version of our Base64 encode function, which "helpfully" replaced everything that was not valid UTF-8 in the "string". Changing to BYTE, a.k.a. unsigned char, was all it took to fix it.
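
To make the convention concrete, here is a small sketch; the names send_text and send_bytes are made up for illustration and are not the Base64 functions from the answer:

    #include <stddef.h>
    #include <stdio.h>

    /* hypothetical pair: one entry point for text, one for raw bytes */
    static void send_text(const char *s)
    {
        printf("text:  %s\n", s);
    }

    static void send_bytes(const unsigned char *p, size_t n)
    {
        printf("bytes:");
        for (size_t i = 0; i < n; ++i)
            printf(" %02x", p[i]);
        putchar('\n');
    }

    int main(void)
    {
        /* 0xFF 0xD8 is the JPEG start-of-image marker: opaque binary,
           not text, so it travels as unsigned char */
        const unsigned char jpeg_magic[] = { 0xFF, 0xD8 };

        send_text("photo.jpg");
        send_bytes(jpeg_magic, sizeof jpeg_magic);
        return 0;
    }

Giving binary data a distinct parameter type means the compiler, not a code reviewer, catches the kind of mixup described above.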

answered Sep 19 '22 by dan04