When storing "byte arrays" (blobs, etc.), is it better to use <code>char</code> or <code>unsigned char</code> for the elements (<code>unsigned char</code> a.k.a. <code>uint8_t</code>)? (The standard says that <code>sizeof</code> of both is exactly 1.) Does it matter at all? Or is one more convenient or prevalent than the other? What do libraries like Boost use?

Char vs unsigned char for byte arrays

2 Answers

If char is signed, then performing arithmetic on a byte value with the high bit set will result in sign extension when promoting to int; so, for example:

char c = '\xf0';
int res = (c << 24) | (c << 16) | (c << 8) | c;

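will typically give 0xfffffff0 instead of 0xf0f0f0f0 (strictly speaking, left-shifting a negative value is undefined behaviour in C++11). This can be avoided by masking each operand with 0xff before shifting, or by converting to unsigned char first.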
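As a minimal sketch of that fix, the byte can be converted to unsigned char before it is widened, so no sign extension takes place. The helper name `replicate_byte` is hypothetical, and the sketch assumes 8-bit bytes and does the shifts in an unsigned 32-bit type to stay clear of signed overflow:

```cpp
#include <cstdint>

// Hypothetical helper: replicate one byte into all four bytes of a 32-bit word.
// The conversion to unsigned char happens before any widening, so a char
// holding '\xf0' contributes 0xf0 rather than a sign-extended negative value.
inline std::uint32_t replicate_byte(char c) {
    const std::uint32_t b = static_cast<unsigned char>(c);
    return (b << 24) | (b << 16) | (b << 8) | b;
}
```

Called as `replicate_byte('\xf0')`, this yields 0xf0f0f0f0 whether plain char is signed or unsigned on the platform.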

char may still be preferable if you're interfacing with libraries that use it instead of unsigned char.

Note that a cast from char * to/from unsigned char * is always safe (3.9p2). A philosophical reason to favour unsigned char is that 3.9p4 in the standard favours it, at least for representing byte arrays that could hold memory representations of objects:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
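The two points above can be illustrated with a short sketch. The helper names `to_bytes`, `from_bytes` and `first_byte` are hypothetical, and `std::uint32_t` is just a convenient trivially copyable type for the round trip:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the object representation of a value into an
// array of unsigned char, one element per byte of the object.
inline void to_bytes(std::uint32_t value,
                     unsigned char (&out)[sizeof(std::uint32_t)]) {
    std::memcpy(out, &value, sizeof value);
}

// Copying the same bytes back restores the original value.
inline std::uint32_t from_bytes(const unsigned char (&in)[sizeof(std::uint32_t)]) {
    std::uint32_t value;
    std::memcpy(&value, in, sizeof value);
    return value;
}

// A char buffer handed over by a library can be viewed as unsigned char
// without copying; a byte with the high bit set then reads as a positive value.
inline unsigned first_byte(const char* data) {
    return *reinterpret_cast<const unsigned char*>(data);
}
```

This way a library interface can keep using char * while the code that does arithmetic on the bytes works with unsigned char.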

answered Oct 04 '22 21:10

ecatmur

Theoretically, the size of a byte in C++ depends on the compiler settings and target platform, but it is guaranteed to be at least 8 bits, which is why sizeof(uint8_t), where that type exists, must be 1.

Here is more precisely what the standard has to say about it:

§1.7p1

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.

So, if you are working on some special hardware where bytes are not 8 bits, it may make a practical difference. Otherwise, I'd say that it's a matter of taste and what information you want to communicate via the choice of type.
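If the code does depend on a particular byte width, the assumption can be checked at compile time. The following is only a sketch; `bits_per_byte` is a hypothetical helper name, while `CHAR_BIT` and `std::numeric_limits` come from the standard library:

```cpp
#include <climits>
#include <limits>

// sizeof counts bytes, so these hold on every conforming implementation.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte by definition");

// The standard only promises at least 8 bits per byte.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits in one byte on this platform.
constexpr int bits_per_byte() {
    return std::numeric_limits<unsigned char>::digits;
}
```

Code that really needs octets could add `static_assert(CHAR_BIT == 8, "...")` so that a build on unusual hardware fails early instead of misbehaving.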

answered Oct 04 '22 19:10

Agentlien

Tags: c++, gcc, c++11

Cartesius00
