When storing "byte arrays" (blobs...) is it better to use char or unsigned char for the items (unsigned char a.k.a. uint8_t)? (Standard says that sizeof of both is precisely 1 Byte.)
Does it matter at all? Or one is more convenient or prevalent than the other? Maybe, what libraries like Boost do use?
An unsigned char data type that occupies 1 byte of memory. It is the same as the byte datatype. The unsigned char datatype encodes numbers from 0 to 255. For consistency of Arduino programming style, the byte data type is to be preferred.
Both of the Signed and Unsigned char, they are of 8-bits. So for signed char it can store value from -128 to +127, and the unsigned char will store 0 to 255. The basic ASCII values are in range 0 to 127.
If char is signed, then performing arithmetic on a byte value with the high bit set will result in sign extension when promoting to int; so, for example:
char c = '\xf0';
int res = (c << 24) | (c << 16) | (c << 8) | c;
will give 0xfffffff0 instead of 0xf0f0f0f0.  This can be avoided by masking with 0xff.
char may still be preferable if you're interfacing with libraries that use it instead of unsigned char.
Note that a cast from char * to/from unsigned char * is always safe (3.9p2).  A philosophical reason to favour unsigned char is that 3.9p4 in the standard favours it, at least for representing byte arrays that could hold memory representations of objects:
The object representation of an object of type
Tis the sequence ofNunsigned charobjects taken up by the object of typeT, whereNequalssizeof(T).
Theoretically, the size of a byte in C++ is dependant on the compiler-settings and target platform, but it is guaranteed to be at least 8 bits, which explains why sizeof(uint8_t) is required to be 1.
Here's more precisely what the standard has to say about it
§1.71
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
So, if you are working on some special hardware where bytes are not 8 bits, it may make a practical difference. Otherwise, I'd say that it's a matter of taste and what information you want to communicate via the choice of type.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With