When storing "byte arrays" (blobs...), is it better to use char or unsigned char for the items (unsigned char a.k.a. uint8_t)? (The standard says that sizeof of both is precisely 1 byte.)
Does it matter at all? Or is one more convenient or prevalent than the other? What do libraries like Boost use?
unsigned char is a data type that occupies 1 byte of memory. It is the same as the Arduino byte data type, and it encodes numbers from 0 to 255. For consistency of Arduino programming style, the byte data type is to be preferred.
Both signed char and unsigned char are 8 bits wide on typical platforms. So signed char can store values from -128 to +127, and unsigned char can store values from 0 to 255. The basic ASCII values are in the range 0 to 127.
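A quick way to check these ranges on a given platform is std::numeric_limits. The sketch below assumes 8-bit bytes (CHAR_BIT == 8, which holds on mainstream platforms) and two's complement signed char (required since C++20); the constant names are illustrative only.

```cpp
#include <climits>
#include <limits>

// The standard only promises that a byte has at least 8 bits.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Largest unsigned char value: 2^CHAR_BIT - 1, i.e. 255 for 8-bit bytes.
constexpr int uchar_max = std::numeric_limits<unsigned char>::max();

// signed char range: -128..127 for 8-bit two's complement.
constexpr int schar_min = std::numeric_limits<signed char>::min();
constexpr int schar_max = std::numeric_limits<signed char>::max();

// Whether plain char behaves as signed or unsigned is implementation-defined.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;
```

Note that plain char is a third, distinct type: depending on the compiler and target it has the same range as either signed char or unsigned char.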
If char is signed, then performing arithmetic on a byte value with the high bit set will result in sign extension when promoting to int; so, for example:
char c = '\xf0';
int res = (c << 24) | (c << 16) | (c << 8) | c;
will give 0xfffffff0 instead of 0xf0f0f0f0. This can be avoided by masking each use of c with 0xff, i.e. (c & 0xff).
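Below is a sketch of the same byte replication written with std::uint32_t, so that the shifts are well-defined (in the snippet above, left-shifting a negative int is undefined behaviour before C++20). The function names are hypothetical: splat_plain reproduces the sign-extension problem, while splat_masked and splat_unsigned show the two fixes, masking with 0xff or taking the byte as unsigned char. It assumes 8-bit bytes.

```cpp
#include <cstdint>

// Replicates c into all four bytes of a 32-bit word. If plain char is
// signed, '\xf0' has the value -16, and converting -16 to uint32_t wraps
// to 0xfffffff0, so the sign bits flood the whole result.
constexpr std::uint32_t splat_plain(char c) {
    const std::uint32_t v = static_cast<std::uint32_t>(c);
    return (v << 24) | (v << 16) | (v << 8) | v;
}

// Fix 1: c & 0xff is computed in int and keeps only the low 8 bits (0xf0).
constexpr std::uint32_t splat_masked(char c) {
    const std::uint32_t v = static_cast<std::uint32_t>(c & 0xff);
    return (v << 24) | (v << 16) | (v << 8) | v;
}

// Fix 2: an unsigned char always promotes to a non-negative value.
constexpr std::uint32_t splat_unsigned(unsigned char b) {
    const std::uint32_t v = b;
    return (v << 24) | (v << 16) | (v << 8) | v;
}
```

On a platform where plain char is signed (as with typical x86 compilers), splat_plain('\xf0') returns 0xfffffff0, while both fixed versions return 0xf0f0f0f0.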
char may still be preferable if you're interfacing with libraries that use it instead of unsigned char.
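As an assumed illustration of such an interface: std::string stores char, so a blob kept as unsigned char has to be converted at the boundary. The helper names below are hypothetical. The iterator-range constructors convert element by element, and on mainstream implementations (guaranteed since C++20) the round trip preserves every byte value.

```cpp
#include <string>
#include <vector>

// Copies the chars of a std::string into an unsigned char blob.
// Each char is converted to unsigned char, so '\xff' becomes 255
// regardless of whether plain char is signed.
std::vector<unsigned char> string_to_blob(const std::string& s) {
    return std::vector<unsigned char>(s.begin(), s.end());
}

// Converts a blob back into a std::string for a char-based API.
std::string blob_to_string(const std::vector<unsigned char>& blob) {
    return std::string(blob.begin(), blob.end());
}
```

The conversion costs a copy; if that matters, the alternative is to keep the data as char and mask or cast at the points where byte values are inspected.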
Note that a cast from char * to/from unsigned char * is always safe (3.9p2). A philosophical reason to favour unsigned char is that 3.9p4 in the standard favours it, at least for representing byte arrays that could hold memory representations of objects:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
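A sketch of what 3.9p4 describes: the object representation of a value can be copied into an array of sizeof(T) unsigned char objects with std::memcpy, and copying those bytes back reproduces the value for trivially copyable types (3.9p2). The function names are illustrative, and the byte order you observe depends on the platform's endianness.

```cpp
#include <array>
#include <cstring>
#include <type_traits>

// Returns the object representation of `value`: sizeof(T) unsigned chars.
template <typename T>
std::array<unsigned char, sizeof(T)> object_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copies require a trivially copyable type");
    std::array<unsigned char, sizeof(T)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Rebuilds a T from its object representation (T must also be
// default-constructible here). Call as from_object_bytes<T>(bytes).
template <typename T>
T from_object_bytes(const std::array<unsigned char, sizeof(T)>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copies require a trivially copyable type");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}

// Reading the bytes in place through an unsigned char pointer is also allowed.
template <typename T>
unsigned char first_byte(const T& value) {
    return *reinterpret_cast<const unsigned char*>(&value);
}
```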
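Theoretically, the size of a byte in C++ depends on the compiler settings and target platform, but it is guaranteed to be at least 8 bits. Since uint8_t must be exactly 8 bits wide, it can only be provided when a byte is exactly 8 bits, which is why sizeof(uint8_t) is 1 wherever uint8_t exists (the type is optional).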
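The same reasoning can be checked at compile time. The sketch below relies on the rule that <cstdint> defines the UINT8_MAX macro only when it provides uint8_t; the constant has_exact_uint8 is an illustrative name.

```cpp
#include <climits>
#include <cstdint>

// The narrow character types occupy exactly one byte by definition.
static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 &&
                  sizeof(unsigned char) == 1,
              "the narrow character types are one byte");

// A byte has at least 8 bits, but possibly more.
static_assert(CHAR_BIT >= 8, "bytes are at least 8 bits wide");

#ifdef UINT8_MAX
// uint8_t has exactly 8 bits and no padding, which is only possible
// when a byte is exactly 8 bits; it then fills one byte.
constexpr bool has_exact_uint8 = true;
static_assert(CHAR_BIT == 8, "uint8_t implies 8-bit bytes");
static_assert(sizeof(std::uint8_t) == 1, "uint8_t occupies one byte");
#else
// No exact 8-bit type; std::uint_least8_t is still available.
constexpr bool has_exact_uint8 = false;
#endif
```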
Here's more precisely what the standard has to say about it (§1.7/1):
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
So, if you are working on some special hardware where bytes are not 8 bits, it may make a practical difference. Otherwise, I'd say that it's a matter of taste and what information you want to communicate via the choice of type.
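If code has to stay correct on hardware where a byte is wider than 8 bits, one option (a sketch, not something the answer prescribes) is to treat each unsigned char as holding one octet and to mask explicitly. The function names to_octets_be and from_octets_be are hypothetical; they store a 32-bit value most significant octet first.

```cpp
#include <array>
#include <cstdint>

// Splits v into four octets, most significant first. Each element is
// masked with 0xff, so it stays within 0..255 even if unsigned char is
// wider than 8 bits. uint_least32_t is used because it always exists.
constexpr std::array<unsigned char, 4> to_octets_be(std::uint_least32_t v) {
    return {{static_cast<unsigned char>((v >> 24) & 0xffu),
             static_cast<unsigned char>((v >> 16) & 0xffu),
             static_cast<unsigned char>((v >> 8) & 0xffu),
             static_cast<unsigned char>(v & 0xffu)}};
}

// Reassembles the value, using only the low 8 bits of each element.
constexpr std::uint_least32_t from_octets_be(const std::array<unsigned char, 4>& o) {
    return (static_cast<std::uint_least32_t>(o[0] & 0xffu) << 24) |
           (static_cast<std::uint_least32_t>(o[1] & 0xffu) << 16) |
           (static_cast<std::uint_least32_t>(o[2] & 0xffu) << 8) |
           static_cast<std::uint_least32_t>(o[3] & 0xffu);
}
```

For example, to_octets_be(0x0A0B0C0D) yields {0x0A, 0x0B, 0x0C, 0x0D}, and from_octets_be reverses it. Fixing the byte order explicitly also makes the blob independent of the host's endianness.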