Char vs unsigned char for byte arrays

Tags: c++, gcc, c++11

When storing "byte arrays" (blobs...), is it better to use char or unsigned char for the items (unsigned char a.k.a. uint8_t)? (The standard says that sizeof of both is exactly 1 byte.)

Does it matter at all? Is one more convenient or prevalent than the other? And what do libraries like Boost use?

Cartesius00 asked Dec 11 '12 11:12



2 Answers

If char is signed, then performing arithmetic on a byte value with the high bit set will result in sign extension when promoting to int; so, for example:

char c = '\xf0';
int res = (c << 24) | (c << 16) | (c << 8) | c;

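will give 0xfffffff0 instead of 0xf0f0f0f0 (assuming a 32-bit int). This can be avoided by masking c with 0xff before shifting. Strictly speaking, left-shifting a negative value is undefined behaviour before C++20, so the unmasked version should not be relied on at all.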
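As a minimal sketch of the effect and of the masking fix, the following uses signed char explicitly so that the result does not depend on whether plain char is signed on a given platform. The function names are hypothetical, and the shifts are done on an unsigned 32-bit value to keep the behaviour well defined.

```cpp
#include <cstdint>

// Copy the low 8 bits pattern of v into every byte position of a 32-bit word.
// Working on an unsigned type keeps all shifts well defined.
constexpr std::uint32_t replicate(std::uint32_t v) {
    return (v << 24) | (v << 16) | (v << 8) | v;
}

// Without a mask: converting a negative signed char to uint32_t
// sign-extends it, so the upper 24 bits are already all ones.
constexpr std::uint32_t replicate_naive(signed char c) {
    return replicate(static_cast<std::uint32_t>(c));
}

// With a mask: only the low 8 bits survive before the shifts.
constexpr std::uint32_t replicate_masked(signed char c) {
    return replicate(static_cast<std::uint32_t>(c) & 0xffu);
}

// -16 has the bit pattern 0xf0 in an 8-bit two's complement char.
static_assert(replicate_naive(-16) == 0xfffffff0u, "sign extension leaks in");
static_assert(replicate_masked(-16) == 0xf0f0f0f0u, "mask removes it");
```

With unsigned char as the element type, the mask is unnecessary because the promoted value is never negative.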

char may still be preferable if you're interfacing with libraries that use it instead of unsigned char.
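One common case is the standard stream library, whose read function takes a char *. The sketch below keeps the buffer as unsigned char and casts only at the library boundary; read_bytes and bytes_from_string are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to `count` bytes from `in` into a buffer of
// unsigned char and return only the bytes that were actually read.
std::vector<unsigned char> read_bytes(std::istream& in, std::size_t count) {
    std::vector<unsigned char> buf(count);
    if (count == 0) return buf;
    // std::istream::read expects a char*, so the pointer is cast at the call.
    in.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(count));
    // gcount() reports how many characters the last read extracted.
    const std::streamsize got = in.gcount();
    assert(got >= 0 && static_cast<std::size_t>(got) <= count);
    buf.resize(static_cast<std::size_t>(got));
    return buf;
}

// Convenience wrapper for data that already sits in a std::string.
std::vector<unsigned char> bytes_from_string(const std::string& s) {
    std::istringstream in(s);
    return read_bytes(in, s.size());
}
```

Whether to cast at the boundary like this or to store char throughout is a design choice; the cast keeps arithmetic on the stored bytes free of sign surprises.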

Note that a cast from char * to/from unsigned char * is always safe (3.9p2). A philosophical reason to favour unsigned char is that 3.9p4 in the standard favours it, at least for representing byte arrays that could hold memory representations of objects:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
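To illustrate the quoted wording, here is a small sketch that copies an object's representation into a buffer of unsigned char and back. The names to_bytes and from_bytes are hypothetical, and the sketch assumes T is trivially copyable and default constructible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Copy the sizeof(T) bytes that make up `value` into an unsigned char buffer.
template <typename T>
std::vector<unsigned char> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types may be copied bytewise");
    std::vector<unsigned char> bytes(sizeof(T));
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Rebuild an object of type T from a buffer produced by to_bytes<T>.
template <typename T>
T from_bytes(const std::vector<unsigned char>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types may be copied bytewise");
    assert(bytes.size() == sizeof(T));  // precondition: exact size match
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

For example, from_bytes<double>(to_bytes(1.5)) gives back 1.5, and the intermediate buffer has exactly sizeof(double) elements.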

ecatmur answered Oct 04 '22 21:10


Theoretically, the size of a byte in C++ depends on the compiler settings and the target platform, but it is guaranteed to be at least 8 bits. Since uint8_t, where it exists, is exactly 8 bits wide, sizeof(uint8_t) can only be 1.

Here's more precisely what the standard has to say about it:

§1.7/1

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.

So, if you are working on some special hardware where bytes are not 8 bits, it may make a practical difference. Otherwise, I'd say that it's a matter of taste and what information you want to communicate via the choice of type.
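Code that depends on 8-bit bytes can state that assumption explicitly with CHAR_BIT from <climits>. The following is a minimal sketch; bits_in is a hypothetical helper, and the last assertion is an assumption this particular sketch makes, not something the standard guarantees.

```cpp
#include <climits>
#include <cstddef>

// Guaranteed by the standard: a byte has at least 8 bits, and both char
// types occupy exactly one byte.
static_assert(CHAR_BIT >= 8, "a byte is at least 8 bits wide");
static_assert(sizeof(char) == 1 && sizeof(unsigned char) == 1,
              "char and unsigned char are one byte each");

// Number of bits occupied by an object of type T on this platform.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// Assumption of this sketch (not a guarantee): bytes are octets.
// On hardware with wider bytes this fails at compile time instead of
// silently misbehaving.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
```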

Agentlien answered Oct 04 '22 19:10