
Bytewise reading of memory: "signed char *" vs "unsigned char *"

One often needs to read from memory one byte at a time, as in this naive memcpy() implementation:

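#include <stddef.h>  /* size_t */

void *memcpy(void *dest, const void *src, size_t n)
{
    /* Keep const on the source pointer instead of casting it away. */
    const char *from = (const char *)src;
    char *to         = (char *)dest;

    /* Copy one byte at a time until n bytes have been moved. */
    while (n--)
        *to++ = *from++;

    return dest;
}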

However, I sometimes see people explicitly use unsigned char * instead of just char *.

Of course, char and unsigned char may not be equal. But does it make a difference whether I use char *, signed char *, or unsigned char * when bytewise reading/writing memory?

UPDATE: Actually, I'm fully aware that c=200 may have different values depending on the type of c. What I am asking here is why people sometimes use unsigned char * instead of just char * when reading memory, e.g. in order to store a uint32_t in a char[4].

Philip asked Dec 05 '11 13:12




2 Answers

You should use unsigned char. The C99 standard says that unsigned char is the only type guaranteed to be dense (no padding bits), and it also specifies that you may copy any object (except bit-fields) exactly by copying it into an unsigned char array, which is the object representation in bytes.

To me, the sensible interpretation of this is that if you use a pointer to access an object as bytes, you should use unsigned char.

Reference: http://blackshell.com/~msmud/cstd.html#6.2.6.1 (From C1x draft C99)
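
As a rough illustration of the point above, here is a minimal sketch that reads an object's bytes through an unsigned char pointer and then rebuilds the object from those bytes. The helper names object_bytes and round_trip_u32 are made up for this example; they do not come from the standard or from the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of *obj
   (size bytes) into out[], reading through unsigned char. */
static void object_bytes(const void *obj, size_t size, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *)obj;

    for (size_t i = 0; i < size; i++)
        out[i] = p[i];
}

/* Hypothetical helper: dump a uint32_t into an unsigned char array,
   copy the bytes back into a second uint32_t, and report whether
   the value survived the round trip (1 = yes, 0 = no). */
static int round_trip_u32(uint32_t value)
{
    unsigned char bytes[sizeof value];
    uint32_t copy;

    object_bytes(&value, sizeof value, bytes);
    memcpy(&copy, bytes, sizeof copy);

    return copy == value;
}
```

Calling round_trip_u32 with any value should return 1, since the unsigned char array holds the exact object representation.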

u0b34a0f6ae answered Nov 13 '22 06:11


This is one point where C++ differs from C. Generally speaking, C only guarantees that raw memory access works for unsigned char: plain char may be signed, and on a ones' complement or sign-magnitude machine a -0 might be converted to +0 automatically, changing the bit pattern. For some reason (unknown to me), the C++ committee extended the guarantee of transparent copying (no change in bit patterns) to char as well as unsigned char. On a ones' complement or sign-magnitude machine, the implementors therefore have no choice but to make plain char unsigned in order to avoid such side effects. (And of course, most programmers today aren't concerned with such machines anyway.)
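
Whether plain char is signed is implementation-defined, so it can be checked rather than assumed. A small sketch using CHAR_MIN from <limits.h>; the function name plain_char_is_signed is invented for this example.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is a signed type on
   this implementation, 0 if it is unsigned. CHAR_MIN is 0 when
   plain char is unsigned and negative when it is signed. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```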

Anyway, the end result is that older programmers who come from a C background (and may have actually worked on a ones' complement or sign-magnitude machine) automatically use unsigned char. It's also a frequent convention to reserve plain char exclusively for character data, signed char for very small integral values, and unsigned char for raw memory or when bit manipulation is intended. Such a rule lets the reader distinguish between the different uses (provided it is followed religiously).
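
One practical reason for the "unsigned char for bit manipulation" convention shows up when bytes are combined into a wider integer. The sketch below is an illustration only: load_le32 and load_le32_signed are hypothetical names, and the signed variant assumes a two's complement machine, where a byte of 0x80 read as signed char is -128.

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit value from four bytes read as
   unsigned char: each byte contributes exactly 8 bits. */
static uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Same code reading through signed char, which is what plain char
   amounts to where it is signed. Bytes >= 0x80 are negative, so the
   conversion to uint32_t sets all the upper bits (assumes two's
   complement). */
static uint32_t load_le32_signed(const signed char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For the bytes 0x80, 0x00, 0x00, 0x00, load_le32 yields 0x00000080, while load_le32_signed yields 0xFFFFFF80 under the stated assumption, because -128 converts to 0xFFFFFF80 before the OR.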

James Kanze answered Nov 13 '22 08:11