Difference between unsigned char and char pointers

I'm a bit confused with differences between unsigned char (which is also BYTE in WinAPI) and char pointers.

Currently I'm working with some ATL-based legacy code and I see a lot of expressions like the following:

CAtlArray<BYTE> rawContent;
CALL_THE_FUNCTION_WHICH_FILLS_RAW_CONTENT(rawContent);
return ArrayToUnicodeString(rawContent);
// or return ArrayToAnsiString(rawContent);

Now, the implementations of ArrayToXXString look like this:

CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
    CAtlArray<BYTE> copiedArray;
    copiedArray.Copy(array);
    copiedArray.Add('\0');

    // Casting from BYTE* -> LPCSTR (const char*).
    return CStringA((LPCSTR)copiedArray.GetData());
}

CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
    CAtlArray<BYTE> copiedArray;
    copiedArray.Copy(array);

    copiedArray.Add('\0');
    copiedArray.Add('\0');

    // Same here.        
    return CStringW((LPCWSTR)copiedArray.GetData());
}

So, the questions:

  • Is the C-style cast from BYTE* to LPCSTR (const char*) safe for all possible cases?

  • Is it really necessary to add double null-termination when converting array data to wide-character string?

  • The conversion routine CStringW((LPCWSTR)copiedArray.GetData()) seems invalid to me, is that true?

  • Any way to make all this code easier to understand and to maintain?

Yippie-Ki-Yay asked Feb 10 '12 13:02


People also ask

What is the difference between char and unsigned char?

An unsigned type can only represent positive values (and zero), whereas a signed type can represent both positive and negative values (and zero). In the case of an 8-bit char this means that an unsigned char variable can hold a value in the range 0 to 255, while a signed char has the range -128 to 127.
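A minimal sketch of those ranges in portable C++, assuming 8-bit chars as on Windows. The helper name byte_value is made up for illustration; it shows one common way to read a plain char as a value in 0 to 255 without caring whether the compiler treats char as signed.

```cpp
#include <limits>

// signed char and unsigned char have fixed signedness; whether plain char
// is signed is left to the compiler.
static_assert(std::numeric_limits<unsigned char>::min() == 0,
              "unsigned char has no negative values");
static_assert(std::numeric_limits<signed char>::min() < 0,
              "signed char does have negative values");

// Hypothetical helper: read a plain char as a non-negative byte value,
// whichever signedness the compiler picked for char.
inline int byte_value(char c) {
    return static_cast<unsigned char>(c);
}
```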

What is unsigned char pointer?

The unsigned char type is usually used as a representation of a single byte of binary data. Thus, an array of unsigned char is often used as a binary data buffer, where each element is a single byte. The unsigned char* construct will be a pointer to the binary data buffer (or its first element).

What is the difference between char array and char pointer?

For a local char array initialized from a string literal, the whole string is copied onto the stack. For a char pointer, only the pointer variable lives on the stack, while the literal it points to is typically stored in a read-only section of the program. The most important difference is that the string behind the pointer must not be modified, whereas the array copy can be.

Should I use signed or unsigned char?

Unsigned char should be used for accessing memory as a block of bytes or for small unsigned integers. Signed char should be used for small signed integers, and plain char should be used only for characters and strings.


1 Answer

The C standard is kind of weird when it comes to the definition of a byte. You do have a couple of guarantees though.

  • A byte will always be one char in size
    • sizeof(char) is always 1
  • A byte will be at least 8 bits in size (CHAR_BIT >= 8)

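This definition doesn't mesh well with older platforms where a byte was 6 or 7 bits long, but it does mean that BYTE (unsigned char) and char always have the same size, so casting a BYTE* to a char* is safe: both pointers refer to the same bytes. The two types differ only in how values above 127 are interpreted when char is signed.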
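A small sketch of that guarantee in portable C++, not the original ATL code. Byte stands in for the WinAPI BYTE typedef, and as_c_string and text_length are hypothetical helpers introduced only for illustration.

```cpp
#include <cstddef>
#include <cstring>

using Byte = unsigned char;  // stand-in for the WinAPI BYTE typedef

static_assert(sizeof(Byte) == sizeof(char), "both types occupy exactly one byte");
static_assert(alignof(Byte) == alignof(char), "and share the same alignment");

// Hypothetical helper: view a null-terminated byte buffer as a C string.
// reinterpret_cast spells out what the C-style (LPCSTR) cast does.
inline const char* as_c_string(const Byte* bytes) {
    return reinterpret_cast<const char*>(bytes);
}

// Hypothetical helper: length of the text in a null-terminated byte buffer.
inline std::size_t text_length(const Byte* bytes) {
    return std::strlen(as_c_string(bytes));
}
```

The cast changes only the pointer's type. The address and the bytes behind it stay the same, so the only thing to watch is the sign of values above 127 when they are read back through a plain char.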

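Two null bytes are needed at the end of the wide string because its terminator is a whole 16-bit character (L'\0'), not a single byte. One zero byte is not enough, since many valid UTF-16 characters contain a zero byte; 'A', for example, is stored as the bytes 0x41 0x00 on Windows.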
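To see why a single zero byte cannot mark the end of a wide string, here is a hedged sketch in portable C++. It uses char16_t as a stand-in for the 16-bit wchar_t found on Windows, and raw_bytes and first_zero_byte are hypothetical helpers, not part of ATL.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copy out the raw bytes behind a UTF-16 string.
// char16_t stands in for the 16-bit wchar_t used on Windows.
inline std::vector<unsigned char> raw_bytes(const std::u16string& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    return std::vector<unsigned char>(p, p + s.size() * sizeof(char16_t));
}

// Hypothetical helper: index of the first zero byte, or bytes.size() if none.
inline std::size_t first_zero_byte(const std::vector<unsigned char>& bytes) {
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == 0) return i;
    }
    return bytes.size();
}
```

For the two-character string u"Hi", raw_bytes yields four bytes, and the first zero byte already falls inside the first character (at index 1 on a little-endian machine, index 0 on a big-endian one). A reader that stopped at one zero byte would cut the string short, which is why the terminator has to be a full zero code unit.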

As for making the code easier to read, that is completely a matter of style. This code appears to be written in a style used by a lot of old C Windows code, which has definitely fallen out of favor. There are probably a ton of ways to make it clearer for you, but how to make it clearer has no clear answer.
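One possible direction, sketched under assumptions rather than offered as the fix: pass the length explicitly instead of copying the array and appending terminators by hand. The sketch uses std::vector, std::string and std::u16string as portable stand-ins for CAtlArray<BYTE>, CStringA and CStringW, and the function names are hypothetical. ATL's CStringT has, as far as I know, a pointer-plus-length constructor that could play the same role, but that is not tested here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

using Byte = unsigned char;  // stand-in for the WinAPI BYTE typedef

// Hypothetical counterpart of ArrayToAnsiString: the explicit length means
// no copy of the input array and no manual '\0'.
inline std::string bytes_to_narrow(const std::vector<Byte>& bytes) {
    if (bytes.empty()) return std::string();
    return std::string(reinterpret_cast<const char*>(bytes.data()), bytes.size());
}

// Hypothetical counterpart of ArrayToUnicodeString, reading the bytes as
// UTF-16 code units in the machine's native byte order.
inline std::u16string bytes_to_wide(const std::vector<Byte>& bytes) {
    // An odd trailing byte cannot form a code unit, so it is dropped.
    const std::size_t units = bytes.size() / sizeof(char16_t);
    std::u16string out(units, u'\0');
    if (units != 0) {
        // memcpy avoids reading char16_t through a possibly misaligned pointer.
        std::memcpy(&out[0], bytes.data(), units * sizeof(char16_t));
    }
    return out;
}
```

One behavioral difference from the original is worth keeping in mind: constructing a string from a null-terminated pointer stops at the first embedded null, while a length-based conversion keeps everything in the buffer.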

Swiss answered Sep 29 '22 06:09