I'm a bit confused about the differences between unsigned char (which is also BYTE in WinAPI) and char pointers.
Currently I'm working with some ATL-based legacy code and I see a lot of expressions like the following:
CAtlArray<BYTE> rawContent;
CALL_THE_FUNCTION_WHICH_FILLS_RAW_CONTENT(rawContent);
return ArrayToUnicodeString(rawContent);
// or return ArrayToAnsiString(rawContent);
Now, the implementations of ArrayToXXString look like this:
CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
// Casting from BYTE* -> LPCSTR (const char*).
return CStringA((LPCSTR)copiedArray.GetData());
}
CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
copiedArray.Add('\0');
// Same here.
return CStringW((LPCWSTR)copiedArray.GetData());
}
So, the questions:
1. Is the C-style cast from BYTE* to LPCSTR (const char*) safe for all possible cases?
2. Is it really necessary to add double null-termination when converting the array data to a wide-character string?
3. The conversion routine CStringW((LPCWSTR)copiedArray.GetData()) seems invalid to me; is that true?
4. Is there any way to make all this code easier to understand and maintain?
An unsigned type can only represent positive values (and zero), whereas a signed type can represent both positive and negative values (and zero). In the case of an 8-bit char this means that an unsigned char variable can hold a value in the range 0 to 255, while a signed char has the range -128 to 127.
The unsigned char type is usually used as a representation of a single byte of binary data. Thus, an array of unsigned char is often used as a binary data buffer, where each element is a single byte, and an unsigned char* is a pointer to that buffer (or to its first element).
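For instance, a plain byte buffer and a pointer into it might look like this (a minimal sketch; the byte values are made up):

#include <cstddef>
#include <cstdio>

int main()
{
    // A buffer of raw binary data: each element is one byte (0-255).
    unsigned char buffer[] = { 0xDE, 0xAD, 0xBE, 0xEF };

    // The buffer name decays to unsigned char*, a pointer to the first byte.
    unsigned char* p = buffer;

    for (std::size_t i = 0; i < sizeof(buffer); ++i)
        std::printf("%02X ", static_cast<unsigned>(p[i]));
    std::printf("\n");
    return 0;
}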
There is also a difference between character arrays and character pointers initialized with string literals: with an array, the whole string is copied into the array's own storage (on the stack for a local variable), while with a pointer only the pointer variable itself lives on the stack and the literal it points to is stored in a read-only section of the executable. The most important consequence is that a string literal accessed through a pointer must not be edited; it is read-only.
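The classic illustration of that difference (a minimal sketch, unrelated to the ATL code above):

int main()
{
    char arr[] = "hello";       // the characters are copied into the
                                // array's own writable storage
    const char* ptr = "hello";  // ptr points at a string literal stored
                                // in a read-only section

    arr[0] = 'H';               // fine: the array is modifiable
    // ptr[0] = 'H';            // does not compile; even with a cast,
                                // writing to a literal is undefined behavior
    return 0;
}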
Use unsigned char for accessing memory as a block of bytes or for small unsigned integers, signed char for small signed integers, and plain char only for ASCII characters and strings, as sketched below.
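For example (a minimal sketch of the three roles):

unsigned char raw[] = { 0x00, 0x7F, 0x80, 0xFF };  // bytes / small unsigned integers
signed char offset = -5;                           // a small signed integer
const char* text = "ASCII text";                   // character data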
The C standard is kind of weird when it comes to the definition of a byte, but you do get a couple of guarantees: a byte is CHAR_BIT bits wide, CHAR_BIT is at least 8, and sizeof(char) == sizeof(unsigned char) == 1 by definition. This definition doesn't mesh well with older platforms where a byte was 6 or 7 bits long, but it does mean that BYTE* (unsigned char*) and char* are guaranteed to have the same size and representation, so casting between them is well-defined.
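Those guarantees can be spelled out with static_asserts (a sketch; C++11 or later):

#include <climits>

// A byte is CHAR_BIT bits wide, and CHAR_BIT is at least 8.
static_assert(CHAR_BIT >= 8, "a byte is at least 8 bits");

// sizeof measures in bytes, and the char types are one byte by definition.
static_assert(sizeof(char) == 1, "char is exactly one byte");
static_assert(sizeof(unsigned char) == 1, "unsigned char is exactly one byte");

int main()
{
    unsigned char bytes[] = { 'h', 'i', '\0' };
    // unsigned char* and char* have the same size, alignment, and
    // representation, so this cast is well-defined.
    const char* s = reinterpret_cast<const char*>(bytes);
    (void)s;
    return 0;
}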
Two null bytes are needed at the end of a wide-character (UTF-16) string because the terminator is a single 16-bit zero unit, i.e. two zero bytes, and because many valid UTF-16 code units contain a zero byte (every ASCII character does). A single zero byte therefore cannot mark the end of the string.
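To make that concrete, here is what the raw bytes of a tiny UTF-16LE string look like (a sketch; assumes a 2-byte little-endian wchar_t, as on Windows):

#include <cwchar>

int main()
{
    // UTF-16LE bytes for L"Hi": L'H' = 0x0048, L'i' = 0x0069.
    // Every ASCII character already contains one zero byte, so a single
    // 0x00 cannot serve as the terminator of a wide string.
    alignas(wchar_t) unsigned char raw[] = {
        0x48, 0x00,  // L'H'
        0x69, 0x00,  // L'i'
        0x00, 0x00   // the terminator: one 16-bit zero, i.e. two zero bytes
    };

    const wchar_t* ws = reinterpret_cast<const wchar_t*>(raw);
    std::wprintf(L"%ls\n", ws);  // prints "Hi"
    return 0;
}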
As for making the code easier to read, that is largely a matter of style. This code appears to be written in a style used by a lot of old C Windows code, which has definitely fallen out of favor. There are plenty of ways to make it clearer, but which way is clearest has no single answer.
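One possible simplification, though: both CStringA and CStringW have constructors taking a pointer plus an explicit length, which sidesteps the copying and manual null-termination entirely (a sketch, not the original author's code; it assumes the byte array really holds text in the matching encoding):

#include <atlcoll.h>
#include <atlstr.h>

// The (pointer, length) constructors copy exactly that many characters
// and null-terminate the result themselves, so the manual
// copy-and-terminate dance disappears.
CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
    return CStringA(reinterpret_cast<const char*>(array.GetData()),
                    static_cast<int>(array.GetCount()));
}

CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
    // GetCount() is a byte count; each UTF-16 code unit is sizeof(wchar_t)
    // bytes (2 on Windows), so halve it to get the character count.
    return CStringW(reinterpret_cast<const wchar_t*>(array.GetData()),
                    static_cast<int>(array.GetCount() / sizeof(wchar_t)));
}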