I'm a bit confused about the differences between unsigned char (which is also BYTE in WinAPI) and char pointers.
Currently I'm working with some ATL-based legacy code and I see a lot of expressions like the following:
CAtlArray<BYTE> rawContent;
CALL_THE_FUNCTION_WHICH_FILLS_RAW_CONTENT(rawContent);
return ArrayToUnicodeString(rawContent);
// or return ArrayToAnsiString(rawContent);
Now, the implementations of ArrayToXXString look like this:
CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
// Casting from BYTE* -> LPCSTR (const char*).
return CStringA((LPCSTR)copiedArray.GetData());
}
CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
copiedArray.Add('\0');
// Same here.
return CStringW((LPCWSTR)copiedArray.GetData());
}
So, the questions:
1. Is the C-style cast from BYTE* to LPCSTR (const char*) safe for all possible cases?
2. Is it really necessary to add double null-termination when converting the array data to a wide-character string?
3. The conversion routine CStringW((LPCWSTR)copiedArray.GetData()) seems invalid to me; is that true?
4. Is there any way to make all this code easier to understand and maintain?
An unsigned type can only represent positive values (and zero), whereas a signed type can represent both positive and negative values (and zero). In the case of an 8-bit char this means that an unsigned char variable can hold a value in the range 0 to 255, while a signed char has the range -128 to 127.
The unsigned char type is usually used as a representation of a single byte of binary data. Thus, an array of unsigned char is often used as a binary data buffer, where each element is a single byte, and an unsigned char* is a pointer to that buffer (or to its first element).
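For instance, a plain byte buffer and a pointer into it might look like this (a minimal sketch; the byte values are made up):

#include <cstddef>
#include <cstdio>

int main()
{
    // A buffer of raw binary data: each element is one byte (0-255).
    unsigned char buffer[] = { 0xDE, 0xAD, 0xBE, 0xEF };

    // The buffer name decays to unsigned char*, a pointer to the first byte.
    unsigned char* p = buffer;

    for (std::size_t i = 0; i < sizeof(buffer); ++i)
        std::printf("%02X ", static_cast<unsigned>(p[i]));
    std::printf("\n");
    return 0;
}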
There is also a difference between character arrays and character pointers initialized with string literals: with an array, the whole string is copied into the array's own storage (on the stack for a local variable), while with a pointer only the pointer variable itself lives on the stack and the literal it points to is stored in a read-only section of the executable. The most important consequence is that a string literal accessed through a pointer must not be edited; it is read-only.
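The classic illustration of that difference (a minimal sketch, unrelated to the ATL code above):

int main()
{
    char arr[] = "hello";       // the characters are copied into the
                                // array's own writable storage
    const char* ptr = "hello";  // ptr points at a string literal stored
                                // in a read-only section

    arr[0] = 'H';               // fine: the array is modifiable
    // ptr[0] = 'H';            // does not compile; even with a cast,
                                // writing to a literal is undefined behavior
    return 0;
}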
Use unsigned char for accessing memory as a block of bytes or for small unsigned integers, signed char for small signed integers, and plain char only for ASCII characters and strings, as sketched below.
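For example (a minimal sketch of the three roles):

unsigned char raw[] = { 0x00, 0x7F, 0x80, 0xFF };  // bytes / small unsigned integers
signed char offset = -5;                           // a small signed integer
const char* text = "ASCII text";                   // character data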
The C standard is kind of weird when it comes to the definition of a byte, but you do get a couple of guarantees: a byte is CHAR_BIT bits wide, CHAR_BIT is at least 8, and sizeof(char) == sizeof(unsigned char) == 1 by definition. This definition doesn't mesh well with older platforms where a byte was 6 or 7 bits long, but it does mean that BYTE* (unsigned char*) and char* are guaranteed to have the same size and representation, so casting between them is well-defined.
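Those guarantees can be spelled out with static_asserts (a sketch; C++11 or later):

#include <climits>

// A byte is CHAR_BIT bits wide, and CHAR_BIT is at least 8.
static_assert(CHAR_BIT >= 8, "a byte is at least 8 bits");

// sizeof measures in bytes, and the char types are one byte by definition.
static_assert(sizeof(char) == 1, "char is exactly one byte");
static_assert(sizeof(unsigned char) == 1, "unsigned char is exactly one byte");

int main()
{
    unsigned char bytes[] = { 'h', 'i', '\0' };
    // unsigned char* and char* have the same size, alignment, and
    // representation, so this cast is well-defined.
    const char* s = reinterpret_cast<const char*>(bytes);
    (void)s;
    return 0;
}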
Two null bytes are needed at the end of a wide-character (UTF-16) string because the terminator is a single 16-bit zero unit, i.e. two zero bytes, and because many valid UTF-16 code units contain a zero byte (every ASCII character does). A single zero byte therefore cannot mark the end of the string.
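To make that concrete, here is what the raw bytes of a tiny UTF-16LE string look like (a sketch; assumes a 2-byte little-endian wchar_t, as on Windows):

#include <cwchar>

int main()
{
    // UTF-16LE bytes for L"Hi": L'H' = 0x0048, L'i' = 0x0069.
    // Every ASCII character already contains one zero byte, so a single
    // 0x00 cannot serve as the terminator of a wide string.
    alignas(wchar_t) unsigned char raw[] = {
        0x48, 0x00,  // L'H'
        0x69, 0x00,  // L'i'
        0x00, 0x00   // the terminator: one 16-bit zero, i.e. two zero bytes
    };

    const wchar_t* ws = reinterpret_cast<const wchar_t*>(raw);
    std::wprintf(L"%ls\n", ws);  // prints "Hi"
    return 0;
}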
As for making the code easier to read, that is largely a matter of style. This code appears to be written in a style used by a lot of old C Windows code, which has definitely fallen out of favor. There are plenty of ways to make it clearer, but which way is clearest has no single answer.
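One possible simplification, though: both CStringA and CStringW have constructors taking a pointer plus an explicit length, which sidesteps the copying and manual null-termination entirely (a sketch, not the original author's code; it assumes the byte array really holds text in the matching encoding):

#include <atlcoll.h>
#include <atlstr.h>

// The (pointer, length) constructors copy exactly that many characters
// and null-terminate the result themselves, so the manual
// copy-and-terminate dance disappears.
CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
    return CStringA(reinterpret_cast<const char*>(array.GetData()),
                    static_cast<int>(array.GetCount()));
}

CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
    // GetCount() is a byte count; each UTF-16 code unit is sizeof(wchar_t)
    // bytes (2 on Windows), so halve it to get the character count.
    return CStringW(reinterpret_cast<const wchar_t*>(array.GetData()),
                    static_cast<int>(array.GetCount() / sizeof(wchar_t)));
}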