I have a vector<BYTE> that represents characters in a string. I want to interpret those characters as ASCII characters and store them in a Unicode (UTF-16) string. The current code assumes that the characters in the vector<BYTE> are Unicode rather than ASCII. This works fine for standard ASCII, but fails for extended ASCII characters. Those characters need to be interpreted using the current code page, retrieved via GetACP(). How would I go about creating a Unicode (UTF-16) string from these ASCII characters?
EDIT: I believe the solution should have something to do with the macros discussed here: http://msdn.microsoft.com/en-us/library/87zae4a3(v=vs.80).aspx I'm just not exactly sure how the actual implementation would go.
int ExtractByteArray(CATLString* pszResult, const CByteVector* pabData)
{
    // place the data into the output cstring
    pszResult->Empty();
    for (int iIndex = 0; iIndex < pabData->GetSize(); iIndex++)
        *pszResult += (TCHAR)pabData->GetAt(iIndex);
    return RC_SUCCESS;
}
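My rough guess at how the macro-based implementation from the EDIT above might look is below, though I'm not sure it's correct (the std::string copy is only there to get a null-terminated narrow buffer, and I'm assuming CATLString can be assigned from a wide string):

#include <atlconv.h>
#include <string>

int ExtractByteArray(CATLString* pszResult, const CByteVector* pabData)
{
    // Collect the raw bytes into a narrow, null-terminated string first.
    std::string narrow;
    for (int iIndex = 0; iIndex < pabData->GetSize(); iIndex++)
        narrow += static_cast<char>(pabData->GetAt(iIndex));

    // CA2W converts from the given ANSI code page to UTF-16.
    *pszResult = CA2W(narrow.c_str(), GetACP());
    return RC_SUCCESS;
}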
You should use MultiByteToWideChar to convert that string to Unicode.
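Here is a minimal sketch of that approach (assuming the bytes live in a std::vector<BYTE> and a std::wstring result is acceptable; the helper name BytesToUtf16 is just for illustration):

#include <windows.h>
#include <string>
#include <vector>

// Interpret a byte buffer as ANSI text in the current code page and
// widen it to UTF-16. CP_ACP selects the same code page GetACP() reports.
std::wstring BytesToUtf16(const std::vector<BYTE>& bytes)
{
    if (bytes.empty())
        return std::wstring();

    const char* src = reinterpret_cast<const char*>(bytes.data());
    const int srcLen = static_cast<int>(bytes.size());

    // First call: ask how many UTF-16 code units the output needs.
    const int wideLen = MultiByteToWideChar(CP_ACP, 0, src, srcLen, nullptr, 0);
    if (wideLen == 0)
        return std::wstring(); // conversion failed; GetLastError() has details

    std::wstring result(wideLen, L'\0');
    // Second call: perform the actual conversion into the buffer.
    MultiByteToWideChar(CP_ACP, 0, src, srcLen, &result[0], wideLen);
    return result;
}

Passing CP_ACP makes MultiByteToWideChar use the current ANSI code page (the one GetACP() returns), so extended characters above 0x7F are mapped through that code page instead of being cast straight to TCHAR as in the original loop. The ATL CA2W conversion class from the linked page is essentially a convenience wrapper around this same call.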