Using std::wstring the way I am with MultiByteToWideChar?
std::wstring widen(const std::string &in)
{
    int len = MultiByteToWideChar(CP_UTF8, 0, &in[0], -1, NULL, 0);
    std::wstring out(len, 0);
    MultiByteToWideChar(CP_UTF8, 0, &in[0], -1, &out[0], len);
    return out;
}
If you're asking will it work, probably. Is it correct?
- Use in.c_str() instead of &in[0].
- Check the return value of MultiByteToWideChar, at least the first time.
- MultiByteToWideChar invoked with a (-1) length, if successful, will account for a zero-terminator (i.e. it will always return >= 1 on success). The length-constructor for std::wstring does not require this: std::wstring(5,0) will allocate space for six wide-chars; 5 + zero-term. So technically you're allocating one too many wide-chars.
From the MultiByteToWideChar docs on cbMultiByte and -1:
If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting Unicode string has a terminating null character, and the length returned by the function includes this character.
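The size point can be checked without calling the Windows API at all. The following is only a minimal sketch: `simulated_len` and both function names are hypothetical, and the value 6 stands in for what the sizing call would report for the five-character input "hello" when the terminator is counted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the question's code: the buffer is sized with a
// length that already counts the terminator, as a -1 call would report it.
std::wstring with_counted_terminator()
{
    const wchar_t converted[] = L"hello"; // 5 characters + terminator = 6 elements
    const int simulated_len = 6;          // assumed result of the sizing call
    std::wstring out(static_cast<std::size_t>(simulated_len), L'\0');
    // Copy all 6 elements, terminator included, as the second call would.
    std::copy(converted, converted + simulated_len, out.begin());
    return out;
}

// Same buffer with the counted terminator dropped again.
std::wstring with_terminator_trimmed()
{
    std::wstring out = with_counted_terminator();
    out.resize(out.size() - 1); // std::wstring keeps its own terminator anyway
    return out;
}
```

The first function returns a string whose size() is 6 and which does not compare equal to L"hello", because the extra NUL is part of its contents. The trimmed version has size() 5 and compares equal.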
There is a problem with your first call to MultiByteToWideChar: The character sequence is not guaranteed to be zero terminated (although in practice it usually is). Change that line to
int len = MultiByteToWideChar(CP_UTF8, 0, in.c_str(), -1, NULL, 0);
and you should be safe. Even if MultiByteToWideChar fails and returns 0, this is accounted for by passing len as the final parameter in the second call to MultiByteToWideChar.
With that said, it is safe in the sense that it doesn't crash or corrupt memory. There is, however, one more issue: unless the input string causes MultiByteToWideChar to fail, the returned string will claim that its size() is one character larger than it should be. I would recommend changing the code as follows:
#include <windows.h>
#include <stdexcept>
#include <string>

std::wstring widen(std::string const &in)
{
    std::wstring out{};
    if (in.length() > 0)
    {
        // Calculate target buffer size (not including the zero terminator).
        // MultiByteToWideChar takes int lengths, so the size is cast explicitly.
        int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      in.c_str(), static_cast<int>(in.size()), NULL, 0);
        if (len == 0)
        {
            throw std::runtime_error("Invalid character sequence.");
        }
        out.resize(len);
        // No error checking. We already know that the conversion will succeed.
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            in.c_str(), static_cast<int>(in.size()), &out[0], len);
        // Use out.data() in place of &out[0] for C++17
    }
    return out;
}
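One detail worth keeping in mind with the code above: MultiByteToWideChar takes its lengths as int, while std::string::size() returns a std::size_t, so an extremely long input would not fit. A minimal sketch of a range check that could run before the conversion is shown below. The helper names `fits_in_int` and `to_int_length` are hypothetical and not part of the answer's code.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: true if a string length can be passed as an int.
bool fits_in_int(std::size_t size)
{
    return size <= static_cast<std::size_t>(INT_MAX);
}

// Hypothetical helper: converts a length to int, throwing if it is too large.
int to_int_length(std::size_t size)
{
    if (!fits_in_int(size))
    {
        throw std::length_error("Input too long for an int length.");
    }
    return static_cast<int>(size);
}
```

Under that assumption, the length argument would be to_int_length(in.size()), which reports oversized input through an exception, in line with how the function already reports other errors.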
This implementation addresses the following issues:
- Invalid input is rejected by passing the MB_ERR_INVALID_CHARS flag.
- Errors are reported through exceptions. (The std::wstring c'tor already throws exceptions in case of failure. It would feel unnatural to not throw exceptions for other errors.)
- It handles input with embedded NUL characters. This is rarely used, but when it is (e.g. when composing the OPENFILENAME's lpstrFilter member), it won't (silently) fail for that reason.
- The returned string has the correct size. When passing -1 in a call to MultiByteToWideChar, the returned length does include space for the zero terminator. This character, however, is owned by the std::string implementation, and not part of the character sequence to be converted.
- It does not rely on NUL characters at the end of the string, when the c_str() member is invoked.
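The embedded NUL point can be illustrated with standard C++ alone. This is a rough sketch with hypothetical helper names: the filter text is a made-up example in the style of lpstrFilter, and std::strlen is used here only as a stand-in for how a -1 length stops at the first terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a filter-style string with embedded NUL characters.
std::string make_filter()
{
    const char raw[] = "Text\0*.txt\0";
    // sizeof(raw) also counts the literal's implicit terminator; drop it so
    // the std::string holds exactly the 11 characters written above.
    return std::string(raw, sizeof(raw) - 1);
}

// Length as seen by anything that scans for the first NUL character.
std::size_t length_up_to_first_nul(const std::string &s)
{
    return std::strlen(s.c_str());
}
```

Here make_filter().size() is 11, while scanning for the first NUL sees only the 4 characters of "Text". Passing the explicit size rather than -1 is what lets the whole sequence reach the conversion.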