I have a problem: I need to use UTF-8 encoded strings with the standard char type in C++ source code, like so:
const char* twochars = "\xe6\x97\xa5\xd1\x88";
Normally, if I want to write a UTF-8 character I need to use octets like above. Is there something in Visual Studio (I'm using VS 2013 Ultimate) that would let me just write, for example, "ĄĘĆŻ" and have each character automatically converted to multiple UTF-8 octets, as in the example above? Or should I use const wchar_t* and find a library that can convert wide strings to UTF-8 encoded standard char strings?
If there is no such thing, could you suggest any external software for that? I really don't feel like browsing the character map for every symbol/non-latin letter.
Sorry for my English. Thanks in advance.
You can use the still-undocumented pragma directive execution_character_set("utf-8"). This way your char strings will be saved as UTF-8 in your binary. Note that this pragma is available in Visual C++ compilers only.
#include <iostream>
#include <cstring>

// Visual C++ only: store narrow string literals as UTF-8 in the binary.
#pragma execution_character_set("utf-8")

using namespace std;

const char *five_chars = "ĄĘĆŻ!";

int main()
{
    cout << "This is a UTF-8 string: " << five_chars << endl;
    cout << "...it's 5 characters long" << endl;
    cout << "...but it's " << strlen(five_chars) << " bytes long" << endl;
    return 0;
}
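The difference between the character count and the byte count reported above comes from UTF-8 using more than one byte for each non-ASCII character. As a rough illustration, the sketch below counts code points by skipping continuation bytes. The function name count_code_points is made up for this example, and the code assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <cstring>  // std::strlen, for comparing against the byte count

// Hypothetical helper: count Unicode code points in a NUL-terminated
// UTF-8 string. Continuation bytes have the bit pattern 10xxxxxx, so
// every byte that does NOT match that pattern starts a new code point.
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const char* utf8)
{
    std::size_t count = 0;
    for (const char* p = utf8; *p != '\0'; ++p) {
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "ĄĘĆŻ!" stored as UTF-8, count_code_points returns 5 while std::strlen returns 9, because each of the four Polish letters takes two bytes and the exclamation mark takes one.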
There's no way to write the string literal directly in UTF-8 with the current versions of VC++. A future version should have UTF-8 string literals.
I tried pasting non-ASCII text directly into a string literal in a source file and saved the file as UTF-8. Looking at the source file in a hex editor confirmed that it's saved as UTF-8, but that still doesn't do what you want. At compile time, those bytes are either mapped to a character in the current code page or you get a warning.
So the most portable way to create a string literal right now is to explicitly write the octets as you've been doing.
If you want to do a run-time conversion, there are a couple of options, one of which is to use std::codecvt to transform your wide character string into UTF-8.
You could use one of these techniques to write a little utility that does the conversion and outputs the result as the explicit octets you would need for a string literal. You could then copy and paste the output into your source code.
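A minimal sketch of such a utility, under a few assumptions: it uses std::wstring_convert with std::codecvt_utf8 (available since C++11 but deprecated in C++17), and the function names wide_to_utf8 and to_escaped_literal are invented for this example. On Windows, where wchar_t is 16 bits, std::codecvt_utf8<wchar_t> only covers the Basic Multilingual Plane; std::codecvt_utf8_utf16 would be needed for characters outside it.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: convert a wide string to UTF-8 bytes using the
// standard (but deprecated since C++17) wstring_convert facility.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// Hypothetical helper: render bytes as a C++ string literal made only of
// \xNN escapes. Every byte is escaped, ASCII included, so an escape is
// never followed by a plain hex digit that would silently extend it.
std::string to_escaped_literal(const std::string& utf8)
{
    static const char hex[] = "0123456789abcdef";
    std::string out = "\"";
    for (unsigned char byte : utf8) {
        out += "\\x";
        out += hex[byte >> 4];
        out += hex[byte & 0x0f];
    }
    out += '"';
    return out;
}
```

Calling to_escaped_literal(wide_to_utf8(L"\u65e5\u0448")) produces the text "\xe6\x97\xa5\xd1\x88", quotes included, which is the literal from the question and can be pasted straight into source code.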