Is there an easy way to write UTF-8 octets in Visual Studio?

I need to store UTF-8 encoded strings in standard char types in C++ source code, like so:

const char* twochars = "\xe6\x97\xa5\xd1\x88";

Normally, to write a UTF-8 character I have to spell out its octets as above. Is there something in Visual Studio (I'm using VS 2013 Ultimate) that would let me just write, for example, "ĄĘĆŻ" and have each character automatically converted to multiple UTF-8 octets, as in the example above? Or should I use const wchar_t* and find a library that converts wide strings to UTF-8 encoded standard char strings?

If there is no such thing, could you suggest any external software for that? I really don't feel like browsing the character map for every symbol/non-latin letter.

Sorry for my English. Thanks in advance.

Michael K. Sondej asked Nov 14 '13 20:11


2 Answers

You can use the still-undocumented pragma directive execution_character_set("utf-8"). With it, your char string literals are stored as UTF-8 in the binary. Note that this pragma is available in Visual C++ compilers only.

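#include <iostream>
#include <cstring>

// Visual C++ only: store narrow string literals as UTF-8 in the binary.
#pragma execution_character_set("utf-8")

using namespace std;

// String literals are const, so point to them with const char*.
const char *five_chars = "ĄĘĆŻ!";

// Plain main() avoids the <tchar.h> dependency that _tmain and _TCHAR have.
int main()
{
    cout << "This is a UTF-8 string: " << five_chars << endl;
    cout << "...it's 5 characters long" << endl;
    cout << "...but it's " << strlen(five_chars) << " bytes long" << endl;
    return 0;
}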

Jigsore answered Oct 29 '22 08:10


There's no way to write the string literal directly in UTF-8 with the current versions of VC++. A future version should have UTF-8 string literals.

I tried pasting non-ASCII text directly into a string literal in a source file and saved the file as UTF-8. Looking at the source file in a hex editor confirmed that it's saved as UTF-8, but that still doesn't do what you want. At compile time, those bytes are either mapped to a character in the current code page or you get a warning.

So the most portable way to create a string literal right now is to explicitly write the octets as you've been doing.

If you want to do a run-time conversion, there are a couple of options.

  1. The Windows API has WideCharToMultiByte, which takes UTF-16 text and converts it to a multibyte encoding such as UTF-8.
  2. If you're using a new enough version of the compiler and the C++ runtime, you can use std::codecvt to transform your wide character string into UTF-8.
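
As a rough illustration of the second option, here is a minimal sketch. The helper name wide_to_utf8 is made up for this example, and it assumes the wide string holds UTF-16 code units, as wchar_t strings do with VC++. It relies on std::wstring_convert and std::codecvt_utf8_utf16, which are deprecated since C++17, so treat it as one possible approach rather than the recommended one.

```cpp
#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper: converts a wide string to a UTF-8 encoded std::string.
// Assumption: the wchar_t elements are UTF-16 code units, as on Windows.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    // to_bytes throws std::range_error if the input cannot be converted.
    return converter.to_bytes(wide);
}
```

Calling wide_to_utf8(L"ĄĘĆŻ") should then give you a std::string holding the UTF-8 bytes, provided the compiler read the wide literal correctly from the source file.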

You could use one of these techniques to write a little utility that does the conversion and outputs the result as the explicit octets you would need for a string literal. You could then copy and paste the output into your source code.
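
The escaping half of such a utility could look roughly like the sketch below. The function name to_hex_escapes is hypothetical, and it assumes you already have the text as UTF-8 bytes in a std::string. It escapes every byte, including plain ASCII, because a hex escape in C++ swallows any hex digits that follow it: "\x88a" would be read as one out-of-range escape, not as \x88 followed by 'a'.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns raw UTF-8 bytes into the body of a string literal.
// Every byte becomes \xNN so that no following character can be absorbed
// into the preceding hex escape.
std::string to_hex_escapes(const std::string& utf8)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(utf8.size() * 4);
    for (unsigned char byte : utf8) {
        out += "\\x";
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0f];  // low nibble
    }
    // Each input byte produces exactly four output characters.
    assert(out.size() == utf8.size() * 4);
    return out;
}
```

Print the result between double quotes and you get a line you can paste straight into the source file.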

Adrian McCarthy answered Oct 29 '22 08:10