Converting unicode strings and vice-versa

Tags: c++, unicode
I'm fairly new to Unicode strings and pointers, and I have no idea how conversion from Unicode to ASCII and vice versa works. Here is what I'm trying to do:

const wchar_t *p = L"This is a string";

If I wanted to convert it to char*, how would the conversion from wchar_t* to char* work, and vice versa?

Or, by value, converting a std::wstring object to a std::string and vice versa:

std::wstring wstr = L"This is a string";

If I'm correct, can you just copy the string to a new buffer without any conversion?

user963241 asked Jan 24 '11 19:01



2 Answers

In the future (VS 2010 already supports it), this will be possible in standard C++ (finally!):

#include <string>
#include <locale>
#include <codecvt>  // std::codecvt_utf8

std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
const std::wstring wide_string = L"This is a string";
const std::string utf8_string = converter.to_bytes(wide_string);
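
The question also asks about the reverse direction. A minimal sketch of a round trip with the same facility might look like the following. The helper names to_utf8 and from_utf8 are made up for illustration, and note that std::wstring_convert and <codecvt> were deprecated in C++17, so newer compilers may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper names (to_utf8 / from_utf8); they are not part of the
// standard library, only thin wrappers around std::wstring_convert.

// Encode a wide string as UTF-8 bytes.
std::string to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// Decode UTF-8 bytes back into a wide string.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::wstring from_utf8(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

With these, from_utf8(to_utf8(wide_string)) should give back the original wide string, and pure ASCII text comes out of to_utf8 byte-for-byte unchanged.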
Philipp answered Oct 21 '22 17:10


The conversion from ASCII to Unicode and vice versa is quite trivial. By design, the first 128 Unicode values are the same as ASCII (in fact, the first 256 are equal to ISO-8859-1).

So the following code works on systems where char is ASCII and wchar_t is Unicode:

#include <cstring>  // strlen
#include <string>
const char* ASCII = "Hello, world";
std::wstring Unicode(ASCII, ASCII+strlen(ASCII));

You can't reverse it this simply: 汉 does exist in Unicode but not in ASCII, so how would you "convert" it?
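
If you know the text should be plain ASCII, one option is to narrow it character by character and refuse anything outside the ASCII range. This is only a sketch under that assumption; narrow_ascii is a hypothetical helper name, and rejecting the whole string is just one possible policy (you could also substitute '?' for each offending character).

```cpp
#include <string>

// Hypothetical helper: copies `wide` into `narrow` only if every character
// is in the ASCII range (0..0x7F). On failure it returns false and leaves
// `narrow` untouched.
bool narrow_ascii(const std::wstring& wide, std::string& narrow)
{
    std::string result;
    result.reserve(wide.size());
    for (wchar_t ch : wide) {
        // Casting to unsigned also rejects negative values on platforms
        // where wchar_t is a signed type.
        if (static_cast<unsigned long>(ch) > 0x7F)
            return false;
        result.push_back(static_cast<char>(ch));
    }
    narrow = result;
    return true;
}
```

Calling it on L"Hello, world" succeeds and yields "Hello, world", while a string containing 汉 is rejected instead of being silently mangled.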

MSalters answered Oct 21 '22 19:10