I've got this code:
```cpp
string test("żaba");
cout << "Word: " << test << endl;
cout << "Length: " << test.size() << endl;
cout << "Letter: " << test.at(0) << endl;
```
The output is strange:
```
Word: żaba
Length: 5
Letter: �
```
As you can see, the length should be 4 and the letter should be "ż".
How can I correct this code to work properly?
std::string on non-Windows platforms is usually used to store UTF-8 strings (UTF-8 being the default encoding on most sane operating systems this side of 2010), but it is a "dumb" container in the sense that it neither knows nor cares what the bytes you're storing mean. It works for reading, storing, and writing, but not for Unicode-aware string manipulation.
You need to use the excellent and well-maintained IBM ICU: International Components for Unicode. It's a C/C++ library for *nix or Windows into which a ton of research has gone to provide a culture-aware string library, including case-insensitive string comparison that's both fast and accurate.
Another good project that's easier for C++ devs to switch to is UTF8-CPP.
Your question fails to mention encodings so I’m going to take a stab in the dark and say that this is the reason.
First course of action: read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
After that, it should become clear that such a thing as a “naked string” doesn’t exist: every string is encoded somehow. In your case, it looks very much like you are using a UTF-8-encoded string with diacritics, in which case, yes, the length of the string is (correctly) reported as 5 (1), and at(0) returns only the first byte of the two-byte sequence for “ż”, which is not printable on its own.
1) Note that string::size counts bytes (= chars), not logical characters or even code points.