Incorrect behaviour of size() and at() in string class

Question

I've got this code:

string test("żaba");

cout << "Word: " << test << endl;
cout << "Length: " << test.size() << endl;
cout << "Letter: " << test.at(0) << endl;

The output is strange:

Word: żaba
Length: 5
Letter: �

I expected the length to be 4 and the letter to be "ż".

How can I correct this code to work properly?

Mahmoud Al-Qudsi · Accepted Answer

On non-Windows platforms, std::string is usually used to store UTF-8 strings (UTF-8 being the default encoding on most sane operating systems this side of 2010), but it is a "dumb" container in the sense that it doesn't know or care anything about the bytes you're storing. It'll work for reading, storing, and writing, but not for character-aware string manipulation.
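To make the "dumb container" point concrete, here is a minimal sketch that dumps the bytes a std::string actually holds. The helper name hex_bytes is made up for illustration, and the sketch assumes the string is UTF-8 encoded; the literal is spelled with byte escapes so the result does not depend on the encoding of the source file.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of any library): renders every byte
// stored in the string as two lowercase hex digits, separated by spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char byte : s) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(byte));
        out += buf;
    }
    return out;
}

// "żaba" as UTF-8 bytes: 'ż' (U+017C) is encoded as 0xC5 0xBC.
// The literal is split so that "aba" is not parsed as part of the hex escape.
const std::string zaba = "\xC5\xBC" "aba";
```

With this, hex_bytes(zaba) yields "c5 bc 61 62 61" and zaba.size() is 5: two bytes for "ż" plus one byte each for "a", "b", "a".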

You need to use the excellent and well-maintained IBM ICU: International Components for Unicode. It's a C/C++ library for *nix or Windows into which a ton of research has gone to provide a culture-aware string library, including case-insensitive string comparison that's both fast and accurate.

Another good project that's easier for C++ devs to switch to is UTF8-CPP.

Konrad Rudolph · Answer

Your question fails to mention encodings so I’m going to take a stab in the dark and say that this is the reason.

First course of action: read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

After that, it should become clear that such a thing as a “naked string” doesn’t exist: every string is encoded somehow. In your case, it looks very much like you are using a UTF-8-encoded string with diacritics, in which case, yes, the length of the string is (correctly) reported as 5¹: "ż" takes two bytes in UTF-8 and each of the other letters takes one. test.at(0) then returns only the first of those two bytes, which is not a complete character on its own and so cannot be printed.

¹⁾ Note that string::size counts bytes (= chars), not logical characters or even code points.
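If pulling in a library is not an option, counting code points in UTF-8 can be sketched with the standard library alone. The function names below are made up for illustration, and the code assumes the input is valid UTF-8; it does no validation. It relies on the fact that every continuation byte in UTF-8 has the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
bool is_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: counts code points by counting every byte that
// is NOT a continuation byte. Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if (!is_continuation(byte)) ++n;
    }
    return n;
}

// Hypothetical helper: returns the bytes that make up the first code
// point (the lead byte plus any continuation bytes that follow it).
std::string first_code_point(const std::string& s) {
    if (s.empty()) return {};
    std::size_t len = 1;
    while (len < s.size() &&
           is_continuation(static_cast<unsigned char>(s[len]))) {
        ++len;
    }
    return s.substr(0, len);
}
```

For "żaba" stored as UTF-8, count_code_points returns 4 and first_code_point returns the two bytes 0xC5 0xBC, which print as "ż" on a UTF-8 terminal. Note that code points are still not the same as user-perceived characters: "ż" can also be written as "z" followed by a combining dot above (U+0307), which is two code points. Handling that properly is where a library such as ICU comes in.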

Tags:

c++

Daniel Gadawski
