What is the best type, in C++, for storing UTF-8 string? I'd like to avoid rolling my own class if possible.
My original thought was std::string, but it uses char as the underlying type, and whether char is signed or unsigned is implementation-defined. On my system, it's signed. UTF-8 code units, however, are unsigned octets, which seems to indicate that char is the wrong type.
This leads us to std::basic_string<unsigned char>, which seems to fit the bill: unsigned, 8-bit (or larger) chars.
However, most libraries and APIs seem to use char: glib, for example, uses char, and C++'s ostreams use char.
Thoughts?
I'd just use std::string, as it is consistent with the UTF-8 ideal of letting you treat text exactly as you would null-terminated ASCII strings unless you actually need its Unicode-ness.
I also like GTKmm's Glib::ustring, but that only works if you're writing a GTKmm (or at least Glibmm) application.