Arguments for and against supporting std::wstring exclusively in cross-platform library

Question

I'm currently developing a cross-platform C++ library which I intend to be Unicode aware. I currently have compile-time support for either std::string or std::wstring via typedefs and macros. The disadvantage with this approach is that it forces you to use macros like L("string") and to make heavy use of templates based on character type.

What are the arguments for and against supporting std::wstring only?

Would using std::wstring exclusively hinder the GNU/Linux user base, where UTF-8 encoding is preferred?

David Feurle · Accepted Answer

Many people want to use Unicode as UTF-8 (std::string) rather than as UTF-16 or UTF-32 (std::wstring). UTF-8 is the standard encoding on many Linux distributions and in many databases, so not supporting it would be a huge disadvantage. On Linux, every call to a library function that takes a string argument would require the user to convert a (native) UTF-8 string to std::wstring.
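To illustrate the conversion burden, here is a minimal sketch of what a caller holding UTF-8 data might have to write before each call into a wstring-only API. The names `utf8_to_wide` and `library_length` are hypothetical, and the decoder does only basic validation; a real project would more likely rely on a tested library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into std::wstring.
// Basic validation only (bad lead/continuation bytes, truncation, range);
// it does not reject overlong forms or encoded surrogates.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        i += len;
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            // One wchar_t per code point (always the case with a 4-byte wchar_t).
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            // 2-byte wchar_t (e.g. Windows): encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical wstring-only library entry point.
std::size_t library_length(const std::wstring& text) { return text.size(); }
```

A Linux caller would then write something like `library_length(utf8_to_wide(name))` at every call site, which is the overhead the answer is warning about.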

With GCC on Linux, each character of a std::wstring occupies 4 bytes, while on Windows it occupies 2 bytes. This can lead to strange effects when reading or writing files (and when copying them between platforms). I would rather recommend UTF-8/std::string for a cross-platform project.
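The size difference is easy to see with a small sketch. The helper name `raw_byte_count` is made up for illustration; it simply reports how many bytes would end up in a file if a std::wstring's in-memory contents were written out unconverted.

```cpp
#include <cstddef>
#include <string>

// Bytes produced if the raw in-memory contents of a std::wstring were
// written to a file as-is, with no encoding conversion.
std::size_t raw_byte_count(const std::wstring& s) {
    return s.size() * sizeof(wchar_t);
}
```

For `L"abc"` this gives 12 where wchar_t is 4 bytes and 6 where it is 2 bytes, so a file dumped this way on one platform will not read back correctly on the other. The UTF-8 std::string "abc" is 3 bytes on both.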

sbi · Answer

What are the arguments for and against supporting std::wstring only?

The argument in favor of using wide characters is that it can do everything narrow characters can and more.

The arguments against it that I know of are:

wide characters need more space (which is hardly relevant; the Chinese do not, in principle, have more headaches over memory than Americans have)
using wide characters gives headaches to some westerners who are used to all their characters fitting into 7 bits (and who are unwilling to learn to pay a bit of attention not to intermingle uses of the character type for actual characters with other uses)

As for being flexible: I have maintained a library (several kLoC) that could deal with both narrow and wide characters. Most of it was done by making the character type a template parameter; I don't remember any macros (other than UNICODE, that is). Not all of it was flexible, though: some code in there ultimately required either a char or a wchar_t string. (There is no point in making internal key strings wide.)
Users could decide whether they wanted only narrow character support (in which case "string" was fine), only wide character support (which required them to use L"string"), or both (which required something like T("string")).
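The approach described above might look roughly like the following sketch. All names here (`trim_trailing`, `MYLIB_WIDE`, `MYLIB_T`, `mylib_char`, `mylib_string`) are invented for illustration and are not taken from the library the answer refers to.

```cpp
#include <string>

// Generic code: the character type is a template parameter, so the same
// function works for std::string and std::wstring.
template <typename CharT>
std::basic_string<CharT> trim_trailing(std::basic_string<CharT> s, CharT ch) {
    while (!s.empty() && s.back() == ch) s.pop_back();
    return s;
}

// Hypothetical build switch, in the style of UNICODE: users who want to
// support both builds write their literals as MYLIB_T("string").
#ifdef MYLIB_WIDE
  typedef wchar_t mylib_char;
  #define MYLIB_T(x) L##x
#else
  typedef char mylib_char;
  #define MYLIB_T(x) x
#endif
typedef std::basic_string<mylib_char> mylib_string;
```

Code that commits to one character type calls `trim_trailing(std::string("abc  "), ' ')` or `trim_trailing(std::wstring(L"abc  "), L' ')` directly. Only code that must compile in both configurations needs the macro, as in `trim_trailing(mylib_string(MYLIB_T("x..")), MYLIB_T('.'))`.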

Michael Kristofik · Answer

For:

Joel Spolsky wrote The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets. If you scroll to the bottom, you'll find that his crew uses wide character strings exclusively. If it's good enough for them, it's good enough for you. ;-)

Against:

You might have to interface with code that isn't i18n-aware. But like any good library writer, you'll just hide that mess behind an easy-to-use interface, right? Right?

Matthieu M. · Answer

I would say that using std::string or std::wstring is irrelevant.

Neither offers proper Unicode support anyway.

If you need internationalization, then you need proper Unicode support and should start looking into libraries such as ICU.

After that, it's a matter of which encoding to use, and this depends on the platform you're on: wrap the OS-dependent facilities behind an abstraction layer and convert in the implementation layer when applicable.
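As a rough sketch of such an abstraction layer, assume the library chooses UTF-8 std::string for its public interface (that choice is an assumption made for this example, not something the answer prescribes). The function name `open_for_read` is hypothetical. The conversion lives only in the Windows branch of the implementation.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical public interface: callers always pass a UTF-8 path.
// Returns nullptr if the file cannot be opened.
std::FILE* open_for_read(const std::string& utf8_path) {
#ifdef _WIN32
    // Windows file APIs take UTF-16, so convert in the implementation layer.
    // With a length of -1 the returned count includes the terminating null.
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8_path.c_str(), -1, nullptr, 0);
    if (n <= 0) return nullptr;
    std::wstring wide(static_cast<std::size_t>(n), L'\0');
    if (MultiByteToWideChar(CP_UTF8, 0, utf8_path.c_str(), -1, &wide[0], n) <= 0)
        return nullptr;
    return _wfopen(wide.c_str(), L"rb");
#else
    // On Linux and similar systems, paths are passed through as bytes.
    return std::fopen(utf8_path.c_str(), "rb");
#endif
}
```

Callers never see wchar_t or platform headers; only this one translation unit needs to know which encoding the operating system expects.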

Don't worry about the encoding used internally by the Unicode library you use (or build, hm); that is a matter of performance and should not affect how the library itself is used.

Tags:

c++

unicode

cross-platform

wstring

Oskar N.
