The more I work with C++ locale facets, the more I understand that they are broken.
std::time_get is not symmetric with std::time_put (as strftime/strptime are in C) and does not allow easy parsing of times with AM/PM marks.
Simple number formatting can produce invalid UTF-8 in some locales (for example ru_RU.UTF-8).
std::ctype is very simplistic: it assumes that to upper/to lower can be done on a per-character basis, although case conversion may change the number of characters and is context dependent.
std::collate does not support collation strength (case sensitive or insensitive).
And much more...
Thanks.
EDIT: Clarifications in case the link is not accessible:
std::numpunct defines the thousands separator as a single char. So when the separator is U+2002, a different kind of space, it cannot be represented as a single char in UTF-8, only as a multi-byte sequence.
The C API's struct lconv defines the thousands separator as a string and does not suffer from this problem. So, when you try to format numbers with separators outside of ASCII in a UTF-8 locale, invalid UTF-8 is produced.
To reproduce this bug, write 1234 to a std::ostream imbued with the ru_RU.UTF-8 locale.
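Since ru_RU.UTF-8 may not be installed everywhere, here is a minimal sketch that imitates the problem without it. The facet truncated_sep and the function format_with_truncated_sep are hypothetical names made up for illustration. The facet returns only the first byte of the UTF-8 encoding of U+2002 (E2 80 82), because a single char is all the std::numpunct<char> interface can return. This is an assumption about how the truncation shows up, not a copy of what any particular standard library does.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: do_thousands_sep() can only return one
// char, so the best it can do for U+2002 (UTF-8: E2 80 82) is a single byte.
struct truncated_sep : std::numpunct<char> {
    char do_thousands_sep() const override { return '\xE2'; }  // lead byte only
    std::string do_grouping() const override { return "\3"; }  // groups of three digits
};

// Formats a number with the facet above and returns the raw bytes.
std::string format_with_truncated_sep(long value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new truncated_sep));
    out << value;
    return out.str();
}
```

Calling format_with_truncated_sep(1234) gives five bytes: '1', a lone 0xE2 lead byte, then "234". A lead byte with no continuation bytes is not valid UTF-8, which is the kind of output described above.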
EDIT2: I must admit that POSIX C localization API works much smoother:
std::time_put::put
)
However, it is still far from being perfect.
EDIT3: According to the latest notes about C++0x, I can see that there will be std::time_get::get, similar to strptime and the opposite of std::time_put::put.
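As a rough sketch of that symmetry, assuming a C++11 standard library that implements the format-string overload of std::time_get::get: the helper names parse_time and format_time are mine, and the classic "C" locale is used so the example does not depend on installed locales.

```cpp
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Parses text according to a strptime-like format via std::time_get::get.
std::tm parse_time(const std::string& text, const std::string& fmt) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    std::tm t = {};
    std::ios_base::iostate err = std::ios_base::goodbit;
    const auto& tg = std::use_facet<std::time_get<char>>(in.getloc());
    tg.get(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>(),
           in, err, &t, fmt.data(), fmt.data() + fmt.size());
    // eofbit alone is fine (input fully consumed); failbit means a mismatch.
    if (err & std::ios_base::failbit) throw std::runtime_error("parse failed");
    return t;
}

// Formats a std::tm with the same format string via std::time_put::put.
std::string format_time(const std::tm& t, const std::string& fmt) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    const auto& tp = std::use_facet<std::time_put<char>>(out.getloc());
    tp.put(std::ostreambuf_iterator<char>(out), out, ' ', &t,
           fmt.data(), fmt.data() + fmt.size());
    return out.str();
}
```

With the format "%Y-%m-%d %H:%M", parsing "2009-08-15 14:30" and formatting the result again should give back the same string. I have not checked how well AM/PM marks (%p) round-trip across implementations.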
I agree with you, C++ is lacking proper i18n support.
Does anybody know whether any changes are expected in standard facets in C++0x?
It is too late in the game, so probably not.
Is there any way to draw attention to the importance of such changes?
I am very pessimistic about this.
When asked directly, Stroustrup claimed that he does not see any problems with the current status. And another one of the big C++ guys (book author and all) did not even realize that wchar_t can be one byte, if you read the standard.
And some threads in boost (which seems to drive the direction in the future) show so little understanding of how this works that it is outright scary.
C++0x barely added some Unicode character data types, late in the game and after a lot of struggle. I am not holding my breath for more too soon.
I guess the only chance to see something better is if someone really good/respected in the i18n and C++ worlds gets directly involved with the next version of the standard. No clue who that might be though :-(
std::numpunct is a template. All specializations try to return the decimal separator character. Obviously, in any locale where that is a wide character, you should use std::numpunct<wchar_t>, as the <char> specialization can't do that.
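A small sketch of what that looks like, using a made-up facet (wide_sep) and helper (format_wide) rather than a real system locale, so it runs without any locale being installed. It uses the thousands separator U+2002 from the question as the example character.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: with wchar_t, U+2002 fits in one unit.
struct wide_sep : std::numpunct<wchar_t> {
    wchar_t do_thousands_sep() const override { return L'\u2002'; }
    std::string do_grouping() const override { return "\3"; }  // groups of three digits
};

// Formats a number into a wide string using the facet above.
std::wstring format_wide(long value) {
    std::wostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new wide_sep));
    out << value;
    return out.str();
}
```

format_wide(1234) yields five wide characters with U+2002 in the second position. Getting that onto a UTF-8 byte stream is then a separate conversion step, which this sketch does not cover.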
That said, C++0x is pretty much done. However, if good improvements continue, the C++ committee is likely to start C++1x. The ISO C++ committee is very likely to accept your help, if offered through your national ISO member organization. I see that Pavel Minaev suggested a Defect Report. That's technically possible, but the problems you describe are in general design limitations. In that case, the most reliable course of action is to design a Boost library for this, have it pass the Boost review, submit it for inclusion in the standard, and participate in the ISO C++ meetings to deal with any issues cropping up there.