C++ unicode questions

Tags: c++, unicode, wofstream

I'm aware of ICU and of small libraries like the UTF-8 one on CodeProject (I forget the exact name); however, none of these is exactly what I want.

What I really want is something like ICU but wrapped up in a more friendly manner.

Specifically:

Fully object-oriented
Implementations of the C++ standard streams, or at least something that performs the same role
Can format times, dates, etc. in a locale-dependent manner (e.g. dd/mm/yy in the UK and mm/dd/yy in the US)
Lets me choose the "internal" encoding of strings, so I can, for example, use UTF-16 on Windows to avoid lots of conversions when passing strings to and from the Windows API and DirectX
Easy conversion of strings between encodings

If no such library exists, is it possible to wrap ICU using the standard C++ classes, so that I can, for example, create a ustring with identical usage to std::string and std::wstring, and also implement versions of the streams? Ideally these would be fully compatible with the existing ones, i.e. I could pass one to a function expecting a std::ostream and it would convert between its internal format and ASCII (or UTF-8) on the fly. Assuming it is possible, just how much work would it be?

EDIT: Also, having looked at the C++0x standard and noticed the literals for UTF-8, UTF-16 and UTF-32, does that mean that the standard library (e.g. strings, streams, etc.) will fully support those encodings and the conversion between them? If so, does anyone have any idea how long it will be until Visual Studio supports those features?

EDIT2: As for using the existing c++ support, I'll look up the locale and facet stuff.

One of the problems I ran into is that when I used streams defined around wchar_t (which is 2 bytes under Windows) for file I/O, they still seemed to use ASCII for the files themselves.


std::wofstream file(L"myfile.txt", std::ios::out);
file << L"Hello World!" << std::endl;

resulted in the following hex in the file:
48 65 6C 6C 6F 20 57 6F 72 6C 64 21 0D 0A
which is clearly ASCII rather than the expected UTF-16 output of:
FF FE 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00 21 00 0D 00 0A 00


asked May 07 '09 15:05

Fire Lancer

1 Answer

What I really want is something like ICU but wrapped up in a more friendly manner

Unfortunately, there is no such thing. ICU's API is not that terrible, though, so you can get used to it with some effort.

Can format time, dates etc in a locale dependent manner (eg dd/mm/yy in the UK and mm/dd/yy in the US).

The std::locale class fully supports this; read up on how to use it. You can also imbue a std::iostream with a locale so that it formats numbers and dates correctly.
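As a rough illustration of imbuing a stream with a locale, here is a minimal sketch that formats a date with std::put_time (C++11). The helper names locale_or_classic and format_date are hypothetical, and the locale name "en_GB.UTF-8" mentioned below is an assumption: locale names are platform-specific and the locale may not be installed, which is why the sketch falls back to the classic "C" locale.

```cpp
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: return the named locale, or the classic "C" locale
// if the platform does not provide a locale with that name.
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: format the date part of `when` using the preferred
// date representation (%x) of the given locale.
std::string format_date(const std::tm& when, const std::locale& loc) {
    std::ostringstream out;
    out.imbue(loc);                      // the stream now formats using loc's facets
    out << std::put_time(&when, "%x");   // %x = locale's date representation
    return out.str();
}
```

Calling format_date(when, locale_or_classic("en_GB.UTF-8")) would be expected to give a dd/mm/yy style date where that locale is installed, while the classic locale gives mm/dd/yy.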

Easy converting of strings between encodings

std::locale provides facets for converting the local 8-bit encoding to the wide one and back.
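A minimal sketch of what using such a facet directly might look like, assuming the std::codecvt<wchar_t, char, std::mbstate_t> facet of whichever locale is passed in. The function name widen is hypothetical, and which 8-bit encoding gets decoded depends entirely on that locale.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a narrow string in loc's 8-bit encoding
// to a wide string, using the codecvt facet stored in loc.
std::wstring widen(const std::string& narrow, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& cvt = std::use_facet<Facet>(loc);

    // Each wide character consumes at least one input byte,
    // so narrow.size() wide characters are always enough room.
    std::wstring wide(narrow.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;

    std::codecvt_base::result res = cvt.in(
        state,
        narrow.data(), narrow.data() + narrow.size(), from_next,
        &wide[0], &wide[0] + wide.size(), to_next);

    if (res != std::codecvt_base::ok) {
        throw std::runtime_error("narrow-to-wide conversion failed");
    }
    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));
    return wide;
}
```

The reverse direction would use the facet's out() member in the same way.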

so I can for example make it use UTF-16

ICU uses UTF-16 internally, and on Win32 wchar_t and std::wstring hold UTF-16 as well. On other operating systems, most implementations make wchar_t 32 bits wide, so std::wstring holds UTF-32.
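This also bears on the std::wofstream output shown in the question: a wide file stream converts wchar_t to bytes through the codecvt facet of its imbued locale, and the default facet narrows each character to a single byte. Below is a minimal sketch of one way to get UTF-16 bytes in the file instead, assuming the C++11 <codecvt> header is available (it is deprecated since C++17). The helper name write_utf16le is hypothetical, and with a 16-bit wchar_t this facet should only be relied on for characters in the Basic Multilingual Plane.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write `text` to `path` as little-endian UTF-16,
// preceded by a byte order mark. Returns false if the file cannot be written.
bool write_utf16le(const std::string& path, const std::wstring& text) {
    std::wofstream file;
    // Imbue before opening so the facet applies from the first byte.
    // The locale takes ownership of the facet allocated with new.
    file.imbue(std::locale(
        std::locale::classic(),
        new std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian>));
    // Binary mode stops Windows from expanding the 0x0A byte to 0x0D 0x0A,
    // which would corrupt the two-byte code units.
    file.open(path.c_str(), std::ios::out | std::ios::binary);
    if (!file) {
        return false;
    }
    // U+FEFF written through the facet becomes the bytes FF FE.
    file << wchar_t(0xFEFF) << text;
    file.close();
    return !file.fail();
}
```

Because the stream is opened in binary mode, any line endings have to be written explicitly (for example L"\r\n") if Windows-style newlines are wanted.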

Remarks: std::locale support is not perfect, but it already provides many tools that are useful for character manipulation.

See: http://www.cplusplus.com/reference/std/locale/


answered Sep 28 '22 18:09

Artyom
