
Encode/Decode std::string to UTF-16

I have to handle a file format (both reading from and writing to it) in which strings are encoded in UTF-16 (two bytes per code unit). Since characters outside the ASCII range are rare in the application domain, all of the strings in my C++ model classes are stored in std::string instances (UTF-8 encoded).

I'm looking for a library (I searched the STL and Boost with no luck) or a set of C/C++ functions to handle this std::string <-> UTF-16 conversion when loading from or saving to the file format (actually modeled as a byte stream), including the generation/recognition of surrogate pairs and all that Unicode stuff (which I'm admittedly no expert in)...

Any suggestions? Thanks!

EDIT: I forgot to mention that the solution should be cross-platform (Windows / Mac) and cannot use C++11.

asked Jun 18 '12 by Peter


People also ask

How do I convert a String to UTF-8?

In Java, you convert a String into UTF-8 with the getBytes(charsetName) method, which encodes the String into a sequence of bytes and returns a byte array; charsetName names the charset used for the encoding (e.g. "UTF-8").

Is std::string UTF-8?

std::string itself is encoding-agnostic: it stores 8-bit code units (bytes), and holding UTF-8 in it is only a convention, albeit the usual one on macOS; std::wstring stores wchar_t code units, whose size is platform-dependent (32-bit on macOS, making it UTF-32 there). For both, size() tracks the number of code units, not the number of code points or grapheme clusters.
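For instance (an illustrative snippet, not from the quoted answer):

std::string s = "\xC3\xA9"; // "é" encoded as UTF-8: two bytes, one code point
// s.size() == 2 — it counts code units (bytes), not characters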

What are UTF-8, UTF-16, and UTF-32?

Efficiency. UTF-8 requires 8, 16, 24 or 32 bits (one to four bytes) to encode a Unicode character, UTF-16 requires either 16 or 32 bits to encode a character, and UTF-32 always requires 32 bits to encode a character.

How is UTF-16 encoded?

UTF-16 uses a single 16-bit code unit to encode the most common 63K characters, and a pair of 16-bit code units, called surrogates, to encode the 1M less commonly used characters in Unicode.
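As a worked example (mine, not part of the quoted answer), encoding U+1F600 as a surrogate pair:

// Code points above U+FFFF are encoded as two 16-bit units.
unsigned cp   = 0x1F600 - 0x10000;     // 20-bit offset: 0x0F600
unsigned high = 0xD800 + (cp >> 10);   // high surrogate: 0xD83D
unsigned low  = 0xDC00 + (cp & 0x3FF); // low surrogate:  0xDE00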


2 Answers

C++11 has this functionality:

#include <locale> // std::wstring_convert, std::codecvt
#include <string>

std::string s = u8"Hello, World!";

// Note: this codecvt facet has a protected destructor, so in practice
// you need the usable_facet adapter shown below.
std::wstring_convert<std::codecvt<char16_t, char, std::mbstate_t>, char16_t> convert;

std::u16string u16 = convert.from_bytes(s);
std::string u8 = convert.to_bytes(u16);

However, to my knowledge, the only implementation that provides this so far is libc++. C++11 also has std::codecvt_utf8_utf16<char16_t>, which some other implementations do provide. Specifically, codecvt_utf8_utf16 works in VS 2010 and above, and since Windows uses wchar_t to represent UTF-16, you can use it to convert between UTF-8 and Windows' native encoding.
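For illustration, a minimal sketch of that codecvt_utf8_utf16 route (the helper names are mine; assumes a compiler that ships <codecvt>, e.g. VS 2010 and above):

#include <codecvt>  // std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>

// UTF-8 <-> UTF-16 stored in wchar_t; on Windows wchar_t is a 16-bit
// UTF-16 code unit, so this matches the platform's native encoding.
std::wstring utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t> > conv;
    return conv.from_bytes(utf8);
}

std::string utf16_to_utf8(const std::wstring& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t> > conv;
    return conv.to_bytes(utf16);
}

Unlike the bare codecvt specialization above, codecvt_utf8_utf16 has a public destructor, so it can be used with wstring_convert directly.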


The specialization codecvt<char16_t, char, mbstate_t> converts between the UTF-16 and UTF-8 encoding schemes, and the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes.

— [locale.codecvt] 22.4.1.4/3


Oh, and std::codecvt specializations have protected destructors, and wstring_convert requires access to the destructor, so you really need an adapter:

template <class Facet>
class usable_facet : public Facet {
public:
    using Facet::Facet; // inherit constructors
    ~usable_facet() {}

    // workaround for compilers without inheriting constructors:
    // template <class ...Args> usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
};

template<typename internT, typename externT, typename stateT> 
using codecvt = usable_facet<std::codecvt<internT, externT, stateT>>;

std::wstring_convert<codecvt<char16_t,char,std::mbstate_t>> convert;
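Usage is then the same as in the snippet at the top of the answer; a quick round-trip for illustration:

std::u16string u16 = convert.from_bytes(u8"caf\u00e9"); // UTF-8 in, UTF-16 out
std::string utf8   = convert.to_bytes(u16);             // and back to UTF-8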
answered Sep 26 '22 by bames53


Did you look at Boost.Locale? This page, in particular, describes how to do UTF to UTF conversions and how to integrate it with IOStreams.
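For example, a minimal sketch using boost::locale::conv::utf_to_utf (the helper names are mine); this also works without C++11, matching the constraint in the question. Note that wchar_t holds UTF-16 on Windows but UTF-32 on Mac, so for a fixed two-bytes-per-code-unit file format you would still serialize the 16-bit code units yourself:

#include <boost/locale/encoding_utf.hpp>
#include <string>

using boost::locale::conv::utf_to_utf;

std::wstring utf8_to_wide(const std::string& utf8) {
    return utf_to_utf<wchar_t>(utf8);  // UTF-8 -> wide string
}

std::string wide_to_utf8(const std::wstring& wide) {
    return utf_to_utf<char>(wide);     // wide string -> UTF-8
}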

answered Sep 22 '22 by thehouse