Protocol buffers and UTF-8

Question

The history of encoding schemes, multiple operating systems, and endianness has led to a mess when it comes to encoding all forms of string data (i.e., all alphabets). For this reason, protocol buffers only deals with ASCII or UTF-8 in its string types, and I can't see any overloads that accept the C++ wstring. The question, then, is: how is one expected to get a UTF-16 string into a protocol buffer?

Presumably I need to keep the data as a wstring in my application code and then perform a UTF-8 conversion before I stuff it into (or after I extract it from) the message. What is the simplest way to do this that is portable across Windows and Linux? (A single function call from a well-supported library would make my day.)

Data will originate from various web servers (Linux and Windows) and will eventually end up in SQL Server (and possibly other endpoints).

-- edit 1--

Mark Wilkins's suggestion seems to fit the bill. Perhaps someone who has experience with the library can post a code snippet (from wstring to UTF-8) so that I can gauge how easy it will be.

-- edit 2 --

sth's suggestion fits even better. I will investigate Boost Serialization further.

sth · Accepted Answer

The Boost Serialization library contains a UTF-8 codecvt facet that you can use to convert Unicode to UTF-8 and back. There is even an example in the documentation doing exactly that.
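
To give a feel for what the wstring to UTF-8 step involves, here is a minimal, dependency-free sketch. It is not the Boost facet mentioned above, just a hand-written conversion for illustration. The names `append_utf8` and `wide_to_utf8` are made up for this example, and it assumes the wstring holds UTF-16 where wchar_t is 16 bits (Windows) and UTF-32 where it is 32 bits (typical Linux). Replacing invalid input with U+FFFD is also a choice of this sketch, not something the question specifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: std::wstring -> UTF-8 encoded std::string.
// Surrogate pairs are combined where present (the normal case for
// non-BMP characters on Windows); lone surrogates and values above
// U+10FFFF are replaced with U+FFFD.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        // High surrogate followed by a low surrogate: decode the pair.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        // Anything still in the surrogate range, or out of range, is invalid.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting std::string can then be assigned to a protocol buffer string field in the usual way. In real code a well-tested library, such as the Boost facet above, is probably preferable to a hand-rolled converter like this one.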

Tags:

c++

unicode

utf-8

portability

protocol-buffers

Hassan Syed

1 Answer

sth
