
Protocol buffers and UTF-8

The history of encoding schemes, multiple operating systems, and endianness has led to a mess when it comes to encoding all forms of string data (i.e., all alphabets). For this reason protocol buffers only deal with ASCII or UTF-8 in their string types, and I can't see any overloads that accept the C++ wstring. The question, then, is how one is expected to get a UTF-16 string into a protocol buffer.

Presumably I need to keep the data as a wstring in my application code and then perform a UTF-8 conversion before I stuff it into (or after I extract it from) the message. What is the simplest way to do this that is portable across Windows and Linux? (A single function call from a well-supported library would make my day.)

Data will originate from various web servers (Linux and Windows) and will eventually end up in SQL Server (and possibly other endpoints).

-- edit 1--

Mark Wilkins's suggestion seems to fit the bill. Perhaps someone who has experience with the library can post a code snippet (from wstring to UTF-8) so that I can gauge how easy it will be.

-- edit 2 --

sth's suggestion fits even better. I will investigate Boost Serialization further.

Hassan Syed asked Jan 19 '26 06:01


1 Answer

The Boost Serialization library contains a UTF-8 codecvt facet that you can use to convert wide-character (Unicode) strings to UTF-8 and back. The documentation even includes an example that does exactly that.
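To give a feel for what such a conversion involves, here is a minimal hand-rolled sketch of wstring to UTF-8. It is not the Boost facet and not part of protocol buffers; the names `append_utf8` and `to_utf8` are made up for illustration. It assumes wchar_t holds UTF-16 code units where it is 16 bits wide (typical on Windows) and whole code points where it is 32 bits wide (typical on Linux), and it replaces anything invalid with U+FFFD.

```cpp
#include <cassert>   // only for the assertion-style checks in the usage note
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point as 1 to 4 UTF-8 bytes.
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8.
// Assumption: 16-bit wchar_t carries UTF-16, 32-bit wchar_t carries UTF-32.
// Surrogate pairs are combined on either platform (lenient on Linux).
std::string to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        if (sizeof(wchar_t) == 2) cp &= 0xFFFF;  // guard against sign extension

        // A high surrogate followed by a low surrogate forms one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (sizeof(wchar_t) == 2) lo &= 0xFFFF;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // consume the low surrogate as well
            }
        }

        // Unpaired surrogates and out-of-range values become U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        append_utf8(out, cp);
    }
    return out;
}
```

With this in place, a check such as `assert(to_utf8(L"abc") == "abc");` should hold, and the returned std::string can be handed to a protocol buffer string field's setter. A library facet such as the Boost one is likely the better choice in production code, since it also covers the reverse direction.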

sth answered Jan 20 '26 21:01