
How to extract a Unicode string with Boost.Python

The code seems to crash when I call extract<const char*>("a unicode string").

Does anyone know how to solve this?

yelo asked Jul 08 '11 09:07



2 Answers

This compiles and works for me, with your example string and using Python 2.x:

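#include <iostream>
#include <boost/python.hpp>

void process_unicode(boost::python::object u) {
  using namespace boost::python;
  // Keep the encoded object alive: the pointer refers to its internal buffer.
  object encoded = str(u).encode("utf-8");
  const char* value = extract<const char*>(encoded);
  std::cout << "The string value is '" << value << "'" << std::endl;
}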

If you want PyUnicode objects (Python 2.x) converted automatically, you can write a dedicated from-python converter that targets const wchar_t* or a type from ICU, which seems to be the common recommendation for handling Unicode in C++.

For full support of Unicode characters outside the ASCII range (for example, accented characters such as á, ç or ï), you will need to write the from-python converter. This has to be done separately for Python 2.x and 3.x if you wish to support both. In Python 3.x, the separate unicode type is gone and str behaves the way unicode did in Python 2.x. It is nothing that a couple of #if PY_VERSION_HEX >= 0x03000000 checks cannot handle.
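As a rough illustration of the core step such a converter would perform once it has the UTF-8 bytes, here is a minimal sketch in standard C++ only. The helper name decode_utf8 is hypothetical and is not part of Boost.Python or ICU. It produces std::u32string rather than wchar_t because the width of wchar_t differs between platforms. It rejects truncated or malformed sequences but does not check for overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a Boost.Python or ICU API: decode UTF-8 bytes
// into Unicode code points. Overlong encodings and surrogate code points
// are NOT rejected here; a production decoder (for example ICU) should.
std::u32string decode_utf8(const std::string& bytes) {
  std::u32string out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    char32_t cp = 0;
    std::size_t extra = 0;  // number of continuation bytes that must follow

    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F;
      extra = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F;
      extra = 2;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07;
      extra = 3;
    } else {
      throw std::invalid_argument("invalid UTF-8 lead byte");
    }

    // i < bytes.size() holds here, so the subtraction cannot underflow.
    if (extra > bytes.size() - i - 1) {
      throw std::invalid_argument("truncated UTF-8 sequence");
    }

    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
      if ((c & 0xC0) != 0x80) {
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      }
      cp = (cp << 6) | (c & 0x3F);
    }

    out.push_back(cp);
    i += extra + 1;
  }
  return out;
}
```

In a real converter the input bytes would come from the encoded Python object, and the result would be handed to whatever wide or ICU string type the C++ side expects.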

[edit]

The above comment was wrong. Since Python 3.x treats Unicode strings as normal strings, boost::python wraps them in boost::python::str objects. I have not verified how those are handled with respect to Unicode translation in this case.

André Anjos answered Oct 07 '22 15:10



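Have you tried either of the following?

extract<std::string>("a unicode string").c_str()

or

extract<wchar_t*>(...)

With the first form, the pointer returned by c_str() is valid only until the end of that statement, so store the std::string in a variable if you need it for longer.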
edvaldig answered Oct 07 '22 15:10
