
Clean Way to Convert Python 3 Unicode to std::string

Tags: c++, python-3.x

I wrap a lot of C++ using the Python 2 API (I can't use things like swig or boost.python for various technical reasons). When I have to pass a string (usually a path, always ASCII) into C/C++, I use something like this:

std::string file_name = PyString_AsString(py_file_name); 
if (PyErr_Occurred()) return NULL; 

Now I'm considering updating to Python 3, where PyString_* methods don't exist. I found one solution that says I should do something like this:

PyObject* bytes = PyUnicode_AsUTF8String(py_file_name);
std::string file_name = PyBytes_AsString(bytes); 
if (PyErr_Occurred()) return NULL; 
Py_DECREF(bytes); 

However this is twice as many lines and seems a bit ugly (not to mention that it could introduce a memory leak if I forget the last line).

The other option is to redefine the Python-facing functions to operate on bytes objects, and to call them like this:

def some_function(path_name):
    _some_function(path_name.encode('utf8'))

This isn't terrible, but it does require a python-side wrapper for each function.

Is there some cleaner way to deal with this?

Shep asked Jul 07 '13 19:07


3 Answers

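Looks like a solution exists as of Python 3.3: char* PyUnicode_AsUTF8(PyObject* unicode) (the return type became const char* in Python 3.7). For ASCII input this should behave the same as PyString_AsString() from Python 2: it returns a pointer to a buffer owned by the string object, so there is no temporary bytes object to Py_DECREF, and it returns NULL with an exception set on error.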

Shep answered Oct 20 '22 18:10


If you know (and of course, you could check with an assert or similar) that it's all ASCII, then you could simply create it like this:

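// Assumes py_file_name is a str object that contains only ASCII characters.
std::string py_string_to_std_string(PyObject* py_file_name)
{
    Py_ssize_t len = PyUnicode_GetLength(py_file_name);
    if (len < 0)
        return std::string();   // Error: a Python exception is set.
    std::string str;
    str.reserve(len);           // Reserve capacity; the loop below appends.
    for (Py_ssize_t i = 0; i < len; i++)
        str += static_cast<char>(PyUnicode_ReadChar(py_file_name, i));
    return str;
}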

Mats Petersson answered Oct 20 '22 19:10


An improvement on the accepted answer: instead of PyUnicode_AsUTF8(...), it is better to use PyUnicode_AsUTF8AndSize(...).

The string may contain a null character (code point 0) somewhere in the middle. If you build the std::string from the pointer returned by PyUnicode_AsUTF8(...) alone, the result is truncated at that character, because the std::string constructor stops at the first null byte.

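Py_ssize_t size = 0;
char const * pc = PyUnicode_AsUTF8AndSize(obj, &size);
std::string s;
if (pc) {
    // Copy exactly `size` bytes so embedded null characters are preserved.
    s.assign(pc, size);
} else {
    // Error: a Python exception is set; handle it (e.g. return NULL).
}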
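
To make the truncation concrete, here is a minimal sketch that does not call the Python API at all. It assumes a plain char buffer standing in for the pointer and size that PyUnicode_AsUTF8AndSize(...) would hand back; the helper names from_c_string and from_buffer are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string from a pointer only.
// The constructor scans for the first '\0', so anything after an
// embedded null byte is lost.
std::string from_c_string(const char* data) {
    assert(data != nullptr);
    return std::string(data);
}

// Hypothetical helper: builds a std::string from a pointer plus an
// explicit byte count. All `size` bytes are copied, including any
// embedded '\0' bytes.
std::string from_buffer(const char* data, std::ptrdiff_t size) {
    assert(data != nullptr && size >= 0);
    return std::string(data, static_cast<std::size_t>(size));
}
```

Given the 5-byte buffer "ab\0cd", from_c_string produces a 2-character string, while from_buffer called with a size of 5 keeps all 5 bytes. For ASCII paths the two are equivalent, so the size-aware form only matters when embedded nulls are possible.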

Arty answered Oct 20 '22 18:10