Cython: when should I define a string as char*, str, or bytes?

When defining a variable type that will hold a string in Cython + Python 3, I can use (at least):

cdef char* mystring = "foo"
cdef str mystring = "foo"
cdef bytes mystring = "foo"

The documentation page on strings is unclear on this -- it mostly gives examples using char* and bytes, and frankly I'm having a lot of difficulty understanding it.

In my case the strings will come from a Python 3 program and are assumed to be Unicode. They will be used as dict keys and function arguments, but I will do no further manipulation on them. Needless to say, I am trying to maximize speed.

This question suggests that under Python 2.7 and without Unicode, typing as str makes string manipulation code run SLOWER than with no typing at all. (But that's not necessarily relevant here, since I won't be doing much string manipulation.)

What are the advantages and disadvantages of each of these options?

asked Mar 31 '18 06:03

right2clicky

1 Answer

If no further processing is done on the strings, it is best and fastest not to type them at all, in which case they are treated as general-purpose Python objects (PyObject *).

The str type is a special case which means bytes on Python 2 and unicode on Python 3.

In the words of the Cython documentation: "The str type is special in that it is the byte string in Python 2 and the Unicode string in Python 3."

So code that types a string as str and handles it as Unicode will break on Python 2, where str means bytes.
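To make the distinction concrete on Python 3, here is a minimal sketch in plain Python (it should behave the same when compiled as untyped Cython code). The function name describe and the sample value are made up for illustration.

```python
def describe(value):
    """Report which of the two Python 3 string types a value is."""
    if isinstance(value, str):
        return "unicode text (str)"
    if isinstance(value, bytes):
        return "byte string (bytes)"
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


text = "caf\u00e9"          # str: a sequence of Unicode code points
raw = text.encode("utf-8")  # bytes: the UTF-8 encoded form of the same text

print(describe(text))       # unicode text (str)
print(describe(raw))        # byte string (bytes)
print(len(text), len(raw))  # 4 5 (the accented letter takes two bytes in UTF-8)
```

The two objects are different types with different lengths, which is why a variable typed as one cannot silently hold the other.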

Strings only need to be typed if they are to be converted to a C char* or a C++ std::string. In that case, you would type them as str for Python 2/3 compatibility and use helper functions to convert to and from bytes and unicode, so that the value can be turned into either char* or std::string.
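Such conversion helpers might look like the following minimal sketch. It is plain Python 3 and should also compile unchanged as Cython source. The names to_bytes and to_unicode and the UTF-8 default are assumptions for illustration, not part of Cython's API.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding Unicode text if necessary."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def to_unicode(value, encoding="utf-8"):
    """Return value as Unicode text, decoding bytes if necessary."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_bytes("foo"))     # b'foo'
print(to_unicode(b"foo"))  # foo
```

In a .pyx file, the bytes result of to_bytes is the kind of value that would then be handed to a char* or std::string variable, and to_unicode would be applied to data coming back from C.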

Typing of strings is for interoperability with C/C++, not for speed as such. For example, Cython will auto-convert a bytes string to a char* without copying when it sees something like cdef char* c_string = b_string, where b_string is typed as bytes. The pointer refers to the internal buffer of the bytes object, so it is only valid while b_string is alive.

On the other hand, if strings are typed but that type is never actually used, Cython will perform a type check or conversion from object to bytes/unicode that it does not need, which adds overhead.

This can be seen in the generated C code as calls to __Pyx_PyObject_AsString, __Pyx_PyUnicode_FromString, et al.

This is also true in general: the rule of thumb is that if a specific type is not needed for further processing or conversion, it is best not to type the variable at all. Everything in Python is an object, so typing forces a conversion from the general-purpose PyObject* to something more specific.

answered Nov 05 '22 05:11

danny

Tags: python, string, python-3.x, cython
