I'm trying to demonstrate to our group the virtues of Cython for enhancing Python performance. I have shown several benchmarks, all of which attain a speed-up by just:
However, much of our code does string manipulation, and I have not been able to come up with good examples of optimizing code by typing Python strings.
An example I've tried is:
cdef str a
cdef int i,j
for j in range(1000000):
    a = str([chr(i) for i in range(127)])
but typing 'a' as a string actually makes the code run slower. I've read the documentation on 'Unicode and passing strings', but am confused about how it applies in the case I've shown. We don't use Unicode; everything is pure ASCII. We're using Python 2.7.2.
Any advice is appreciated.
I suggest you do your operations on cpython.array.arrays. The best documentation is the C API and the Cython source (see here).
from cpython cimport array

def cfuncA():
    cdef str a
    cdef int i,j
    for j in range(1000):
        a = ''.join([chr(i) for i in range(127)])

def cfuncB():
    cdef:
        str a
        array.array[char] arr, template = array.array('c')
        int i, j
    for j in range(1000):
        arr = array.clone(template, 127, False)
        for i in range(127):
            arr[i] = i
        a = arr.tostring()
Note that the operations required depend heavily on what you do with your strings.
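As a sanity check that the two functions above build the same data, here is a minimal pure-Python sketch of both approaches. It is not the Cython code itself: it assumes Python 3, where the 'c' typecode no longer exists (so 'B' is used instead) and tostring() is spelled tobytes(). The function names build_with_join and build_with_array are made up for illustration.

```python
from array import array


def build_with_join():
    # Analogue of cfuncA: one chr() call per code point, then a join.
    return ''.join([chr(i) for i in range(127)])


def build_with_array():
    # Analogue of cfuncB: preallocate 127 zeroed bytes, then fill by index.
    arr = array('B', bytes(127))
    for i in range(127):
        arr[i] = i
    return arr.tobytes()


joined = build_with_join()
filled = build_with_array()

# Both approaches should yield the same 127 ASCII bytes.
assert joined.encode('ascii') == filled
```

In pure Python the array version is not expected to be faster; the gain in the Cython version comes from the typed indexing being compiled to plain C array writes.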
>>> python2 -m timeit -s "import pyximport; pyximport.install(); import cyytn" "cyytn.cfuncA()"
100 loops, best of 3: 14.3 msec per loop
>>> python2 -m timeit -s "import pyximport; pyximport.install(); import cyytn" "cyytn.cfuncB()"
1000 loops, best of 3: 512 usec per loop
So that's roughly a 28x speed-up in this case.
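The ratio behind that figure can be checked directly from the two timings above. The sketch below also includes a small helper, measure_speedup, for repeating the comparison on other function pairs; the helper is hypothetical and not part of the original benchmark, and it only relies on the standard timeit module.

```python
import timeit

# Figures copied from the timeit output above.
slow_seconds = 14.3e-3   # cfuncA: 14.3 msec per loop
fast_seconds = 512e-6    # cfuncB: 512 usec per loop

ratio = slow_seconds / fast_seconds
print("speed-up: %.1fx" % ratio)


def measure_speedup(slow, fast, number=100, repeat=3):
    """Hypothetical helper: best-of-`repeat` time of `slow` over that of `fast`."""
    best_slow = min(timeit.repeat(slow, number=number, repeat=repeat))
    best_fast = min(timeit.repeat(fast, number=number, repeat=repeat))
    return best_slow / best_fast
```

With the compiled module importable, a call such as measure_speedup(cyytn.cfuncA, cyytn.cfuncB) would give a comparable number, though the result will vary by machine.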
Also, FWIW, you can take off another fair few µs by replacing arr.tostring() with arr.data.as_chars[:len(arr)] and typing a as bytes.