
Optimizing strings in Cython

I'm trying to demonstrate to our group the virtues of Cython for enhancing Python performance. I have shown several benchmarks, all of which attain a speed-up by just:

  1. Compiling the existing Python code.
  2. Using cdef to statically type variables, particularly in inner loops.

However, much of our code does string manipulation, and I have not been able to come up with good examples of optimizing code by typing Python strings.

An example I've tried is:

cdef str a
cdef int i,j
for j in range(1000000):
   a = str([chr(i) for i in range(127)])

but typing 'a' as a string actually makes the code run slower. I've read the documentation on 'Unicode and passing strings', but I am confused about how it applies in the case I've shown. We don't use Unicode--everything is pure ASCII. We're using Python 2.7.2.

Any advice is appreciated.

Paul Nelson asked Apr 14 '14 15:04



1 Answer

I suggest you do your operations on cpython.array.arrays. The best documentation is the C API and the Cython source (see here).

from cpython cimport array

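def cfuncA():
    # Baseline: build a list of 127 one-character strings and join them.
    cdef str a
    cdef int i,j
    for j in range(1000):
        a = ''.join([chr(i) for i in range(127)])

def cfuncB():
    cdef:
        str a
        # Empty char array, used only as the template for clone().
        array.array[char] arr, template = array.array('c')
        int i, j

    for j in range(1000):
        # Allocate 127 chars without zero-filling them (zero=False).
        arr = array.clone(template, 127, False)

        # Typed indexing writes straight into the underlying C buffer.
        for i in range(127):
            arr[i] = i

        # Copy the buffer out as a Python 2 str.
        a = arr.tostring()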

Note that the operations required depend heavily on what you do with your strings.
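For readers who want to see the two strategies side by side without compiling anything, here is a rough plain-Python sketch. It is an illustration only, not the Cython code above: it assumes Python 3 (where the array typecode 'c' no longer exists and tobytes() takes the place of tostring()), the function names build_with_join and build_with_array are made up for this example, and pure Python will not reproduce the Cython speed-up, since that comes from the typed C-level indexing.

```python
from array import array


def build_with_join(n=127):
    # Same strategy as cfuncA: one temporary str per character, then join.
    return ''.join([chr(i) for i in range(n)])


def build_with_array(n=127):
    # Same strategy as cfuncB: preallocate n slots, then fill them by index.
    # 'B' (unsigned byte) stands in for the Python 2 typecode 'c'.
    arr = array('B', [0]) * n
    for i in range(n):
        arr[i] = i
    # tobytes() is the Python 3 spelling of tostring().
    return arr.tobytes()


joined = build_with_join()
buffered = build_with_array()
```

Both functions produce the same 127 ASCII characters; the second returns them as bytes rather than str, which is also why the answer's final tip suggests typing a as bytes.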

>>> python2 -m timeit -s "import pyximport; pyximport.install(); import cyytn" "cyytn.cfuncA()"
100 loops, best of 3: 14.3 msec per loop

>>> python2 -m timeit -s "import pyximport; pyximport.install(); import cyytn" "cyytn.cfuncB()"
1000 loops, best of 3: 512 usec per loop

So that's roughly a 28x speed-up in this case.
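The ratio can be checked directly from the two timeit results quoted above. This is just the arithmetic on those two numbers; the variable names are made up for the example.

```python
# Per-loop times reported by timeit above, converted to seconds.
time_cfunc_a = 14.3e-3   # 14.3 msec per loop
time_cfunc_b = 512e-6    # 512 usec per loop

speedup = time_cfunc_a / time_cfunc_b
print(round(speedup, 1))
```

The result is a little under 28, and it applies only to this particular benchmark and machine.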


Also, FWIW, you can take off another fair few µs by replacing arr.tostring() with arr.data.as_chars[:len(arr)] and typing a as bytes.

Veedrac answered Sep 22 '22 03:09
