Few weeks ago I asked a question on increasing the speed of a function written in Python. At that time, TryPyPy brought to my attention the possibility of using Cython for doing so. He also kindly gave an example of how I could Cythonize that code snippet. I want to do the same with the code below to see how fast I can make it by declaring variable types. I have a couple of questions related to that. I have seen the Tutorial on the cython.org, but I still have some questions. They are closely related:
double
in Cython for float
in Python. What do I do for lists? In general, where do I find the corresponding C type for a given Python type. Any example of how I could Cythonize the code below would be really helpful. I have inserted comments in the code that give information about the variable type.
class Some_class(object):
** Other attributes and functions **
def update_awareness_status(self, this_var, timePd):
'''Inputs: this_var (type: float)
timePd (type: int)
Output: None'''
max_number = len(self.possibilities)
# self.possibilities is a list of tuples.
# Each tuple is a pair of person objects.
k = int(math.ceil(0.3 * max_number))
actual_number = random.choice(range(k))
chosen_possibilities = random.sample(self.possibilities,
actual_number)
if len(chosen_possibilities) > 0:
# chosen_possibilities is a list of tuples, each tuple is a pair
# of person objects. I have included the code for the Person class
# below.
for p1,p2 in chosen_possibilities:
# awareness_status is a tuple (float, int)
if p1.awareness_status[1] < p2.awareness_status[1]:
if p1.value > p2.awareness_status[0]:
p1.awareness_status = (this_var, timePd)
else:
p1.awareness_status = p2.awareness_status
elif p1.awareness_status[1] > p2.awareness_status[1]:
if p2.value > p1.awareness_status[0]:
p2.awareness_status = (price, timePd)
else:
p2.awareness_status = p1.awareness_status
else:
pass
class Person(object):
def __init__(self,id, value):
self.value = value
self.id = id
self.max_val = 50000
## Initial awareness status.
self.awarenessStatus = (self.max_val, -1)
Compile PythonMachine code that you create when you compile code is always going to run faster than interpreted bytecode. You'll find several Python compilers available including Numpa, Nuitka, pypi and Cython.
Cython is both a module and a language that Pythoneers use to speed up their code.
You can still write regular code in Python, but to speed things up at run time Cython allows you to replace some pieces of the Python code with C. So, you end up mixing both languages together in a single file. Note that you can imagine that everything in Python is valid in Cython, but with some limitations.
As a general note, you can see exactly what C code Cython generates for every source line by running the cython
command with the -a
"annotate" option. See the Cython documentation for examples. This is extremely helpful when trying to find bottlenecks in a function's body.
Also, there's the concept of "early binding for speed" when Cython-ing your code. A Python object (like instances of your Person
class below) use general Python code for attribute access, which is slow when in an inner loop. I suspect that if you change the Person
class to a cdef class
, then you will see some speedup. Also, you need to type the p1
and p2
objects in the inner loop.
Since your code has lots of Python calls (random.sample
for example), you likely won't get huge speedups unless you find a way to put those lines into C, which takes a good amount of effort.
You can type things as a tuple
or a list
, but it doesn't often mean much of a speedup. Better to use C arrays when possible; something you'll have to look up.
I get a factor of 1.6 speedup with the trivial modifications below. Note that I had to change some things here and there to get it to compile.
ctypedef int ITYPE_t
cdef class CyPerson:
# These attributes are placed in the extension type's C-struct, so C-level
# access is _much_ faster.
cdef ITYPE_t value, id, max_val
cdef tuple awareness_status
def __init__(self, ITYPE_t id, ITYPE_t value):
# The __init__ function is much the same as before.
self.value = value
self.id = id
self.max_val = 50000
## Initial awareness status.
self.awareness_status = (self.max_val, -1)
NPERSONS = 10000
import math
import random
class Some_class(object):
def __init__(self):
ri = lambda: random.randint(0, 10)
self.possibilities = [(CyPerson(ri(), ri()), CyPerson(ri(), ri())) for i in range(NPERSONS)]
def update_awareness_status(self, this_var, timePd):
'''Inputs: this_var (type: float)
timePd (type: int)
Output: None'''
cdef CyPerson p1, p2
price = 10
max_number = len(self.possibilities)
# self.possibilities is a list of tuples.
# Each tuple is a pair of person objects.
k = int(math.ceil(0.3 * max_number))
actual_number = random.choice(range(k))
chosen_possibilities = random.sample(self.possibilities,
actual_number)
if len(chosen_possibilities) > 0:
# chosen_possibilities is a list of tuples, each tuple is a pair
# of person objects. I have included the code for the Person class
# below.
for persons in chosen_possibilities:
p1, p2 = persons
# awareness_status is a tuple (float, int)
if p1.awareness_status[1] < p2.awareness_status[1]:
if p1.value > p2.awareness_status[0]:
p1.awareness_status = (this_var, timePd)
else:
p1.awareness_status = p2.awareness_status
elif p1.awareness_status[1] > p2.awareness_status[1]:
if p2.value > p1.awareness_status[0]:
p2.awareness_status = (price, timePd)
else:
p2.awareness_status = p1.awareness_status
C does not directly know the concept of lists.
The basic data types are int
(char
, short
, long
), float
/double
(all of which have pretty straightforward mappings to python) and pointers.
If the concept of pointers is new to you, have a look at: Wikipedia:Pointers
Pointers can then be used as tuple/array replacements in some cases. Pointers of chars are the base for all strings.
Say you have an array of integers, you would then store it in as a continuous chunk of memory with a start address, you define the type (int
) and that it’s a pointer (*
):
cdef int * array;
Now you can access each element of the array like this:
array[0] = 1
However, memory has to be allocated (e.g. using malloc
) and advanced indexing will not work (e.g. array[-1]
will be random data in memory, this also hold for indexes exceeding the width of the reserved space).
More complex types don't directly map to C, but often there is a C way to do something that might not require the python types (e.g. a for loop does not need a range array/iterator).
As you noticed yourself, writing good cython code requires more detailed knowledge of C, so heading forward to a tutorial is probably the best next step.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With