I have some python code that has many classes. I used <code>cProfile</code> to find that the total time to run the program is 68 seconds. I found that the following function in a class called <code>Buyers</code> takes about 60 seconds of those 68 seconds. I have to run the program about 100 times, so any increase in speed will help. Can you suggest ways to increase the speed by modifying the code? If you need more information that will help, please let me know. <pre class="prettyprint"><code>def qtyDemanded(self, timePd, priceVector): '''Returns quantity demanded in period timePd. In addition, also updates the list of customers and non-customers. Inputs: timePd and priceVector Output: count of people for whom priceVector[-1] < utility ''' ## Initialize count of customers to zero ## Set self.customers and self.nonCustomers to empty lists price = priceVector[-1] count = 0 self.customers = [] self.nonCustomers = [] for person in self.people: if person.utility >= price: person.customer = 1 self.customers.append(person) else: person.customer = 0 self.nonCustomers.append(person) return len(self.customers) </code></pre> <code>self.people</code> is a list of <code>person</code> objects. Each <code>person</code> has <code>customer</code> and <code>utility</code> as its attributes. EDIT - responsed added ------------------------------------- Thanks so much for the suggestions. Here is the response to some questions and suggestions people have kindly made. I have not tried them all, but will try others and write back later. (1) @amber - the function is accessed 80,000 times. (2) @gnibbler and others - self.people is a list of Person objects in memory. Not connected to a database. (3) @Hugh Bothwell cumtime taken by the original function - 60.8 s (accessed 80000 times) cumtime taken by the new function with local function aliases as suggested - 56.4 s (accessed 80000 times) (4) @rotoglup and @Martin Thomas I have not tried your solutions yet. I need to check the rest of the code to see the places where I use self.customers before I can make the change of not appending the customers to self.customers list. But I will try this and write back. (5) @TryPyPy - thanks for your kind offer to check the code. Let me first read a little on the suggestions you have made to see if those will be feasible to use. EDIT 2 Some suggested that since I am flagging the customers and noncustomers in the <code>self.people</code>, I should try without creating separate lists of <code>self.customers</code> and <code>self.noncustomers</code> using append. Instead, I should loop over the <code>self.people</code> to find the number of customers. I tried the following code and timed both functions below <code>f_w_append</code> and <code>f_wo_append</code>. I did find that the latter takes less time, but it is still 96% of the time taken by the former. That is, it is a very small increase in the speed. @TryPyPy - The following piece of code is complete enough to check the bottleneck function, in case your offer is still there to check it with other compilers. Thanks again to everyone who replied. <pre class="prettyprint"><code>import numpy class person(object): def __init__(self, util): self.utility = util self.customer = 0 class population(object): def __init__(self, numpeople): self.people = [] self.cus = [] self.noncus = [] numpy.random.seed(1) utils = numpy.random.uniform(0, 300, numpeople) for u in utils: per = person(u) self.people.append(per) popn = population(300) def f_w_append(): '''Function with append''' P = 75 cus = [] noncus = [] for per in popn.people: if per.utility >= P: per.customer = 1 cus.append(per) else: per.customer = 0 noncus.append(per) return len(cus) def f_wo_append(): '''Function without append''' P = 75 for per in popn.people: if per.utility >= P: per.customer = 1 else: per.customer = 0 numcustomers = 0 for per in popn.people: if per.customer == 1: numcustomers += 1 return numcustomers </code></pre> EDIT 3: It seems numpy is the problem This is in response to what John Machin said below. Below you see two ways of defining <code>Population</code> class. I ran the program below twice, once with each way of creating <code>Population</code> class. One uses numpy and one does not use numpy. The one without numpy takes similar time as John found in his runs. One with numpy takes much longer. What is not clear to me is that the <code>popn</code> instance is created before time recording begins (at least that is what it appears from the code). Then, why is numpy version taking longer. And, I thought numpy was supposed to be more efficient. Anyhow, the problem seems to be with numpy and not so much with the append, even though it does slow down things a little. Can someone please confirm with the code below? Thanks. <pre class="prettyprint"><code>import random # instead of numpy import numpy import time timer_func = time.time # using Mac OS X 10.5.8 class Person(object): def __init__(self, util): self.utility = util self.customer = 0 class Population(object): def __init__(self, numpeople): random.seed(1) self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)] self.cus = [] self.noncus = [] # Numpy based # class Population(object): # def __init__(self, numpeople): # numpy.random.seed(1) # utils = numpy.random.uniform(0, 300, numpeople) # self.people = [Person(u) for u in utils] # self.cus = [] # self.noncus = [] def f_wo_append(popn): '''Function without append''' P = 75 for per in popn.people: if per.utility >= P: per.customer = 1 else: per.customer = 0 numcustomers = 0 for per in popn.people: if per.customer == 1: numcustomers += 1 return numcustomers t0 = timer_func() for i in xrange(20000): x = f_wo_append(popn) t1 = timer_func() print t1-t0 </code></pre> Edit 4: See the answers by John Machin and TryPyPy Since there have been so many edits and updates here, those who find themselves here for the first time may be a little confused. See the answers by John Machin and TryPyPy. Both of these can help in improving the speed of the code substantially. I am grateful to them and others who alerted me to slowness of <code>append</code>. Since, in this instance I am going to use John Machin's solution and not use numpy for generating utilities, I am accepting his response as an answer. However, I really appreciate the directions pointed out by TryPyPy also.

There are many things you can try after optimizing your Python code for speed. If this program doesn't need C extensions, you can run it under PyPy to benefit from its JIT compiler. You can try making a C extension for possibly huge speedups. Shed Skin will even allow you to convert your Python program to a standalone C++ binary. I'm willing to time your program under these different optimization scenarios if you can provide enough code for benchmarking, Edit: First of all, I have to agree with everyone else: are you sure you're measuring the time correctly? The example code runs 100 times in under 0.1 seconds here, so there is a good chance the either the time is wrong or you have a bottleneck (IO?) that isn't present in the code sample. That said, I made it 300000 people so times were consistent. Here's the adapted code, shared by CPython (2.5), PyPy and Shed Skin: <pre class="prettyprint"><code>from time import time import random import sys class person(object): def __init__(self, util): self.utility = util self.customer = 0 class population(object): def __init__(self, numpeople, util): self.people = [] self.cus = [] self.noncus = [] for u in util: per = person(u) self.people.append(per) def f_w_append(popn): '''Function with append''' P = 75 cus = [] noncus = [] # Help CPython a bit # cus_append, noncus_append = cus.append, noncus.append for per in popn.people: if per.utility >= P: per.customer = 1 cus.append(per) else: per.customer = 0 noncus.append(per) return len(cus) def f_wo_append(popn): '''Function without append''' P = 75 for per in popn.people: if per.utility >= P: per.customer = 1 else: per.customer = 0 numcustomers = 0 for per in popn.people: if per.customer == 1: numcustomers += 1 return numcustomers def main(): try: numpeople = int(sys.argv[1]) except: numpeople = 300000 print "Running for %s people, 100 times." % numpeople begin = time() random.seed(1) # Help CPython a bit uniform = random.uniform util = [uniform(0.0, 300.0) for _ in xrange(numpeople)] # util = [random.uniform(0.0, 300.0) for _ in xrange(numpeople)] popn1 = population(numpeople, util) start = time() for _ in xrange(100): r = f_wo_append(popn1) print r print "Without append: %s" % (time() - start) popn2 = population(numpeople, util) start = time() for _ in xrange(100): r = f_w_append(popn2) print r print "With append: %s" % (time() - start) print "\n\nTotal time: %s" % (time() - begin) if __name__ == "__main__": main() </code></pre> Running with PyPy is as simple as running with CPython, you just type 'pypy' instead of 'python'. For Shed Skin, you must convert to C++, compile and run: <pre class="prettyprint"><code>shedskin -e makefaster.py && make # Check that you're using the makefaster.so file and run test python -c "import makefaster; print makefaster.__file__; makefaster.main()" </code></pre> And here is the Cython-ized code: <pre class="prettyprint"><code>from time import time import random import sys cdef class person: cdef readonly int utility cdef public int customer def __init__(self, util): self.utility = util self.customer = 0 class population(object): def __init__(self, numpeople, util): self.people = [] self.cus = [] self.noncus = [] for u in util: per = person(u) self.people.append(per) cdef int f_w_append(popn): '''Function with append''' cdef int P = 75 cdef person per cus = [] noncus = [] # Help CPython a bit # cus_append, noncus_append = cus.append, noncus.append for per in popn.people: if per.utility >= P: per.customer = 1 cus.append(per) else: per.customer = 0 noncus.append(per) cdef int lcus = len(cus) return lcus cdef int f_wo_append(popn): '''Function without append''' cdef int P = 75 cdef person per for per in popn.people: if per.utility >= P: per.customer = 1 else: per.customer = 0 cdef int numcustomers = 0 for per in popn.people: if per.customer == 1: numcustomers += 1 return numcustomers def main(): cdef int i, r, numpeople cdef double _0, _300 _0 = 0.0 _300 = 300.0 try: numpeople = int(sys.argv[1]) except: numpeople = 300000 print "Running for %s people, 100 times." % numpeople begin = time() random.seed(1) # Help CPython a bit uniform = random.uniform util = [uniform(_0, _300) for i in xrange(numpeople)] # util = [random.uniform(0.0, 300.0) for _ in xrange(numpeople)] popn1 = population(numpeople, util) start = time() for i in xrange(100): r = f_wo_append(popn1) print r print "Without append: %s" % (time() - start) popn2 = population(numpeople, util) start = time() for i in xrange(100): r = f_w_append(popn2) print r print "With append: %s" % (time() - start) print "\n\nTotal time: %s" % (time() - begin) if __name__ == "__main__": main() </code></pre> For building it, it's nice to have a setup.py like this one: <pre class="prettyprint"><code>from distutils.core import setup from distutils.extension import Extension from Cython.Distutils import build_ext ext_modules = [Extension("cymakefaster", ["makefaster.pyx"])] setup( name = 'Python code to speed up', cmdclass = {'build_ext': build_ext}, ext_modules = ext_modules ) </code></pre> You build it with: python setupfaster.py build_ext --inplace Then test: python -c "import cymakefaster; print cymakefaster.file; cymakefaster.main()" Timings were run five times for each version, with Cython being the fastest and easiest of the code generators to use (Shed Skin aims to be simpler, but cryptic error messages and implicit static typing made it harder here). As for best value, PyPy gives impressive speedup in the counter version with no code changes. <pre class="prettyprint"><code>#Results (time in seconds for 30000 people, 100 calls for each function): Mean Min Times CPython 2.5.2 Without append: 35.037 34.518 35.124, 36.363, 34.518, 34.620, 34.559 With append: 29.251 29.126 29.339, 29.257, 29.259, 29.126, 29.272 Total time: 69.288 68.739 69.519, 70.614, 68.746, 68.739, 68.823 PyPy 1.4.1 Without append: 2.672 2.655 2.655, 2.670, 2.676, 2.690, 2.668 With append: 13.030 12.672 12.680, 12.725, 14.319, 12.755, 12.672 Total time: 16.551 16.194 16.196, 16.229, 17.840, 16.295, 16.194 Shed Skin 0.7 (gcc -O2) Without append: 1.601 1.599 1.599, 1.605, 1.600, 1.602, 1.599 With append: 3.811 3.786 3.839, 3.795, 3.798, 3.786, 3.839 Total time: 5.704 5.677 5.715, 5.705, 5.699, 5.677, 5.726 Cython 0.14 (gcc -O2) Without append: 1.692 1.673 1.673, 1.710, 1.678, 1.688, 1.711 With append: 3.087 3.067 3.079, 3.080, 3.119, 3.090, 3.067 Total time: 5.565 5.561 5.562, 5.561, 5.567, 5.562, 5.572 </code></pre> Edit: Aaaand more meaningful timings, for 80000 calls with 300 people each: <pre class="prettyprint"><code>Results (time in seconds for 300 people, 80000 calls for each function): Mean Min Times CPython 2.5.2 Without append: 27.790 25.827 25.827, 27.315, 27.985, 28.211, 29.612 With append: 26.449 24.721 24.721, 27.017, 27.653, 25.576, 27.277 Total time: 54.243 50.550 50.550, 54.334, 55.652, 53.789, 56.892 Cython 0.14 (gcc -O2) Without append: 1.819 1.760 1.760, 1.794, 1.843, 1.827, 1.871 With append: 2.089 2.063 2.100, 2.063, 2.098, 2.104, 2.078 Total time: 3.910 3.859 3.865, 3.859, 3.944, 3.934, 3.951 PyPy 1.4.1 Without append: 0.889 0.887 0.894, 0.888, 0.890, 0.888, 0.887 With append: 1.671 1.665 1.665, 1.666, 1.671, 1.673, 1.681 Total time: 2.561 2.555 2.560, 2.555, 2.561, 2.561, 2.569 Shed Skin 0.7 (g++ -O2) Without append: 0.310 0.301 0.301, 0.308, 0.317, 0.320, 0.303 With append: 1.712 1.690 1.733, 1.700, 1.735, 1.690, 1.702 Total time: 2.027 2.008 2.035, 2.008, 2.052, 2.011, 2.029 </code></pre> Shed Skin becomes fastest, PyPy surpasses Cython. All three speed things up a lot compared to CPython.

Please consider trimming down your <code>f_wo_append</code> function: <pre class="prettyprint"><code>def f_wo_append(): '''Function without append''' P = 75 numcustomers = 0 for person in popn.people: person.customer = iscust = person.utility >= P numcustomers += iscust return numcustomers </code></pre> Edit in response to OP's comment """This made it a lot worse! The trimmed version takes 4 times more time than the version I have posted above. """ There is no way that that could take "4 times more" (5 times?) ... here is my code, which demonstrates a significant reduction in the "without append" case, as I suggested, and also introduces a significant improvement in the "with append" case. <pre class="prettyprint"><code>import random # instead of numpy import time timer_func = time.clock # better on Windows, use time.time on *x platform class Person(object): def __init__(self, util): self.utility = util self.customer = 0 class Population(object): def __init__(self, numpeople): random.seed(1) self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)] self.cus = [] self.noncus = [] def f_w_append(popn): '''Function with append''' P = 75 cus = [] noncus = [] for per in popn.people: if per.utility >= P: per.customer = 1 cus.append(per) else: per.customer = 0 noncus.append(per) popn.cus = cus # omitted from OP's code popn.noncus = noncus # omitted from OP's code return len(cus) def f_w_append2(popn): '''Function with append''' P = 75 popn.cus = [] popn.noncus = [] cusapp = popn.cus.append noncusapp = popn.noncus.append for per in popn.people: if per.utility >= P: per.customer = 1 cusapp(per) else: per.customer = 0 noncusapp(per) return len(popn.cus) def f_wo_append(popn): '''Function without append''' P = 75 for per in popn.people: if per.utility >= P: per.customer = 1 else: per.customer = 0 numcustomers = 0 for per in popn.people: if per.customer == 1: numcustomers += 1 return numcustomers def f_wo_append2(popn): '''Function without append''' P = 75 numcustomers = 0 for person in popn.people: person.customer = iscust = person.utility >= P numcustomers += iscust return numcustomers if __name__ == "__main__": import sys popsize, which, niter = map(int, sys.argv[1:4]) pop = Population(popsize) func = (f_w_append, f_w_append2, f_wo_append, f_wo_append2)[which] t0 = timer_func() for _unused in xrange(niter): nc = func(pop) t1 = timer_func() print "popsize=%d func=%s niter=%d nc=%d seconds=%.2f" % ( popsize, func.__name__, niter, nc, t1 - t0) </code></pre> and here are the results of running it (Python 2.7.1, Windows 7 Pro, "Intel Core i3 CPU 540 @ 3.07 GHz"): <pre class="prettyprint"><code>C:\junk>\python27\python ncust.py 300 0 80000 popsize=300 func=f_w_append niter=80000 nc=218 seconds=5.48 C:\junk>\python27\python ncust.py 300 1 80000 popsize=300 func=f_w_append2 niter=80000 nc=218 seconds=4.62 C:\junk>\python27\python ncust.py 300 2 80000 popsize=300 func=f_wo_append niter=80000 nc=218 seconds=5.55 C:\junk>\python27\python ncust.py 300 3 80000 popsize=300 func=f_wo_append2 niter=80000 nc=218 seconds=4.29 </code></pre> Edit 3 Why numpy takes longer: <pre class="prettyprint"><code>>>> import numpy >>> utils = numpy.random.uniform(0, 300, 10) >>> print repr(utils[0]) 42.777972538362874 >>> type(utils[0]) <type 'numpy.float64'> </code></pre> and here's why my f_wo_append2 function took 4 times longer: <pre class="prettyprint"><code>>>> x = utils[0] >>> type(x) <type 'numpy.float64'> >>> type(x >= 75) <type 'numpy.bool_'> # iscust refers to a numpy.bool_ >>> type(0 + (x >= 75)) <type 'numpy.int32'> # numcustomers ends up referring to a numpy.int32 >>> </code></pre> The empirical evidence is that these custom types aren't so fast when used as scalars ... perhaps because they need to reset the floating-point hardware each time they are used. OK for big arrays, not for scalars. Are you using any other numpy functionality? If not, just use the <code>random</code> module. If you have other uses for numpy, you may wish to coerce the <code>numpy.float64</code> to <code>float</code> during the population setup.

Increasing speed of python code

Tags:

performance

python

I have some python code that has many classes. I used cProfile to find that the total time to run the program is 68 seconds. I found that the following function in a class called Buyers takes about 60 seconds of those 68 seconds. I have to run the program about 100 times, so any increase in speed will help. Can you suggest ways to increase the speed by modifying the code? If you need more information that will help, please let me know.

def qtyDemanded(self, timePd, priceVector):
    '''Returns quantity demanded in period timePd. In addition,
    also updates the list of customers and non-customers.

    Inputs: timePd and priceVector
    Output: count of people for whom priceVector[-1] < utility
    '''

    ## Initialize count of customers to zero
    ## Set self.customers and self.nonCustomers to empty lists
    price = priceVector[-1]
    count = 0
    self.customers = []
    self.nonCustomers = []


    for person in self.people:
        if person.utility >= price:             
            person.customer = 1
            self.customers.append(person)
        else:
            person.customer = 0
            self.nonCustomers.append(person)

    return len(self.customers)

self.people is a list of person objects. Each person has customer and utility as its attributes.

EDIT - responsed added

-------------------------------------

Thanks so much for the suggestions. Here is the response to some questions and suggestions people have kindly made. I have not tried them all, but will try others and write back later.

(1) @amber - the function is accessed 80,000 times.

(2) @gnibbler and others - self.people is a list of Person objects in memory. Not connected to a database.

(3) @Hugh Bothwell

cumtime taken by the original function - 60.8 s (accessed 80000 times)

cumtime taken by the new function with local function aliases as suggested - 56.4 s (accessed 80000 times)

(4) @rotoglup and @Martin Thomas

I have not tried your solutions yet. I need to check the rest of the code to see the places where I use self.customers before I can make the change of not appending the customers to self.customers list. But I will try this and write back.

(5) @TryPyPy - thanks for your kind offer to check the code.

Let me first read a little on the suggestions you have made to see if those will be feasible to use.

EDIT 2 Some suggested that since I am flagging the customers and noncustomers in the self.people, I should try without creating separate lists of self.customers and self.noncustomers using append. Instead, I should loop over the self.people to find the number of customers. I tried the following code and timed both functions below f_w_append and f_wo_append. I did find that the latter takes less time, but it is still 96% of the time taken by the former. That is, it is a very small increase in the speed.

@TryPyPy - The following piece of code is complete enough to check the bottleneck function, in case your offer is still there to check it with other compilers.

Thanks again to everyone who replied.

import numpy

class person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0

class population(object):
    def __init__(self, numpeople):
        self.people = []
        self.cus = []
        self.noncus = []
        numpy.random.seed(1)
        utils = numpy.random.uniform(0, 300, numpeople)
        for u in utils:
            per = person(u)
            self.people.append(per)

popn = population(300)

def f_w_append():
    '''Function with append'''
    P = 75
    cus = []
    noncus = []
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
            cus.append(per)
        else:
            per.customer = 0
            noncus.append(per)
    return len(cus)

def f_wo_append():
    '''Function without append'''
    P = 75
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0

    numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1                
    return numcustomers

EDIT 3: It seems numpy is the problem

This is in response to what John Machin said below. Below you see two ways of defining Population class. I ran the program below twice, once with each way of creating Population class. One uses numpy and one does not use numpy. The one without numpy takes similar time as John found in his runs. One with numpy takes much longer. What is not clear to me is that the popn instance is created before time recording begins (at least that is what it appears from the code). Then, why is numpy version taking longer. And, I thought numpy was supposed to be more efficient. Anyhow, the problem seems to be with numpy and not so much with the append, even though it does slow down things a little. Can someone please confirm with the code below? Thanks.

import random # instead of numpy
import numpy
import time
timer_func = time.time # using Mac OS X 10.5.8

class Person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0

class Population(object):
    def __init__(self, numpeople):
        random.seed(1)
        self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)]
        self.cus = []
        self.noncus = []   

# Numpy based    
# class Population(object):
#     def __init__(self, numpeople):
#         numpy.random.seed(1)
#         utils = numpy.random.uniform(0, 300, numpeople)
#         self.people = [Person(u) for u in utils]
#         self.cus = []
#         self.noncus = []    


def f_wo_append(popn):
    '''Function without append'''
    P = 75
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0

    numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1                
    return numcustomers



t0 = timer_func()
for i in xrange(20000):
    x = f_wo_append(popn)
t1 = timer_func()
print t1-t0

Edit 4: See the answers by John Machin and TryPyPy

Since there have been so many edits and updates here, those who find themselves here for the first time may be a little confused. See the answers by John Machin and TryPyPy. Both of these can help in improving the speed of the code substantially. I am grateful to them and others who alerted me to slowness of append. Since, in this instance I am going to use John Machin's solution and not use numpy for generating utilities, I am accepting his response as an answer. However, I really appreciate the directions pointed out by TryPyPy also.

702

asked Jan 11 '11 02:01

Curious2learn

2 Answers

There are many things you can try after optimizing your Python code for speed. If this program doesn't need C extensions, you can run it under PyPy to benefit from its JIT compiler. You can try making a C extension for possibly huge speedups. Shed Skin will even allow you to convert your Python program to a standalone C++ binary.

I'm willing to time your program under these different optimization scenarios if you can provide enough code for benchmarking,

Edit: First of all, I have to agree with everyone else: are you sure you're measuring the time correctly? The example code runs 100 times in under 0.1 seconds here, so there is a good chance the either the time is wrong or you have a bottleneck (IO?) that isn't present in the code sample.

That said, I made it 300000 people so times were consistent. Here's the adapted code, shared by CPython (2.5), PyPy and Shed Skin:

from time import time
import random
import sys


class person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0


class population(object):
    def __init__(self, numpeople, util):
        self.people = []
        self.cus = []
        self.noncus = []
        for u in util:
            per = person(u)
            self.people.append(per)


def f_w_append(popn):
    '''Function with append'''
    P = 75
    cus = []
    noncus = []
    # Help CPython a bit
    # cus_append, noncus_append = cus.append, noncus.append
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
            cus.append(per)
        else:
            per.customer = 0
            noncus.append(per)
    return len(cus)


def f_wo_append(popn):
    '''Function without append'''
    P = 75
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0

    numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1
    return numcustomers


def main():
    try:
        numpeople = int(sys.argv[1])
    except:
        numpeople = 300000

    print "Running for %s people, 100 times." % numpeople

    begin = time()
    random.seed(1)
    # Help CPython a bit
    uniform = random.uniform
    util = [uniform(0.0, 300.0) for _ in xrange(numpeople)]
    # util = [random.uniform(0.0, 300.0) for _ in xrange(numpeople)]

    popn1 = population(numpeople, util)
    start = time()
    for _ in xrange(100):
        r = f_wo_append(popn1)
    print r
    print "Without append: %s" % (time() - start)


    popn2 = population(numpeople, util)
    start = time()
    for _ in xrange(100):
        r = f_w_append(popn2)
    print r
    print "With append: %s" % (time() - start)

    print "\n\nTotal time: %s" % (time() - begin)

if __name__ == "__main__":
    main()

Running with PyPy is as simple as running with CPython, you just type 'pypy' instead of 'python'. For Shed Skin, you must convert to C++, compile and run:

shedskin -e makefaster.py && make 

# Check that you're using the makefaster.so file and run test
python -c "import makefaster; print makefaster.__file__; makefaster.main()"

And here is the Cython-ized code:

from time import time
import random
import sys


cdef class person:
    cdef readonly int utility
    cdef public int customer

    def __init__(self, util):
        self.utility = util
        self.customer = 0


class population(object):
    def __init__(self, numpeople, util):
        self.people = []
        self.cus = []
        self.noncus = []
        for u in util:
            per = person(u)
            self.people.append(per)


cdef int f_w_append(popn):
    '''Function with append'''
    cdef int P = 75
    cdef person per
    cus = []
    noncus = []
    # Help CPython a bit
    # cus_append, noncus_append = cus.append, noncus.append

    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
            cus.append(per)
        else:
            per.customer = 0
            noncus.append(per)
    cdef int lcus = len(cus)
    return lcus


cdef int f_wo_append(popn):
    '''Function without append'''
    cdef int P = 75
    cdef person per
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0

    cdef int numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1
    return numcustomers


def main():

    cdef int i, r, numpeople
    cdef double _0, _300
    _0 = 0.0
    _300 = 300.0

    try:
        numpeople = int(sys.argv[1])
    except:
        numpeople = 300000

    print "Running for %s people, 100 times." % numpeople

    begin = time()
    random.seed(1)
    # Help CPython a bit
    uniform = random.uniform
    util = [uniform(_0, _300) for i in xrange(numpeople)]
    # util = [random.uniform(0.0, 300.0) for _ in xrange(numpeople)]

    popn1 = population(numpeople, util)
    start = time()
    for i in xrange(100):
        r = f_wo_append(popn1)
    print r
    print "Without append: %s" % (time() - start)


    popn2 = population(numpeople, util)
    start = time()
    for i in xrange(100):
        r = f_w_append(popn2)
    print r
    print "With append: %s" % (time() - start)

    print "\n\nTotal time: %s" % (time() - begin)

if __name__ == "__main__":
    main()

For building it, it's nice to have a setup.py like this one:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("cymakefaster", ["makefaster.pyx"])]

setup(
  name = 'Python code to speed up',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

You build it with: python setupfaster.py build_ext --inplace

Then test: python -c "import cymakefaster; print cymakefaster.file; cymakefaster.main()"

Timings were run five times for each version, with Cython being the fastest and easiest of the code generators to use (Shed Skin aims to be simpler, but cryptic error messages and implicit static typing made it harder here). As for best value, PyPy gives impressive speedup in the counter version with no code changes.

#Results (time in seconds for 30000 people, 100 calls for each function):
                  Mean      Min  Times    
CPython 2.5.2
Without append: 35.037   34.518  35.124, 36.363, 34.518, 34.620, 34.559
With append:    29.251   29.126  29.339, 29.257, 29.259, 29.126, 29.272
Total time:     69.288   68.739  69.519, 70.614, 68.746, 68.739, 68.823

PyPy 1.4.1
Without append:  2.672    2.655   2.655,  2.670,  2.676,  2.690,  2.668
With append:    13.030   12.672  12.680, 12.725, 14.319, 12.755, 12.672
Total time:     16.551   16.194  16.196, 16.229, 17.840, 16.295, 16.194

Shed Skin 0.7 (gcc -O2)
Without append:  1.601    1.599   1.599,  1.605,  1.600,  1.602,  1.599
With append:     3.811    3.786   3.839,  3.795,  3.798,  3.786,  3.839
Total time:      5.704    5.677   5.715,  5.705,  5.699,  5.677,  5.726

Cython 0.14 (gcc -O2)
Without append:  1.692    1.673   1.673,  1.710,  1.678,  1.688,  1.711
With append:     3.087    3.067   3.079,  3.080,  3.119,  3.090,  3.067
Total time:      5.565    5.561   5.562,  5.561,  5.567,  5.562,  5.572

Edit: Aaaand more meaningful timings, for 80000 calls with 300 people each:

Results (time in seconds for 300 people, 80000 calls for each function):
                  Mean      Min  Times
CPython 2.5.2
Without append: 27.790   25.827  25.827, 27.315, 27.985, 28.211, 29.612
With append:    26.449   24.721  24.721, 27.017, 27.653, 25.576, 27.277
Total time:     54.243   50.550  50.550, 54.334, 55.652, 53.789, 56.892


Cython 0.14 (gcc -O2)
Without append:  1.819    1.760   1.760,  1.794,  1.843,  1.827,  1.871
With append:     2.089    2.063   2.100,  2.063,  2.098,  2.104,  2.078
Total time:      3.910    3.859   3.865,  3.859,  3.944,  3.934,  3.951

PyPy 1.4.1
Without append:  0.889    0.887   0.894,  0.888,  0.890,  0.888,  0.887
With append:     1.671    1.665   1.665,  1.666,  1.671,  1.673,  1.681
Total time:      2.561    2.555   2.560,  2.555,  2.561,  2.561,  2.569

Shed Skin 0.7 (g++ -O2)
Without append:  0.310    0.301   0.301,  0.308,  0.317,  0.320,  0.303
With append:     1.712    1.690   1.733,  1.700,  1.735,  1.690,  1.702
Total time:      2.027    2.008   2.035,  2.008,  2.052,  2.011,  2.029

Shed Skin becomes fastest, PyPy surpasses Cython. All three speed things up a lot compared to CPython.

107

answered Nov 15 '22 14:11

TryPyPy

Please consider trimming down your f_wo_append function:

def f_wo_append():
    '''Function without append'''
    P = 75
    numcustomers = 0
    for person in popn.people:
        person.customer = iscust = person.utility >= P
        numcustomers += iscust
    return numcustomers

Edit in response to OP's comment """This made it a lot worse! The trimmed version takes 4 times more time than the version I have posted above. """

There is no way that that could take "4 times more" (5 times?) ... here is my code, which demonstrates a significant reduction in the "without append" case, as I suggested, and also introduces a significant improvement in the "with append" case.

import random # instead of numpy
import time
timer_func = time.clock # better on Windows, use time.time on *x platform

class Person(object):
    def __init__(self, util):
        self.utility = util
        self.customer = 0

class Population(object):
    def __init__(self, numpeople):
        random.seed(1)
        self.people = [Person(random.uniform(0, 300)) for i in xrange(numpeople)]
        self.cus = []
        self.noncus = []        

def f_w_append(popn):
    '''Function with append'''
    P = 75
    cus = []
    noncus = []
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
            cus.append(per)
        else:
            per.customer = 0
            noncus.append(per)
    popn.cus = cus # omitted from OP's code
    popn.noncus = noncus # omitted from OP's code
    return len(cus)

def f_w_append2(popn):
    '''Function with append'''
    P = 75
    popn.cus = []
    popn.noncus = []
    cusapp = popn.cus.append
    noncusapp = popn.noncus.append
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
            cusapp(per)
        else:
            per.customer = 0
            noncusapp(per)
    return len(popn.cus)    

def f_wo_append(popn):
    '''Function without append'''
    P = 75
    for per in popn.people:
        if  per.utility >= P:
            per.customer = 1
        else:
            per.customer = 0

    numcustomers = 0
    for per in popn.people:
        if per.customer == 1:
            numcustomers += 1                
    return numcustomers

def f_wo_append2(popn):
    '''Function without append'''
    P = 75
    numcustomers = 0
    for person in popn.people:
        person.customer = iscust = person.utility >= P
        numcustomers += iscust
    return numcustomers    

if __name__ == "__main__":
    import sys
    popsize, which, niter = map(int, sys.argv[1:4])
    pop = Population(popsize)
    func = (f_w_append, f_w_append2, f_wo_append, f_wo_append2)[which]
    t0 = timer_func()
    for _unused in xrange(niter):
        nc = func(pop)
    t1 = timer_func()
    print "popsize=%d func=%s niter=%d nc=%d seconds=%.2f" % (
        popsize, func.__name__, niter, nc, t1 - t0)

and here are the results of running it (Python 2.7.1, Windows 7 Pro, "Intel Core i3 CPU 540 @ 3.07 GHz"):

C:\junk>\python27\python ncust.py 300 0 80000
popsize=300 func=f_w_append niter=80000 nc=218 seconds=5.48

C:\junk>\python27\python ncust.py 300 1 80000
popsize=300 func=f_w_append2 niter=80000 nc=218 seconds=4.62

C:\junk>\python27\python ncust.py 300 2 80000
popsize=300 func=f_wo_append niter=80000 nc=218 seconds=5.55

C:\junk>\python27\python ncust.py 300 3 80000
popsize=300 func=f_wo_append2 niter=80000 nc=218 seconds=4.29

Edit 3 Why numpy takes longer:

>>> import numpy
>>> utils = numpy.random.uniform(0, 300, 10)
>>> print repr(utils[0])
42.777972538362874
>>> type(utils[0])
<type 'numpy.float64'>

and here's why my f_wo_append2 function took 4 times longer:

>>> x = utils[0]
>>> type(x)
<type 'numpy.float64'>
>>> type(x >= 75) 
<type 'numpy.bool_'> # iscust refers to a numpy.bool_
>>> type(0 + (x >= 75)) 
<type 'numpy.int32'> # numcustomers ends up referring to a numpy.int32
>>>

The empirical evidence is that these custom types aren't so fast when used as scalars ... perhaps because they need to reset the floating-point hardware each time they are used. OK for big arrays, not for scalars.

Are you using any other numpy functionality? If not, just use the random module. If you have other uses for numpy, you may wish to coerce the numpy.float64 to float during the population setup.

answered Nov 15 '22 14:11

John Machin

Related questions
                            
                                PyCharm can't find import in same folder
                            
                                Python Decorator as Callback in Dash Using Dash Object That is an Instance Variable - Fails
                            
                                Training a Keras model yields multiple optimizer errors
                            
                                ConversionError: Failed to convert value(s) to axis units
                            
                                TypedDict when keys have invalid names
                            
                                Encountering " WARN ProcfsMetricsGetter: Exception when trying to compute pagesize" error when running Spark
                            
                                Certifacte verify failed: certificate has expired (_ssl.c:1108)
                            
                                How do you get default headers in a urllib2 Request?
                            
                                Python thread exit code
                            
                                Can bin() be overloaded like oct() and hex() in Python 2.6?
                            
                                Python: prefer several small modules or one larger module? [closed]
                            
                                exit from ipython
                            
                                SFTP using ftplib
                            
                                How to run two functions simultaneously
                            
                                Cosine Similarity of Vectors of different lengths?
                            
                                How do I run a django TestCase manually / against other database?
                            
                                When to inline definitions of metaclass in Python?
                            
                                Image analysis in R
                            
                                Can I include sub-config files in my mercurial .hgrc?
                            
                                How do I use the HtmlUnit driver with Selenium through the Python bindings?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With