I was implementing a weighting system called TF-IDF on a set of 42000 images, each consisting 784 pixels. This is basically a 42000 by 784 matrix. The first method I attempted made use of explicit loops and took more than 2 hours. <pre class="prettyprint lang-python prettyprint-override"><code>def tfidf(color,img_pix,img_total): if img_pix==0: return 0 else: return color * np.log(img_total/img_pix) ... result = np.array([]) for img_vec in data_matrix: double_vec = zip(img_vec,img_pix_vec) result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec]) try: result = np.vstack((result,result_row)) # first row will throw a ValueError since vstack accepts rows of same len except ValueError: result = result_row </code></pre> The second method I attempted used numpy matrices and took less than 5 minutes. Note that data_matrix, img_pix_mat are both 42000 by 784 matrices while img_total is a scalar. <pre class="prettyprint lang-python prettyprint-override"><code>result = data_matrix * np.log(np.divide(img_total,img_pix_mat)) </code></pre> I was hoping someone could explain the immense difference in speed. The authors of the following paper entitled "The NumPy array: a structure for eﬃcient numerical computation" (http://arxiv.org/pdf/1102.1523.pdf), state on the top of page 4 that they observe a 500 times speed increase due to vectorized computation. I'm presuming much of the speed increase I'm seeing is due to this. However, I would like to go a step further and ask why numpy vectorized computations are that much faster than standard python loops? Also, perhaps you guys might know of other reasons why the first method is slow. Do try/except structures have high overhead? Or perhaps forming a new np.array for each loop is takes a long time? Thanks.

The difference you're seeing isn't due to anything fancy like SSE vectorization. There are two primary reasons. The first is that NumPy is written in C, and the C implementation doesn't have to go through the tons of runtime method dispatch and exception checking and so on that a Python implementation goes through. The second reason is that even for Python code, your loop-based implementation is inefficient. You're using <code>vstack</code> in a loop, and every time you call <code>vstack</code>, it has to completely copy all arrays you've passed to it. That adds an extra factor of <code>len(data_matrix)</code> to your asymptotic complexity.

Explain the speed difference between numpy's vectorized function application VS python's for loop

Tags:

performance

python

loops

vectorization

numpy

I was implementing a weighting system called TF-IDF on a set of 42000 images, each consisting 784 pixels. This is basically a 42000 by 784 matrix.

The first method I attempted made use of explicit loops and took more than 2 hours.

def tfidf(color,img_pix,img_total):
    if img_pix==0:
        return 0
    else:
        return color * np.log(img_total/img_pix)

...

result = np.array([])
for img_vec in data_matrix:
    double_vec = zip(img_vec,img_pix_vec)
    result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec])
    try:
        result = np.vstack((result,result_row))
    # first row will throw a ValueError since vstack accepts rows of same len
    except ValueError:
        result = result_row

The second method I attempted used numpy matrices and took less than 5 minutes. Note that data_matrix, img_pix_mat are both 42000 by 784 matrices while img_total is a scalar.

result = data_matrix * np.log(np.divide(img_total,img_pix_mat))

I was hoping someone could explain the immense difference in speed.

The authors of the following paper entitled "The NumPy array: a structure for eﬃcient numerical computation" (http://arxiv.org/pdf/1102.1523.pdf), state on the top of page 4 that they observe a 500 times speed increase due to vectorized computation. I'm presuming much of the speed increase I'm seeing is due to this. However, I would like to go a step further and ask why numpy vectorized computations are that much faster than standard python loops?

Also, perhaps you guys might know of other reasons why the first method is slow. Do try/except structures have high overhead? Or perhaps forming a new np.array for each loop is takes a long time?

Thanks.

821

asked Jul 05 '13 07:07

Kao

2 Answers

Due to the internal workings of numpy, (as far as i know, numpy works with C internally, so everything you push down to numpy is actually much faster because it is in a different language)

edit: taking out the zip, and replacing it with a vstack should go faster too, (zip tends to go slow if the arguments are very large, and than vstack is faster), (but that is also something which puts it into numpy (thus into C), while zip is python)

and yes, if i understand correctly - not sure about that- , you are doing 42k times a try/except block, that should definately be bad for the speed,

test:

T=numpy.ndarray((5,10))
for t in T:
print t.shape

results in (10,)

This means that yes, if your matrices are 42kx784 matrices, you are trying 42k times a try-except block, i am assuming that should put an effect in the computation times, as well as each time doing a zip each time, but not certain if that would be the main cause,

(so every one of your 42k times you run your stuff, takes 0.17sec, i am quite certain that a try/except block doesnt take 0.17 seconds, but maybe the overhead it causes or so, does contribute to it?

try changing the following:

double_vec = zip(img_vec,img_pix_vec)
result_row = np.array([tfidf(x[0],x[1],img_total) for x in double_vec])

result_row=np.array([tfidf(img_vec[i],img_pix_vec[i],img_total) for i in xrange(len(img_vec))])

that at least gets rid of the zip statement, But not sure if the zip statement takes down your time by one min, or by nearly two hours (i know zip is slow, compared to numpy vstack, but no clue if that would give you two hours time gain)

120

answered Oct 18 '22 07:10

usethedeathstar

The difference you're seeing isn't due to anything fancy like SSE vectorization. There are two primary reasons. The first is that NumPy is written in C, and the C implementation doesn't have to go through the tons of runtime method dispatch and exception checking and so on that a Python implementation goes through.

The second reason is that even for Python code, your loop-based implementation is inefficient. You're using vstack in a loop, and every time you call vstack, it has to completely copy all arrays you've passed to it. That adds an extra factor of len(data_matrix) to your asymptotic complexity.

answered Oct 18 '22 08:10

user2357112 supports Monica

Related questions
                            
                                How to send a package to PyPi?
                            
                                How to reverse a color map image to scalar values?
                            
                                Platform independent tool to copy text to clipboard
                            
                                Emacs for Python programming: module/class outline/browser
                            
                                Save image created via PIL to django model
                            
                                Can I efficiently swap two class instances by swapping __dict__?
                            
                                python's glob only returning the first result
                            
                                SciPy curve_fit runtime error, stopping iteration
                            
                                Solving Systems of Equations with SymPy
                            
                                String immutability in CPython violated
                            
                                bottle framework with multiple files
                            
                                How to force errorbars to render last with Matplotlib
                            
                                Python’s `str.format()`, fill characters, and ANSI colors
                            
                                Using PostGIS on Python 3
                            
                                How to retrieve session data with Flask?
                            
                                multiprocessing GUI schemas to combat the "Not Responding" blocking
                            
                                Matplotlib animate fill_between shape
                            
                                Python urlparse.parse_qs unicode url
                            
                                How to remove whitespaces and newlines from every value in a JSON file?
                            
                                xml.etree.ElementTree get node depth

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With