Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: is the iteration of the multidimensional array super slow?

I have to iterate all items in two-dimensional array of integers and change the value (according to some rule, not important).

I'm surprised how significant difference in performance is there between python runtime and C# or java runtime. Did I wrote totally wrong python code (v2.7.2)?

import numpy
a = numpy.ndarray((5000,5000), dtype = numpy.int32)
for x in numpy.nditer(a.T):
    x = 123
>python -m timeit -n 2 -r 2 -s "import numpy; a = numpy.ndarray((5000,5000), dtype=numpy.int32)" "for x in numpy.nditer(a.T):" "  x = 123"
2 loops, best of 2: 4.34 sec per loop

For example the C# code performs only 50ms, i.e. python is almost 100 times slower! (suppose the matrix variable is already initialized)

for (y = 0; y < 5000; y++)
for (x = 0; x < 5000; x++)
    matrix[y][x] = 123;
like image 555
Petr Felzmann Avatar asked Apr 11 '12 19:04

Petr Felzmann


People also ask

How does Python handle multidimensional arrays?

In Python, Multidimensional Array can be implemented by fitting in a list function inside another list function, which is basically a nesting operation for the list function. Here, a list can have a number of values of any data type that are segregated by a delimiter like a comma.

Is NumPy array slow?

The reason why NumPy is fast when used right is that its arrays are extremely efficient. They are like C arrays instead of Python lists.

Why is NumPy faster than for loop?

NumPy is fast because it can do all its calculations without calling back into Python. Since this function involves looping in Python, we lose all the performance benefits of using NumPy. For a 10,000,000-entry NumPy array, this functions takes 2.5 seconds to run on my computer.

How do you iterate through a multidimensional array?

In order to loop over a 2D array, we first go through each row, and then again we go through each column in every row. That's why we need two loops, nested in each other. Anytime, if you want to come out of the nested loop, you can use the break statement.


1 Answers

Yep! Iterating through numpy arrays in python is slow. (Slower than iterating through a python list, as well.)

Typically, you avoid iterating through them directly.

If you can give us an example of the rule you're changing things based on, there's a good chance that it's easy to vectorize.

As a toy example:

import numpy as np

x = np.linspace(0, 8*np.pi, 100)
y = np.cos(x)

x[y > 0] = 100

However, in many cases you have to iterate, either due to the algorithm (e.g. finite difference methods) or to lessen the memory cost of temporary arrays.

In that case, have a look at Cython, Weave, or something similar.

like image 74
Joe Kington Avatar answered Sep 28 '22 01:09

Joe Kington