
Better use a tuple or numpy array for storing coordinates

I'm porting a C++ scientific application to Python, and as I'm new to Python, a few questions have come up:

1) I'm defining a class that will contain the coordinates (x, y). These values will be accessed many times, but they will only be read after the class is instantiated. Is it better to use a tuple or a numpy array, in terms of both memory and access time?

2) In some cases, these coordinates will be used to build a complex number, which is passed to a complex function, and only the real part of the result is used. Assuming there is no way to separate the real and imaginary parts of this function, and that the real part has to be extracted at the end, might it be better to store (x, y) directly as a complex number? How bad is the overhead of converting from complex to real in Python? The C++ code does a lot of these conversions, and they are a big slowdown there.

3) Some coordinate transformations will also have to be performed: the x and y values are accessed separately, the transformation is applied, and the result is returned. The coordinate transformations are defined in the complex plane, so is it still faster to use the components x and y directly than to rely on complex variables?

Thank you

Ivan asked Apr 01 '10 21:04



2 Answers

In terms of memory consumption, numpy arrays are more compact than Python tuples. A numpy array uses a single contiguous block of memory. All elements of the numpy array must be of a declared type (e.g. 32-bit or 64-bit float). A Python tuple does not necessarily use a contiguous block of memory, and the elements of the tuple can be arbitrary Python objects, which generally consume more memory than numpy numeric types.

So this issue is a hands-down win for numpy (assuming the elements of the array can be stored as a numpy numeric type).
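As a rough check of the memory claim, the sketch below compares a tuple of floats with a float64 numpy array holding the same values. The sample size `n` and the variable names are illustrative assumptions, and the exact tuple figure depends on the CPython build. numpy arrays also carry a fixed per-object overhead that `nbytes` does not include, so the advantage shows up mainly when many coordinates are stored in one array.

```python
import sys
import numpy as np

n = 1000  # illustrative number of stored values (an assumption)

# Python tuple of floats: the tuple holds pointers, and each float
# is a separate Python object with its own header.
values = tuple(float(i) + 0.5 for i in range(n))
tuple_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# numpy array: one contiguous buffer of 64-bit floats.
arr = np.array(values, dtype=np.float64)
array_data_bytes = arr.nbytes  # data buffer only, excluding the array header

print("tuple and its floats:", tuple_bytes, "bytes")
print("numpy data buffer:   ", array_data_bytes, "bytes")
```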

On the issue of speed, I think the choice boils down to the question, "Can you vectorize your code?"

That is, can you express your calculations as element-wise operations on entire arrays?

If the code can be vectorized, then numpy will most likely be faster than Python tuples. (The only case I can imagine where it might not be is if you had many very small tuples. In this case the overhead of forming the numpy arrays and the one-time cost of importing numpy might drown out the benefit of vectorization.)
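To make "vectorized" concrete, here is a minimal sketch that evaluates a complex function and keeps the real part, first point by point on tuples and then in one call on a complex numpy array. The function `f` and the sample coordinates are hypothetical, chosen only for illustration; they are not taken from the question.

```python
import numpy as np

def f(z):
    # Hypothetical complex function, used only as an example.
    return z * z + 1.0 / (z + 2.0)

# Coordinates stored as tuples and processed one point at a time.
points = [(0.5 * i, 1.0 + 0.25 * i) for i in range(1000)]
real_parts_loop = [f(complex(x, y)).real for (x, y) in points]

# The same coordinates stored as one complex array and processed
# with whole-array operations.
z = np.array([complex(x, y) for (x, y) in points])
real_parts_vec = f(z).real
```

Both versions compute the same values; which one is faster for a given problem size is best checked with a timing tool such as `timeit` rather than assumed.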

An example of code that could not be vectorized would be if your calculation involved looking at, say, the first complex number in an array z, doing a calculation which produces an integer index idx, then retrieving z[idx], doing a calculation on that number, which produces the next index idx2, then retrieving z[idx2], etc. This type of calculation might not be vectorizable. In this case, you might as well use Python tuples, since you won't be able to leverage numpy's strength.
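The index-chasing pattern described above might look like the following sketch. The rule in `next_index` and the sample data are made up for illustration; the point is only that each step depends on the result of the previous one.

```python
import numpy as np

# Illustrative data: 16 points on a circle of radius 3.
n = 16
z = 3.0 * np.exp(2j * np.pi * np.arange(n) / n)

def next_index(value, size):
    # Hypothetical rule that turns a complex value into the next index.
    return int(abs(value.real) * 7 + abs(value.imag) * 3) % size

idx = 0
visited = [idx]
for _ in range(10):
    # The next lookup needs the value found in this step, so the loop
    # cannot be replaced by a single whole-array operation.
    idx = next_index(z[idx], n)
    visited.append(idx)

print(visited)
```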

I wouldn't worry about the speed of accessing the real/imaginary parts of a complex number. My guess is the issue of vectorization will most likely determine which method is faster. (Though, by the way, numpy can transform an array of complex numbers to their real parts simply by striding over the complex array, skipping every other float, and viewing the result as floats. Moreover, the syntax is dead simple: if z is a complex numpy array, then z.real gives the real parts as a float numpy array. This should be far faster than the pure Python approach of using a list comprehension of attribute lookups: [z.real for z in zlist].)
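A small sketch of the striding behaviour, assuming a one-dimensional complex128 array (the sample values are arbitrary):

```python
import numpy as np

z = np.array([1 + 2j, 3 + 4j, 5 + 6j])  # dtype complex128, 16 bytes per element

real_view = z.real  # float64 view onto z's buffer, not a copy

print(real_view)                       # real parts as a float array
print(z.strides, real_view.strides)    # the view steps over the imaginary parts
print(np.shares_memory(z, real_view))  # whether both use the same buffer

# Pure Python equivalent: one attribute lookup per element.
zlist = [1 + 2j, 3 + 4j, 5 + 6j]
real_list = [w.real for w in zlist]
```

Because `real_view` is a view, writing to it also changes `z`; use `z.real.copy()` if an independent array is needed.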

Just out of curiosity, what is your reason for porting the C++ code to Python?

unutbu answered Oct 10 '22 18:10



A numpy array with an extra dimension is tighter in memory use, and at least as fast, as a numpy array of tuples; complex numbers are at least as good or even better, including for your third question. BTW, you may have noticed that -- while questions asked later than yours were getting answers aplenty -- yours was lying fallow: part of the reason is no doubt that asking three questions within a question turns responders off. Why not just ask one question per question? It's not as if you get charged for questions or anything, you know...!-)
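One way to read the two layouts mentioned here, sketched under the assumption of a C-contiguous float64 array of shape (N, 2): the same buffer can be viewed as N complex numbers, so a transformation defined in the complex plane can be applied directly. The sample points and the 90-degree rotation are illustrative only.

```python
import numpy as np

# N points stored as an (N, 2) float array: the "extra dimension" layout.
xy = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])

# The same memory reinterpreted as N complex numbers (no copy is made).
z = xy.view(np.complex128).reshape(-1)

# A transformation defined in the complex plane: rotation by 90 degrees.
rotated = z * np.exp(1j * np.pi / 2)

# Back to separate x and y columns when the components are needed.
xy_rotated = np.column_stack((rotated.real, rotated.imag))
print(xy_rotated)
```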

Alex Martelli answered Oct 10 '22 17:10
