Is numpy.transpose reordering data in memory?

To speed up functions such as np.std and np.sum along an axis of a huge n-dimensional NumPy array, it is recommended to apply them along the last axis.

When I use np.transpose to rotate the axis I want to operate on into the last position, is it really reshuffling the data in memory, or just changing the way the axes are addressed?

When I measured the time using %timeit, the transpose took microseconds, much less than the time required to copy the (112x1024x1024) array I had.

If it is not actually reordering the data in memory and is only changing the addressing, will it still speed up np.sum or np.std when applied to the newly rotated last axis?

When I tried to measure it, it does seem to speed up, but I don't understand how.

Update

It doesn't really seem to speed up with transpose. The fastest axis is the last one when the array is C-ordered, and the first one when it is Fortran-ordered. So there is no point in transposing before applying np.sum or np.std. For my specific code, I solved the issue by passing order='FORTRAN' during array creation, which made the first axis the fastest.

Thanks for all the answers.
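As a rough illustration of the update above, the sketch below creates the same shape in both memory orders and prints the strides. The shape (4, 5, 6) and the variable names are made up for the example, and it uses the single-letter spelling order='F', which is the form the NumPy documentation lists. The axis with the smallest stride is the one whose elements sit next to each other in memory.

```python
import numpy as np

# Hypothetical small shape, chosen only to keep the strides easy to read.
shape = (4, 5, 6)

c = np.zeros(shape, order='C')  # row-major: last axis is contiguous
f = np.zeros(shape, order='F')  # column-major: first axis is contiguous

print(c.strides)  # (240, 48, 8) -> last axis has the smallest stride
print(f.strides)  # (8, 32, 160) -> first axis has the smallest stride

# Transposing the C-ordered array yields a Fortran-contiguous view
# of the same buffer; nothing is copied.
print(c.T.flags['F_CONTIGUOUS'])  # True
print(np.shares_memory(c, c.T))   # True
```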

asked Oct 20 '13 at 15:10 by indiajoe


2 Answers

Transpose just changes the strides; it doesn't touch the actual array data. I think the reason why sum etc. along the final axis is recommended (I'd like to see the source for that, btw.) is that when an array is C-ordered, walking along the final axis preserves locality of reference. That won't be the case after you transpose, since the transposed array will be Fortran-ordered.
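A minimal sketch of what "just changes the strides" looks like in practice, assuming a small float64 array (the shape and axis permutation here are arbitrary choices for illustration). The transposed array reports permuted strides and shares its buffer with the original, which is why the transpose itself takes only microseconds.

```python
import numpy as np

# Small C-ordered array; 8-byte elements make the strides easy to follow.
a = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

# Move the last axis to the front. This returns a view, not a copy.
t = np.transpose(a, (2, 0, 1))

print(a.strides)                # (96, 32, 8)
print(t.strides)                # (8, 96, 32): same numbers, permuted
print(np.shares_memory(a, t))   # True: both use the same buffer
print(t.flags['C_CONTIGUOUS'])  # False: the layout in memory did not change

# Writing through the view changes the original array as well.
t[0, 0, 0] = -1.0
print(a[0, 0, 0])               # -1.0
```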

answered Sep 18 '22 at 14:09 by Fred Foo


To elaborate on larsman's answer, here are some timings:

>>> import numpy as np

# normal C (row-major) order array
>>> %%timeit a = np.random.randn(500, 400)
>>> np.sum(a, axis=1)
1000 loops, best of 3: 272 us per loop

# transposing and summing along the first axis makes no real difference
# to performance
>>> %%timeit a = np.random.randn(500, 400)
>>> np.sum(a.T, axis=0)
1000 loops, best of 3: 269 us per loop

# however, converting to Fortran (column-major) order does improve speed...
>>> %%timeit a = np.asfortranarray(np.random.randn(500,400))
>>> np.sum(a, axis=1)
10000 loops, best of 3: 114 us per loop

# ... but only if you don't count the conversion in the timed operations
>>> %%timeit a = np.random.randn(500, 400)
>>> np.sum(np.asfortranarray(a), axis=1)
1000 loops, best of 3: 599 us per loop

In summary, it might make sense to convert your arrays to Fortran order if you're going to apply a lot of operations over the columns, but the conversion itself is costly and almost certainly not worth it for a single operation.
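For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard-library timeit module. The helper name best_time and the loop counts are assumptions made for this example, and the absolute numbers will differ by machine and NumPy version, so no expected output is shown.

```python
import timeit
import numpy as np

a = np.random.randn(500, 400)   # C-ordered by default
a_f = np.asfortranarray(a)      # column-major copy, converted once up front

def best_time(func, number=50, repeat=3):
    """Hypothetical helper: best per-call time in seconds over a few repeats."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

timings = {
    "C order, sum(axis=1)": best_time(lambda: np.sum(a, axis=1)),
    "transposed view, sum(axis=0)": best_time(lambda: np.sum(a.T, axis=0)),
    "Fortran order, sum(axis=1)": best_time(lambda: np.sum(a_f, axis=1)),
    "convert + sum(axis=1)": best_time(
        lambda: np.sum(np.asfortranarray(a), axis=1)
    ),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1e6:.1f} us per call")
```

The last entry includes the conversion inside the timed call, so it shows the cost of converting for a single operation.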

answered Sep 16 '22 at 14:09 by ali_m