How to determine a numpy-array reshape strategy

Tags: python, arrays, numpy

For a Python project I often find myself reshaping and rearranging n-dimensional numpy arrays. However, I have a hard time working out how to approach the problem, visualizing what the reshaping methods will produce, and knowing whether my solution is efficient.

At the moment, when confronted with such a problem, my strategy is to start ipython, load some sample data and use trial and error until I find a combination of transpose(), reshape() and swapaxes() calls that gives the desired result. It gets the job done, but without a real understanding of what is going on, and it often produces code that is hard to maintain.

So, my question is about finding a strategy. How do you approach such a problem? How do you visualize an ndarray in your head when you have to shape it in the desired format? How do you come to the right actions?

To make answering a bit more concrete, an example to play with:

Assume you want to reshape the following 3d-array

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

into a 2d-array in which the first columns along the 3rd dimension are placed first, the 2nd columns second, and so on.

The result should look like this:

array([[ 0,  9, 18,  3, 12, 21,  6, 15, 24],
       [ 1, 10, 19,  4, 13, 22,  7, 16, 25],
       [ 2, 11, 20,  5, 14, 23,  8, 17, 26]])

PS: any reading material on the subject would also be great!

584

asked May 28 '15 17:05

joepjp

1 Answer

I regularly play about with shapes in ipython. However, to make things clearer, I start with an array whose dimensions are all different.

arr = np.arange(3*4*5).reshape(3,4,5)

That way, it's easier to identify how the axes get shifted, for example:

In [25]: arr.shape
Out[25]: (3, 4, 5)

In [26]: arr.T.shape
Out[26]: (5, 4, 3)

In [31]: arr.T.reshape(5,-1)
Out[31]: 
array([[ 0, 20, 40,  5, 25, 45, 10, 30, 50, 15, 35, 55],
       [ 1, 21, 41,  6, 26, 46, 11, 31, 51, 16, 36, 56],
       [ 2, 22, 42,  7, 27, 47, 12, 32, 52, 17, 37, 57],
       [ 3, 23, 43,  8, 28, 48, 13, 33, 53, 18, 38, 58],
       [ 4, 24, 44,  9, 29, 49, 14, 34, 54, 19, 39, 59]])

whereas a different transpose (one that does not switch the order of 3 and 4) gives

In [38]: np.transpose(arr,[2,0,1]).shape
Out[38]: (5, 3, 4)

In [39]: np.transpose(arr,[2,0,1]).reshape(5,-1)
Out[39]: 
array([[ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55],
       [ 1,  6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56],
       [ 2,  7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57],
       [ 3,  8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58],
       [ 4,  9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59]])
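
Applied to the 3x3x3 array from the question, the first variant (reverse all axes, then flatten the last two) appears to give the requested layout. This is a minimal sketch; the names result, other and expected are only illustrative.

```python
import numpy as np

# The 3x3x3 array from the question: arr[i, j, k] == 9*i + 3*j + k
arr = np.arange(27).reshape(3, 3, 3)

# arr.T reverses the axes to (k, j, i); reshape then merges j and i,
# so i varies fastest within each row.
result = arr.T.reshape(3, -1)

# The other transpose keeps the (i, j) order, which gives a different layout.
other = np.transpose(arr, [2, 0, 1]).reshape(3, -1)

expected = np.array([[0, 9, 18, 3, 12, 21, 6, 15, 24],
                     [1, 10, 19, 4, 13, 22, 7, 16, 25],
                     [2, 11, 20, 5, 14, 23, 8, 17, 26]])

print(np.array_equal(result, expected))  # True
print(np.array_equal(other, expected))   # False
```

Because all three dimensions are equal here, both variants have shape (3, 9); only the values tell them apart, which is why distinct dimensions are easier to reason about.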

I also like to use 'oddly' shaped arrays like this when developing functions. That way, if I do mess up some transpose or broadcasting, dimension errors will jump out at me. Experience tells me that once I get the dimensions right, the values will also be correct. Or at least the class of errors that affect values is quite different from those that affect dimensions.

I also liberally sprinkle development code with print(arr.shape)-style statements, or even assertions such as assert x.shape == y.shape.
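
As a rough illustration of why distinct dimensions help, here is a sketch with a deliberately buggy, hypothetical function (scale_axis1_buggy is not from any library). With a 3x3x3 input the mistake passes silently; with a 3x4x5 input it raises immediately.

```python
import numpy as np

def scale_axis1_buggy(a, w):
    # Intended: scale along axis 1, with len(w) == a.shape[1].
    # Bug: a 1-d w broadcasts against the LAST axis, not axis 1.
    return a * w

cube = np.ones((3, 3, 3))
odd = np.ones((3, 4, 5))

# Runs without complaint, but the weights were applied along the wrong axis.
silent = scale_axis1_buggy(cube, np.arange(3))

# With distinct dimensions the same bug surfaces as a broadcasting error.
raised = False
try:
    scale_axis1_buggy(odd, np.arange(4))
except ValueError as exc:
    raised = True
    print("caught:", exc)

# One possible fix: align w with axis 1 explicitly.
fixed = odd * np.arange(4)[:, None]
print(fixed.shape)  # (3, 4, 5)
```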
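
A small sketch of what such checks might look like in Python 3. The helper name flatten_last_first is made up for this example.

```python
import numpy as np

def flatten_last_first(arr):
    """Hypothetical helper: move the last axis to the front, flatten the rest."""
    M, N, L = arr.shape
    out = np.transpose(arr, (2, 0, 1)).reshape(L, M * N)
    # Fail loudly during development if the shape is not what was intended.
    assert out.shape == (L, M * N), out.shape
    return out

arr = np.arange(3 * 4 * 5).reshape(3, 4, 5)
print(arr.shape)                      # (3, 4, 5)
print(flatten_last_first(arr).shape)  # (5, 12)
```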

It also helps to label dimensions:

M, N, L = 3, 4, 5
np.empty((M,N,L))

or like in einsum

np.einsum('ijk,kj->i', A, B) # if A is (M,N,L), B must be (L,N)
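
To make the index labels concrete, the following sketch builds A and B with the labeled sizes and compares the einsum result with an explicit broadcast-and-sum. The sample values are arbitrary.

```python
import numpy as np

M, N, L = 3, 4, 5
A = np.arange(M * N * L).reshape(M, N, L)  # axes i, j, k
B = np.arange(L * N).reshape(L, N)         # axes k, j

out = np.einsum('ijk,kj->i', A, B)

# Same computation spelled out: B.T has axes (j, k), so it broadcasts
# against A's last two axes; then sum over j and k.
ref = (A * B.T).sum(axis=(1, 2))

print(out.shape)                 # (3,)
print(np.array_equal(out, ref))  # True
```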

https://stackoverflow.com/a/29903842/901925 is an example of trying to understand and explain rollaxis.

Another strategy is to look at the Python code of numpy functions. Often they accept axis arguments, and it's instructive to see how they use them. Sometimes that particular axis is rotated to the front, or to the end. Sometimes an n-d array is reshaped into a 2d array, collapsing all axes except one down to one. Others achieve generality by constructing and manipulating an indexing tuple. More advanced functions play with the strides as well as the shape.

Whether a dimension should be first or last is usually an optimization issue, and may involve tradeoffs between ease of use (broadcasting, indexing) and speed. Just keep in mind that for "C" order, the last dimension forms contiguous blocks.
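
Two of these patterns can be sketched roughly as follows. Both helpers (apply_along_last and take_index) are hypothetical illustrations of the idea, not copies of numpy's own source.

```python
import numpy as np

def apply_along_last(func, arr, axis):
    """Move `axis` to the end, collapse the other axes, apply func row-wise.

    func is assumed to map a 2-d (n, k) array to a 1-d (n,) array.
    """
    moved = np.moveaxis(arr, axis, -1)         # chosen axis goes last
    flat = moved.reshape(-1, moved.shape[-1])  # all other axes collapsed
    return func(flat).reshape(moved.shape[:-1])

def take_index(arr, i, axis):
    """Select index i along `axis` by building an indexing tuple."""
    idx = [slice(None)] * arr.ndim
    idx[axis] = i
    return arr[tuple(idx)]

arr = np.arange(3 * 4 * 5).reshape(3, 4, 5)

res = apply_along_last(lambda a: a.sum(axis=1), arr, axis=1)
print(np.array_equal(res, arr.sum(axis=1)))                    # True
print(np.array_equal(take_index(arr, 2, axis=1), arr[:, 2, :]))  # True
```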
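
One way to see this is to inspect the flags and strides, and to check whether a reshape could reuse the original memory. This is a sketch on the same 3x4x5 sample array; strides are divided by the item size so the numbers do not depend on the integer dtype.

```python
import numpy as np

arr = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# In C order the last axis has the smallest stride (adjacent in memory).
item_strides = tuple(s // arr.itemsize for s in arr.strides)
print(arr.flags['C_CONTIGUOUS'], item_strides)  # True (20, 5, 1)

# A transpose is only a view with permuted strides; no data is moved.
t = arr.T
print(t.flags['C_CONTIGUOUS'])   # False
print(np.shares_memory(arr, t))  # True

# Reshaping the contiguous array is free, but reshaping the transposed
# view cannot be expressed with strides alone, so numpy makes a copy.
cheap = arr.reshape(3, -1)
copied = t.reshape(5, -1)
print(np.shares_memory(arr, cheap))   # True
print(np.shares_memory(arr, copied))  # False
```

So a transpose followed by a reshape is often where the actual data movement happens, which is worth keeping in mind when judging whether a solution is efficient.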

102

answered Sep 27 '22 16:09

hpaulj
