Order of indexes in a Numpy multidimensional array

For example, say I'm simulating a bunch of particles doing something over time, and I have a multidimensional array called particles with these indexes:

The x/y/z coordinates of the particle (of length a, which is 3 for a 3d space)
The index of the individual particle (of length b)
The index of the time step it's on (of length c)

Is it better to construct the array such that particles.shape == (a, b, c) or particles.shape == (c, b, a)?

I'm more interested in convention than efficiency: Numpy arrays can be set up in either C-style (last index varies most rapidly) or Fortran-style (first index), so it can efficiently support either setup. I also realize I can use transpose to put the indexes in any order I need, but I'd like to minimize that.
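
To make the C-style versus Fortran-style distinction concrete, here is a small sketch. The sizes a, b, c are made-up values for illustration, and float64 is assumed as the dtype. Both arrays are indexed the same way; only the memory strides differ, and transpose hands back a view rather than a copy.

```python
import numpy as np

a, b, c = 3, 4, 5  # hypothetical sizes: 3 coordinates, 4 particles, 5 time steps

c_style = np.zeros((a, b, c), order="C")  # last index varies fastest in memory
f_style = np.zeros((a, b, c), order="F")  # first index varies fastest in memory

# Same shape, different byte strides (float64 items are 8 bytes each).
print(c_style.shape, c_style.strides)  # (3, 4, 5) (160, 40, 8)
print(f_style.shape, f_style.strides)  # (3, 4, 5) (8, 24, 96)

# transpose reverses the axes and returns a view, not a copy.
swapped = c_style.transpose()
print(swapped.shape)                       # (5, 4, 3)
print(np.shares_memory(swapped, c_style))  # True
```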

I started to research this myself and found support for both ways:

Pro-(c,b,a):

By default, Numpy uses C-style arrays where the last index is the fastest-varying.
Most of the vector algebra functions (inner, cross, etc.) act on the last index. (dot acts on the last of one and the second-to-last of the other.)
The matplotlib collection objects (LineCollection, PolyCollection) expect arrays with the spatial coordinates in the last axis.
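
A quick sketch of the "last index" behaviour mentioned above, assuming a (c, b, a) layout with made-up sizes and random data. The array names positions and velocities are only for illustration.

```python
import numpy as np

c, b, a = 5, 4, 3  # hypothetical sizes: 5 time steps, 4 particles, 3 coordinates
rng = np.random.default_rng(0)
positions = rng.normal(size=(c, b, a))
velocities = rng.normal(size=(c, b, a))

# cross works on the last axis by default, so every (step, particle)
# pair gets its own 3-vector without any transposing.
angular = np.cross(positions, velocities)
print(angular.shape)  # (5, 4, 3)

# A per-vector dot product is a sum over the last axis.
speed_sq = np.einsum("...i,...i->...", velocities, velocities)
print(speed_sq.shape)  # (5, 4)

# inner also contracts the last axis of both arguments.
one_step = np.inner(positions[0], velocities[0])
print(one_step.shape)  # (4, 4)
```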

Pro-(a,b,c):

If I were to use meshgrid or mgrid to produce a set of points, it would put the spatial axis first. For instance, np.mgrid[0:5,0:5,0:5].shape == (3,5,5,5). I realize these functions are mostly intended for integer array indexing, but it's not uncommon to use them to generate a grid of points.
The matplotlib scatter and plot functions split out their arguments, so they're agnostic to the shape of the array, but ax.plot3D(particles[0], particles[1], particles[2]) is shorter to type than the version with particles[..., 0].
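
For reference, a short sketch of the mgrid case: the spatial axis comes first, which makes unpacking into x, y, z trivial. np.moveaxis (used here as one possible way, not the only one) gives the spatial-axis-last arrangement when a function wants that instead.

```python
import numpy as np

grid = np.mgrid[0:5, 0:5, 0:5]
print(grid.shape)  # (3, 5, 5, 5)

# With the spatial axis first, the array unpacks directly into coordinates.
x, y, z = grid
print(x.shape)  # (5, 5, 5)

# The same grid with the spatial axis last, for functions that expect it there.
points = np.moveaxis(grid, 0, -1)
print(points.shape)     # (5, 5, 5, 3)
print(points[1, 2, 3])  # [1 2 3]
```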

In general it appears that there are two different conventions in existence (probably due to historical differences between C and Fortran), and it's not clear which is more common in the Numpy community, or more appropriate for what I'm doing.

asked Dec 22 '14 16:12

Apocheir

1 Answer

Conventions for something like this have much more to do with particular file-formats than anything else, in my experience. However, there's a quick way to answer which one is likely to be best for what you're doing:

If you have to iterate over an axis, which one are you most likely to iterate over? In other words, which of these is most likely:

# a first
for dimension in particles:
    ...

# b first
for particle in particles:
    ...

# c first
for timestep in particles:
    ...

As far as efficiency goes, this assumes C-order, but that's actually irrelevant here. At the python level, access to numpy arrays is treated as C-ordered regardless of the memory layout. (You always iterate over the first axis, even if that's not the "most contiguous" axis in memory.)
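
A small check of that point, using made-up sizes and a (timestep, particle, coordinate) layout chosen only for illustration: the same data stored in C order and in Fortran order iterates identically, one first-axis slice at a time.

```python
import numpy as np

c, b, a = 5, 4, 3  # hypothetical sizes: time steps, particles, coordinates
c_order = np.arange(c * b * a, dtype=float).reshape(c, b, a)
f_order = np.asfortranarray(c_order)  # same values, Fortran memory layout

print(f_order.flags["F_CONTIGUOUS"])  # True

# Iteration yields slices along the first axis in both cases.
for step_c, step_f in zip(c_order, f_order):
    assert step_c.shape == step_f.shape == (b, a)
    assert np.array_equal(step_c, step_f)

print(len(list(f_order)))  # 5
```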

Of course, there are many situations where you should avoid directly iterating over numpy arrays in this manner. Nonetheless, this is the way you should think about it, particularly when it comes to on-disk file structures. Make your most common use case the fastest/easiest.
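
As a rough illustration, assuming time steps were chosen as the first axis and using made-up sizes: whole-array operations take an axis argument and need no loop at all, and the occasional loop over a different axis can go through np.moveaxis, which returns a view.

```python
import numpy as np

c, b, a = 5, 4, 3  # hypothetical sizes: time steps, particles, coordinates
particles = np.arange(c * b * a, dtype=float).reshape(c, b, a)

# Whole-array operations: no Python-level loop needed.
displacements = np.diff(particles, axis=0)  # change between consecutive steps
mean_position = particles.mean(axis=0)      # time-averaged position per particle
print(displacements.shape)  # (4, 4, 3)
print(mean_position.shape)  # (4, 3)

# Occasional loop over particles instead of time steps, via a view.
by_particle = np.moveaxis(particles, 1, 0)
for trajectory in by_particle:
    assert trajectory.shape == (c, a)
print(np.shares_memory(by_particle, particles))  # True
```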

If nothing else, hopefully this gives you a useful way to think about the question.

answered Oct 09 '22 20:10

Joe Kington

Tags: python, arrays, multidimensional-array, matplotlib, numpy
