For example, say I'm simulating a bunch of particles doing something over time, and I have a multidimensional array called particles with these indexes:
- The spatial coordinate of the particle (of length a, which is 3 for a 3d space)
- The index of the particle (of length b)
- The index of the time step (of length c)
Is it better to construct the array such that particles.shape == (a, b, c) or particles.shape == (c, b, a)?
I'm more interested in convention than efficiency: NumPy arrays can be set up in either C-style (last index varies most rapidly) or Fortran-style (first index varies most rapidly), so it can efficiently support either setup. I also realize I can use transpose to put the indexes in any order I need, but I'd like to minimize that.
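As a minimal sketch of the two memory layouts and of transpose (the sizes a, b, c below are made-up values for illustration only), the following shows that either order can be requested when the array is created, and that reversing the axes with transpose returns a view rather than a copy:

```python
import numpy as np

# Hypothetical sizes: 3 spatial dimensions, 4 particles, 5 time steps.
a, b, c = 3, 4, 5

# C-order (the default): the last index varies fastest in memory.
particles_c = np.zeros((a, b, c), order="C")
# Fortran-order: the first index varies fastest in memory.
particles_f = np.zeros((a, b, c), order="F")

print(particles_c.flags["C_CONTIGUOUS"])  # True
print(particles_f.flags["F_CONTIGUOUS"])  # True

# Reversing the axes gives shape (c, b, a) without copying any data.
swapped = particles_c.transpose(2, 1, 0)
print(swapped.shape)                           # (5, 4, 3)
print(np.shares_memory(swapped, particles_c))  # True
```

Because the transposed array is only a view, the cost of transpose is mostly readability, not speed.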
I started to research this myself and found support for both ways:
Pro-(c,b,a):
- NumPy functions that are designed for vectors (inner, cross, etc.) act on the last index. (dot acts on the last of one and the second-to-last of the other.)
- matplotlib collection objects (LineCollection, PolyCollection) expect arrays with the spatial coordinates in the last axis.
Pro-(a,b,c):
- If I used meshgrid or mgrid to produce a set of points, it would put the spatial axis first. For instance, np.mgrid[0:5,0:5,0:5].shape == (3,5,5,5). I realize these functions are mostly intended for integer array indexing, but it's not uncommon to use them to generate a grid of points.
- matplotlib's scatter and plot functions split out their arguments, so they are agnostic to the shape of the array, but ax.plot3d(particles[0], particles[1], particles[2]) is shorter to type than the version with particles[..., 0].
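The NumPy observations in the two lists above can be checked with a short sketch. The array names and sizes here are hypothetical, chosen only to make the shapes easy to read:

```python
import numpy as np

# Hypothetical data in (c, b, a) layout: 2 time steps, 4 particles, 3 coordinates.
positions = np.ones((2, 4, 3))
velocities = np.zeros((2, 4, 3))
velocities[..., 0] = 1.0  # every particle moves along x

# np.cross treats the last axis as the vector axis by default,
# so this computes one cross product per particle per time step.
crossed = np.cross(positions, velocities)
print(crossed.shape)  # (2, 4, 3)

# np.mgrid puts the spatial axis first instead.
grid = np.mgrid[0:5, 0:5, 0:5]
print(grid.shape)  # (3, 5, 5, 5)

# np.moveaxis converts between the two conventions and returns a view.
points = np.moveaxis(grid, 0, -1)
print(points.shape)     # (5, 5, 5, 3)
print(points[1, 2, 3])  # [1 2 3]
```

Whichever layout is chosen, np.moveaxis keeps the conversion to the other one to a single cheap call.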
In general it appears that there are two different conventions in existence (probably due to historical differences between C and Fortran), and it's not clear which is more common in the Numpy community, or more appropriate for what I'm doing.
Conventions for something like this have much more to do with particular file-formats than anything else, in my experience. However, there's a quick way to answer which one is likely to be best for what you're doing:
If you have to iterate over an axis, which one are you most likely to iterate over? In other words, which of these is most likely:
# a first
for dimension in particles:
    ...
# b first
for particle in particles:
    ...
# c first
for timestep in particles:
    ...
As far as efficiency goes, this assumes C-order, but that's actually irrelevant here. At the python level, access to numpy arrays is treated as C-ordered regardless of the memory layout. (You always iterate over the first axis, even if that's not the "most contiguous" axis in memory.)
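A small sketch of that last point, using made-up sizes: iterating over a C-ordered array and over a Fortran-ordered copy of it both walk the first axis and yield the same sub-arrays.

```python
import numpy as np

# Hypothetical sizes: 3 spatial dimensions, 4 particles, 5 time steps.
a, b, c = 3, 4, 5
particles_c = np.arange(a * b * c, dtype=float).reshape(a, b, c)
particles_f = np.asfortranarray(particles_c)  # same values, Fortran memory layout

# Iteration always walks the first axis, whatever the memory layout.
for layout in (particles_c, particles_f):
    first = next(iter(layout))
    print(first.shape)  # (4, 5) both times

print(np.array_equal(particles_c[0], particles_f[0]))  # True

# To loop over another axis, move it to the front first (this is a view).
for timestep in np.moveaxis(particles_c, -1, 0):
    assert timestep.shape == (a, b)
```

So the choice of first axis determines what a plain for loop gives you, and the memory order only affects how fast each step is.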
Of course, there are many situations where you should avoid directly iterating over numpy arrays in this manner. Nonetheless, this is the way you should think about it, particularly when it comes to on-disk file structures. Make your most common use case the fastest/easiest.
If nothing else, hopefully this gives you a useful way to think about the question.