Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Initialise a NumPy array based on its index

I am creating a couple of multi dimensional arrays using NumPy and inititalising them based on the index as follows:

pos_data = []
# Some typical values
d = 2  # Could also be 3
vol_ext = (1000, 500)  # If d = 3, this will have another entry
ratio = [5.0, 8.0]  # Again, if d = 3, it will have another entry

for i in range(d):
    pos_data.append(np.zeros(vol_ext))

if d == 2:
    for y in range(vol_ext[1]):
        for x in range(vol_ext[0]):
            pos_data[0][x, y] = (x - 1.0) * ratio[0]
            pos_data[1][x, y] = (y - 1.0) * ratio[1]
elif d == 3:
    for z in range(vol_ext[2]):
        for y in range(vol_ext[1]):
            for x in range(vol_ext[0]):
                pos_data[0][x, y, z] = (x - 1.0) * ratio[0]
                pos_data[1][x, y, z] = (y - 1.0) * ratio[1]
                pos_data[2][x, y, z] = (z - 1.0) * ratio[2]

The looping is a bit ugly and slow as well. Additionally, if I have a 3-dimensional object then I have to have another nested loop.

I was wondering if there is a Pythonic way to generate these values as they are just based on the x, y and z indices. I tried to use the combinatorics bit from itertools, but I could not make it work.

like image 468
Luca Avatar asked Mar 23 '18 14:03

Luca


People also ask

How do I create a NumPy array with the same value?

fill() method is used to fill the numpy array with a scalar value. If we have to initialize a numpy array with an identical value then we use numpy. ndarray. fill().

Can NumPy arrays be indexed?

ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are different kinds of indexing available depending on obj: basic indexing, advanced indexing and field access.


3 Answers

It's easy with np.meshgrid:

pos_data = np.meshgrid(*(r * (np.arange(s) - 1.0)
                         for s, r in zip(vol_ext, ratio)), indexing='ij')
like image 128
jdehesa Avatar answered Oct 23 '22 05:10

jdehesa


I would generate a two or three dimensional numpy.meshgrid of data, then scale each entry by the ratio per slice.

For the 2D case:

(X, Y) = np.meshgrid(np.arange(vol_ext[1]), np.arange(vol_ext[0]))
pos_data = [(Y - 1) * ratio[0], (X - 1) * ratio[1]]

For the 3D case:

(X, Y, Z) = np.meshgrid(np.arange(vol_ext[2]), np.arange(vol_ext[1]), np.arange(vol_ext[0]))
pos_data = [(Z - 1) * ratio[0], (Y - 1) * ratio[1], (X - 1) * ratio[2]]

Example using your 2D data

pos_data has been generated by your code. I've created a new list pos_data2 that stores the equivalent list using the above solution:

In [40]: vol_ext = (1000, 500)

In [41]: (X, Y) = np.meshgrid(np.arange(vol_ext[1]), np.arange(vol_ext[0]))

In [42]: pos_data2 = [(Y - 1) * ratio[0], (X - 1) * ratio[1]]

In [43]: np.allclose(pos_data[0], pos_data2[0])
Out[43]: True

In [44]: np.allclose(pos_data[1], pos_data2[1])
Out[44]: True

Making this adaptive based on vol_ext

We can combine this with a list comprehension where we can take advantage of the fact that the output of numpy.meshgrid is a tuple:

pts = [np.arange(v) for v in reversed(vol_ext)]
pos_data = [(D - 1) * R for (D, R) in zip(reversed(np.meshgrid(*pts)), ratio)]

The first line of code generates the range of points per desired dimension as a list. We then use a list comprehension to compute the desired calculations per slice by iterating over each desired grid of points in the desired dimension combined with the correct ratio to apply.

Example Run

In [49]: pts = [np.arange(v) for v in reversed(vol_ext)]

In [50]:  pos_data2 = [(D - 1) * R for (D, R) in zip(reversed(np.meshgrid(*pts)), ratio)]

In [51]: np.all([np.allclose(p, p2) for (p, p2) in zip(pos_data, pos_data2)])
Out[51]: True

The last line goes through each slice and ensures both lists align.

like image 38
rayryeng Avatar answered Oct 23 '22 05:10

rayryeng


I think there are a couple of things to consider:

  • is there a reason that pos_data has to be a list?
  • don't have another variable (d) that you have to hard code, when it is always supposed to be the length of some other tuple.

With these in mind, you can solve your problem of variable numbers of for loops using itertools.product, which basically is just a shorthand for nested for loops. The positional args for product are the ranges of the loops.

My implementation is:

from itertools import product

vol_ext = (1000, 500)  # If d = 3, this will have another entry
ratio = [5.0, 8.0]  # Again, if d = 3, it will have another entry

pos_data_new = np.zeros((len(ratio), *vol_ext))

# now loop over each dimension in `vol_ext`. Since `product` expects
# positional arguments, we have to unpack a tuple of `range(vol)`s.
for inds in product(*(range(vol) for vol in vol_ext)):
    # inds is now a tuple, and we have to combine it with a slice in 
    # in the first dimension, and use it as an array on the right hand 
    # side to do the computation. 
    pos_data_new[(slice(None),) + inds] = (np.array(inds) - 1) * ratio

I don't think this will be any faster, but it certainly looks nicer.

Note that pos_data_new is now an array, to get it as a list in the first dimension, as per the original example, is simple enough.

like image 38
ben thorne Avatar answered Oct 23 '22 06:10

ben thorne