How does NumPy multidimensional array iteration work? (With and without nditer)

Note: I am not sure if this is a duplicate or not -- please let me know if it is (and close the question).

If one has a 1-dimensional NumPy array vector, then if one writes a for loop of the form:

for element in vector :
    print(element)

The result will print each element of the NumPy array.

If one has a 2-dimensional NumPy array matrix, then if one writes a for loop of the form:

for vector in matrix :
    print(vector)

The result will print each row of the 2-dimensional NumPy array, i.e. it will print 1-dimensional NumPy arrays, and it will not print each element of the array individually.

However, if one instead writes the for loop as:

import numpy
for element in numpy.nditer(matrix) :
    print(element)

The result will print each element of the 2-dimensional NumPy array.

Question: What happens if one has a 3-dimensional NumPy array, tensor?

a. If one writes a for loop of the form:

for unknownType in tensor :
    print(unknownType)

Will this print the constituent 2-dimensional NumPy (sub-)arrays of tensor?

I.e. for an n-dimensional NumPy array nArray, does for unknownType in nArray : iterate over the constituent (n-1)-dimensional NumPy (sub-)arrays of nArray?

b. If one writes a for loop of the form:

for unknownType in numpy.nditer(tensor) :
    print(unknownType)

Will this print the elements of tensor? Or will it print the constituent 1-dimensional NumPy (sub-)arrays of the constituent 2-dimensional NumPy (sub-)arrays of tensor?

I.e. for an n-dimensional NumPy array nArray, does for unknownType in nditer(nArray) : iterate over the elements of nArray? Or does it iterate over the constituent (n-2)-dimensional NumPy (sub-)arrays of the constituent (n-1)-dimensional NumPy (sub-)arrays of nArray?

It is unclear to me from the name nditer, since I don't know what "nd" stands for ("iter" is obviously short for "iteration"). And presumably one can think of the elements as "0-dimensional NumPy arrays", so the examples given to me for 2-dimensional NumPy arrays are ambiguous.

I've looked at the np.nditer documentation but honestly I didn't understand the examples or what they were trying to demonstrate -- it seems like it was written for programmers (which I am not) by programmers.

Chill2Macht asked Aug 08 '17 20:08


2 Answers

a)

for x in arr: iterates on the 1st dimension of an array.

In [233]: for x in np.arange(24).reshape((2,3,4)):
     ...:     print(x.shape)
     ...:     
(3, 4)
(3, 4)

I think of it as for x in list(arr): ..., which breaks the array into a list of subarrays along the first dimension.
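A small sketch of that rule, assuming three made-up arrays (the names vector, matrix and tensor follow the question; the shapes are arbitrary choices for illustration). Each step of a plain for loop should yield an item with one dimension fewer than the array being iterated, down to NumPy scalars for a 1-D array:

```python
import numpy as np

# Hypothetical example arrays; shapes chosen only for illustration.
vector = np.arange(4)                     # shape (4,)
matrix = np.arange(12).reshape(3, 4)      # shape (3, 4)
tensor = np.arange(24).reshape(2, 3, 4)   # shape (2, 3, 4)

# Record the shape of every item a plain for loop yields.
item_shapes = {
    name: [np.shape(item) for item in arr]
    for name, arr in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]
}

for name, shapes in item_shapes.items():
    print(name, shapes)

# list(arr) splits the array along the first axis in the same way.
same_as_list = all(np.array_equal(a, b) for a, b in zip(tensor, list(tensor)))
print(same_as_list)
```

So for the 3-D tensor the loop variable is a (3, 4) array each time, which matches the shapes printed in the session above.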

b)

It's tricky to control the depth of iteration with nditer. By default it iterates at the element level. The tutorial page shows some tricks using buffers and order, but the best way I've seen is to use ndindex.

np.ndindex constructs a dummy array of the right shape and runs an nditer over it with the multi_index flag, so it yields index tuples rather than values.

For example, to iterate over the first two dimensions of a 3D array:

In [237]: arr = np.arange(24).reshape(2,3,4)
In [240]: for idx in np.ndindex(arr.shape[:2]):
     ...:     print(idx, arr[idx], arr[idx].sum())
     ...:      
(0, 0) [0 1 2 3] 6
(0, 1) [4 5 6 7] 22
(0, 2) [ 8  9 10 11] 38
(1, 0) [12 13 14 15] 54
(1, 1) [16 17 18 19] 70
(1, 2) [20 21 22 23] 86

I could do the same iteration with

for i in range(2):
    for j in range(3):
        arr[i, j]  # ... use the 1-D subarray here

or

arr1 = arr.reshape(-1, 4)  # merge the first two axes: shape (6, 4)
for ij in range(6):
    arr1[ij]  # ... use the 1-D subarray here

Speed will be basically the same for all of these: poor compared to array functions that work on the whole 3D array at once, or ones that take an axis parameter.

In [241]: arr.sum(axis=2)
Out[241]: 
array([[ 6, 22, 38],
       [54, 70, 86]])
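As a quick check that the explicit loop and the axis-based call compute the same thing, here is a minimal sketch. The names looped and vectorized are introduced only for this illustration:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

# Loop version: visit every (i, j) pair of the first two axes
# and sum the 1-D subarray found there.
looped = np.empty(arr.shape[:2], dtype=arr.dtype)
for idx in np.ndindex(arr.shape[:2]):
    looped[idx] = arr[idx].sum()

# Whole-array version: a single call that reduces the last axis.
vectorized = arr.sum(axis=2)

print(np.array_equal(looped, vectorized))
```

The loop makes six Python-level calls to sum, while the axis version makes one call and does the rest in compiled code.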

The class for NumPy arrays is np.ndarray. Presumably nditer is named along the same lines. nditer was written as a way of consolidating the various ways that C-level code could iterate on arrays, especially several broadcastable ones. The np.nditer function gives access to the C-level iterator, but since the actual iteration is still being done in Python code, there's little to no speed advantage.
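One trick from the nditer tutorial that touches both points (depth of iteration and Python-level overhead) is the external_loop flag, which makes nditer hand back 1-D chunks instead of single elements. The sketch below assumes a small 2 x 3 array, as in the tutorial; the variable names are only for illustration:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Default: one 0-d array per element, so six Python-level steps.
per_element = [x.shape for x in np.nditer(a)]

# external_loop: for a C-contiguous array in the default order,
# the whole array arrives as a single 1-D chunk.
chunks = [x.copy() for x in np.nditer(a, flags=["external_loop"])]

# Forcing Fortran order on this C-ordered array gives shorter chunks,
# each running down one column.
f_chunks = [x.copy() for x in np.nditer(a, flags=["external_loop"], order="F")]

print(per_element)
print([c.tolist() for c in chunks])
print([c.tolist() for c in f_chunks])
```

The chunk sizes depend on memory layout, not on the number of dimensions, which is part of why this is awkward for picking a particular depth.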

hpaulj answered Nov 12 '22 05:11


If you just use a for loop, the iteration is over the first dimension: if the array has only one dimension this will be the elements, if it's 2D it will be the rows, if it's 3D it will be the planes, and so on.

However, nditer is an ND (n-dimensional) iterator. It will iterate over each element in the array, whatever the number of dimensions. It's (roughly!) equivalent to for item in your_array.ravel() (iterating over a flattened "view" of the array). For 1D arrays it iterates over the elements; for 2D arrays it iterates first over the elements in the first row, then over the second row, and so on.
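A minimal sketch of that comparison, assuming a made-up 3-D tensor and a small transposed matrix. The second half illustrates one reason for the "roughly": by default nditer follows the memory layout (order='K'), so on a transposed view it can visit elements in a different order than ravel():

```python
import numpy as np

tensor = np.arange(24).reshape(2, 3, 4)

# nditer yields 0-d arrays, one per element; int() unwraps each one.
from_nditer = [int(x) for x in np.nditer(tensor)]
from_ravel = [int(x) for x in tensor.ravel()]
print(from_nditer == from_ravel)

# On a transposed view the two orders differ: nditer walks memory,
# ravel() walks the view's own index order.
matrix = np.arange(6).reshape(2, 3)
t_nditer = [int(x) for x in np.nditer(matrix.T)]
t_ravel = [int(x) for x in matrix.T.ravel()]
print(t_nditer)
print(t_ravel)
```

Passing order='C' to np.nditer is one way to ask for the index order instead of the memory order.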

Note that nditer is much more powerful than that: it can iterate over multiple arrays at once, it can buffer the iteration, and a lot more.
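For instance, iterating over two arrays at once might look like the sketch below. The arrays a and b are made up for illustration; nditer broadcasts them against each other, so the shorter one is repeated for each row of the larger one:

```python
import numpy as np

a = np.arange(3)                  # shape (3,)
b = np.arange(6).reshape(2, 3)    # shape (2, 3)

# Passing a list of arrays makes nditer broadcast them together
# and yield one tuple of 0-d arrays per broadcast element.
pairs = [(int(x), int(y)) for x, y in np.nditer([a, b])]
print(pairs)
```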


However, with NumPy you generally don't want to use a for loop or np.nditer. There are lots of "vectorized" operations that make manual iteration unnecessary in most cases.
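As a rough illustration, assuming the goal is simply to add up every element of a made-up 3-D array, the manual nditer loop and the vectorized call give the same answer:

```python
import numpy as np

tensor = np.arange(24).reshape(2, 3, 4)

# Manual iteration: accumulate one element at a time in Python.
total_loop = 0
for x in np.nditer(tensor):
    total_loop += int(x)

# Vectorized: one call, with the loop running in compiled code.
total_vec = int(tensor.sum())

print(total_loop, total_vec)
```

The same pattern applies to element-wise arithmetic: an expression such as tensor * 2 replaces a loop that doubles each element.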

MSeifert answered Nov 12 '22 03:11