Why does numpy.broadcast "transpose" results of vstack and similar functions?

Observe:

In [1]: import numpy as np
In [2]: x = np.array([1, 2, 3])
In [3]: np.vstack([x, x])
Out[3]: 
array([[1, 2, 3],
       [1, 2, 3]])

In [4]: np.vstack(np.broadcast(x, x))
Out[4]: 
array([[1, 1],
       [2, 2],
       [3, 3]])

The same happens with column_stack and row_stack (hstack behaves differently in this case, but its result also changes when used with broadcast), as shown below. Why?
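
For reference, with the same x, hstack gives:

In [5]: np.hstack([x, x])
Out[5]: array([1, 2, 3, 1, 2, 3])

In [6]: np.hstack(np.broadcast(x, x))
Out[6]: array([1, 1, 2, 2, 3, 3])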

I'm after the logic behind this behavior rather than a way of "repairing" it (I'm fine with it as it is; it's just unintuitive).

Asked Mar 27 '16 by zegkljan




1 Answer

np.broadcast returns an instance of an iterator object that describes how the arrays should be broadcast together.1 Among other things, it describes the shape and the number of dimensions that the resulting array will have.

Crucially, when you actually iterate over this object in Python you get back tuples of elements from each input array:

>>> b = np.broadcast(x, x)
>>> b.shape
(3,)
>>> b.ndim
1
>>> list(b)
[(1, 1), (2, 2), (3, 3)]

This tells us that if we were performing an actual operation on the arrays (say, x + x), NumPy would return a one-dimensional array of shape (3,), combining the elements of each tuple to produce the values in the final array (e.g. it would compute 1+1, 2+2, 3+3 for the addition).
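
You can see this with the addition itself:

>>> x + x
array([2, 4, 6])
>>> (x + x).shape
(3,)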

If you dig into the source of vstack you find that all it does is make sure the elements of the iterable it has been given are at least two-dimensional, and then stack them along axis 0.
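
In essence, vstack boils down to something like the following sketch (a simplification; the real implementation handles a few extra cases, but the core idea is the same):

def vstack_sketch(tup):
    # Promote every element of the sequence to at least 2-D...
    arrs = [np.atleast_2d(a) for a in tup]
    # ...then join the pieces along the first axis (axis 0).
    return np.concatenate(arrs, axis=0)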

In the case of b = np.broadcast(x, x) this means that we get the following arrays to stack:

>>> b = np.broadcast(x, x)  # recreate b: iterating it above exhausted it
>>> [np.atleast_2d(_m) for _m in b]
[array([[1, 1]]), array([[2, 2]]), array([[3, 3]])]

These three small arrays are then stacked vertically, producing the output you note.
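
Putting the pieces together reproduces the result from the question:

>>> np.concatenate([np.atleast_2d(_m) for _m in np.broadcast(x, x)], axis=0)
array([[1, 1],
       [2, 2],
       [3, 3]])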


1 Exactly how arrays of varying dimensions are iterated over in parallel is at the very heart of how NumPy's broadcasting works. The code can be found mostly in iterators.c. An interesting overview of NumPy's multidimensional iterator, written by Travis Oliphant himself, can be found in the Beautiful Code book.

Answered Sep 23 '22 by Alex Riley