Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy: vectorized access of several columns at once?

I have scripts with multi-dimensional arrays and instead of for-loops I would like to use a vectorized implementation for my problems (which sometimes contain column operations).

Let's consider a simple example with matrix arr:

> arr = np.arange(12).reshape(3, 4)

> arr
> ([[ 0,  1,  2,  3],
    [ 4,  5,  6,  7],
    [ 8,  9, 10, 11]])

> arr.shape
> (3, 4)

So we have a matrix arr with 3 rows and 4 columns.

The simplest case in my scripts is adding something to the values in the array. E.g. I'm doing this for single or multiple rows:

> someVector = np.array([1, 2, 3, 4])
> arr[0] += someVector

> arr
> array([[ 1,  3,  5,  7],    <--- successfully added someVector
         [ 4,  5,  6,  7],         to one row
         [ 8,  9, 10, 11]])

> arr[0:2] += someVector

> arr
> array([[ 2,  5,  8, 11],    <--- added someVector to two
         [ 5,  7,  9, 11],    <--- rows at once
         [ 8,  9, 10, 11]])

This works well. However, sometimes I need to manipulate one or several columns. One column at a time works:

> arr[:, 0] += [1, 2, 3]

> array([[ 3,  5,  8, 11],
         [ 7,  7,  9, 11],
         [11,  9, 10, 11]])
           ^
           |___ added the values [1, 2, 3] successfully to
                this column

But I am struggling to think out why this does not work for multiple columns at once:

> arr[:, 0:2] += [1, 2, 3]

> ValueError
> Traceback (most recent call last)
> <ipython-input-16-5feef53e53af> in <module>()
> ----> 1 arr[:, 0:2] += [1, 2, 3]

> ValueError: operands could not be broadcast
>             together with shapes (3,2) (3,) (3,2)

Isn't this the very same way it works with rows? What am I doing wrong here?

like image 820
daniel451 Avatar asked Dec 14 '22 08:12

daniel451


2 Answers

To add a 1D array to multiple columns you need to broadcast the values to a 2D array. Since broadcasting adds new axes on the left (of the shape) by default, broadcasting a row vector to multiple rows happens automatically:

arr[0:2] += someVector

someVector has shape (N,) and gets automatically broadcasted to shape (1, N). If arr[0:2] has shape (2, N), then the sum is performed element-wise as though both arr[0:2] and someVector were arrays of the same shape, (2, N).

But to broadcast a column vector to multiple columns requires hinting NumPy that you want broadcasting to occur with the axis on the right. In fact, you have to add the new axis on the right explicitly by using someVector[:, np.newaxis] or equivalently someVector[:, None]:

In [41]: arr = np.arange(12).reshape(3, 4)

In [42]: arr[:, 0:2] += np.array([1, 2, 3])[:, None]

In [43]: arr
Out[43]: 
array([[ 1,  2,  2,  3],
       [ 6,  7,  6,  7],
       [11, 12, 10, 11]])

someVector (e.g. np.array([1, 2, 3])) has shape (N,) and someVector[:, None] has shape (N, 1) so now broadcasting happens on the right. If arr[:, 0:2] has shape (N, 2), then the sum is performed element-wise as though both arr[:, 0:2] and someVector[:, None] were arrays of the same shape, (N, 2).

like image 163
unutbu Avatar answered Jan 02 '23 14:01

unutbu


Very clear explanation of @unutbu.

As a complement, transposition (.T) can often simplify the task, by working in the first dimension :

In [273]: arr = np.arange(12).reshape(3, 4)

In [274]: arr.T[0:2] += [1, 2, 3]

In [275]: arr
Out[275]: 
array([[ 1,  2,  2,  3],
       [ 6,  7,  6,  7],
       [11, 12, 10, 11]])
like image 34
B. M. Avatar answered Jan 02 '23 15:01

B. M.