
How to stack arrays and scalars in numpy?

Tags: python, numpy

I have a list of numpy vectors (1-D arrays) or scalars (i.e. just numbers). All the vectors have the same length but I don't know what that is. I need to vstack all the elements to create one matrix (2-D array) in such a way that the scalars are treated as vectors having the scalar at each position.

An example describes it best:

Case 1:

>>> np.vstack([np.array([1, 2, 3]), np.array([3, 2, 1])])
array([[1, 2, 3],
       [3, 2, 1]])

Case 2:

>>> np.vstack([1, 2])
array([[1],
       [2]])

Case 3 (desired result):

>>> np.vstack([np.array([1, 2, 3]), 0, np.array([3, 2, 1])])
np.array([[1, 2, 3],
          [0, 0, 0],
          [3, 2, 1]])

Cases 1 and 2 work out of the box. Case 3, however, does not, because vstack needs all the elements to be arrays of the same length.

Is there a nice way (preferably a one-liner) of achieving this?

zegkljan asked Mar 27 '16 21:03



2 Answers

You could create a broadcast object and call np.column_stack on it:

In [175]: np.column_stack(np.broadcast([1, 2, 3], 0, [3, 2, 1]))
Out[175]: 
array([[1, 2, 3],
       [0, 0, 0],
       [3, 2, 1]])

Alternatively, you could ask NumPy to literally broadcast the items to compatibly-shaped arrays:

In [158]: np.broadcast_arrays([1, 2, 3], [3, 2, 1], 0)
Out[158]: [array([1, 2, 3]), array([3, 2, 1]), array([0, 0, 0])]

and then call vstack or row_stack on that:

In [176]: np.row_stack(np.broadcast_arrays([1, 2, 3], 0, [3, 2, 1]))
Out[176]: 
array([[1, 2, 3],
       [0, 0, 0],
       [3, 2, 1]])

Of these two options (using np.broadcast or np.broadcast_arrays), np.broadcast is quicker since you don't actually need to instantiate the broadcasted sub-arrays.

One limitation of np.broadcast, however, is that it can accept at most 32 arguments. If you have more items than that, use np.broadcast_arrays.
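
As a rough sketch, the np.broadcast_arrays route could be wrapped in a small helper for a list whose vector length isn't known in advance. The name stack_rows is made up for illustration and is not a NumPy function; it uses np.vstack, of which np.row_stack is an alias.

```python
import numpy as np

def stack_rows(items):
    """Stack 1-D arrays and scalars into a 2-D array (hypothetical helper)."""
    # broadcast_arrays expands every scalar to the common vector length,
    # so the caller never has to know that length.
    return np.vstack(np.broadcast_arrays(*items))

items = [np.array([1, 2, 3]), 0, np.array([3, 2, 1])]
result = stack_rows(items)
print(result)
```

With only scalars, e.g. stack_rows([1, 2]), this should fall back to the Case 2 behaviour from the question, since vstack promotes each 0-d array to a row of its own.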

unutbu answered Sep 21 '22 04:09



The problem here is to bridge the gap between the readable Python world and the efficient NumPy world.

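Experimentally, and somewhat paradoxically, plain Python is often faster than NumPy for this task. With l = [randint(10) if n % 2 else randint(0, 10, 100) for n in range(32)] (where randint is numpy.random.randint, and array, ndarray, column_stack and broadcast are imported from numpy):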

In [11]: %timeit array([x if type(x) is ndarray else [x]*100 for x in l])
1000 loops, best of 3: 655 µs per loop

In [12]: %timeit column_stack(broadcast(*l))
100 loops, best of 3: 3.77 ms per loop

Furthermore, broadcast is limited to 32 arguments.
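
The list comprehension timed above hard-codes the vector length (100). Here is a sketch of the same pure-Python idea that looks the length up from the first 1-D item instead. The name stack_rows_py is hypothetical, and the fallback length of 1 for all-scalar input is an assumption chosen to match Case 2 of the question.

```python
import numpy as np

def stack_rows_py(items):
    """List-comprehension variant (hypothetical helper, not a NumPy function)."""
    # Length of the first 1-D item; fall back to 1 when every item is a scalar.
    n = next((len(x) for x in items if np.ndim(x) == 1), 1)
    # Keep vectors as they are and repeat each scalar n times.
    return np.array([x if np.ndim(x) == 1 else [x] * n for x in items])

items = [np.array([1, 2, 3]), 0, np.array([3, 2, 1])]
result_py = stack_rows_py(items)
print(result_py)
```

Because it never calls np.broadcast, this variant is not subject to the argument limit mentioned above. Whether it is actually faster will depend on the input sizes and NumPy version, so it is worth timing on your own data.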

B. M. answered Sep 20 '22 04:09
