
Numpy: use reshape or newaxis to add dimensions

Tags: python, numpy

Either ndarray.reshape or numpy.newaxis can be used to add a new dimension to an array. Both seem to create a view. Is there any reason or advantage to using one instead of the other?

>>> import numpy as np
>>> b = np.ones(4)
>>> b
array([ 1.,  1.,  1.,  1.])
>>> c = b.reshape((1,4))
>>> c *= 2
>>> c
array([[ 2.,  2.,  2.,  2.]])
>>> c.shape
(1, 4)
>>> b
array([ 2.,  2.,  2.,  2.])
>>> d = b[np.newaxis,...]
>>> d
array([[ 2.,  2.,  2.,  2.]])
>>> d.shape
(1, 4)
>>> d *= 2
>>> b
array([ 4.,  4.,  4.,  4.])
>>> c
array([[ 4.,  4.,  4.,  4.]])
>>> d
array([[ 4.,  4.,  4.,  4.]])

wwii asked Feb 07 '15 18:02



2 Answers

One reason to use numpy.newaxis over ndarray.reshape is when more than one dimension is "unknown", because reshape accepts only a single -1 placeholder. For example, take the following array:

>>> arr.shape
(10, 5)

This works:

>>> arr[:, np.newaxis, :].shape
(10, 1, 5)

But this does not:

>>> arr.reshape(-1, 1, -1)
...
ValueError: can only specify one unknown dimension
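
A minimal sketch of the options, assuming a stand-in array built with np.zeros (the answer does not show the contents of arr). reshape still works when all but one size is spelled out, and np.expand_dims is another way to insert a length-1 axis without naming any sizes:

```python
import numpy as np

# Stand-in for the (10, 5) array from the answer (contents assumed).
arr = np.zeros((10, 5))

# newaxis needs no knowledge of the sizes on either side.
via_newaxis = arr[:, np.newaxis, :]

# reshape allows only one -1, so the other sizes must be spelled out.
via_reshape = arr.reshape(arr.shape[0], 1, -1)

# expand_dims inserts a length-1 axis at the given position.
via_expand = np.expand_dims(arr, axis=1)

print(via_newaxis.shape, via_reshape.shape, via_expand.shape)
# (10, 1, 5) (10, 1, 5) (10, 1, 5)
```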
Rafael Martins answered Sep 18 '22 23:09


I don't see evidence of much difference. You could do a time test on very large arrays. Basically both fiddle with the shape, and possibly the strides. __array_interface__ is a nice way of accessing this information. For example:

In [94]: b.__array_interface__
Out[94]: 
{'data': (162400368, False),
 'descr': [('', '<f8')],
 'shape': (5,),
 'strides': None,
 'typestr': '<f8',
 'version': 3}

In [95]: b[None,:].__array_interface__
Out[95]: 
{'data': (162400368, False),
 'descr': [('', '<f8')],
 'shape': (1, 5),
 'strides': (0, 8),
 'typestr': '<f8',
 'version': 3}

In [96]: b.reshape(1,5).__array_interface__
Out[96]: 
{'data': (162400368, False),
 'descr': [('', '<f8')],
 'shape': (1, 5),
 'strides': None,
 'typestr': '<f8',
 'version': 3}

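Both create a view that uses the same data buffer as the original (the 'data' address is identical in all three outputs). The shapes match, but reshape leaves the reported strides as None, while b[None,:] sets them explicitly. reshape also lets you specify the order.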
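
As a rough check of the view claim, the sketch below uses np.shares_memory, which is not part of the original answer. It also shows the order argument, and one case where reshape has to fall back to a copy because no view can express the result:

```python
import numpy as np

b = np.ones(5)

r = b.reshape(1, 5)      # row via reshape
n = b[np.newaxis, :]     # same shape via newaxis

# Both results share the original buffer, so they are views.
print(np.shares_memory(b, r), np.shares_memory(b, n))  # True True

# reshape accepts an order argument; newaxis has no equivalent.
m = np.arange(6).reshape(2, 3)
print(m.reshape(3, 2, order="C").tolist())  # [[0, 1], [2, 3], [4, 5]]
print(m.reshape(3, 2, order="F").tolist())  # [[0, 4], [3, 2], [1, 5]]

# Flattening a transposed array cannot be done with strides alone,
# so reshape returns a copy here.
t = m.T.reshape(-1)
print(np.shares_memory(m, t))  # False
```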

And .flags shows differences in the C_CONTIGUOUS flag.

reshape may be faster because it is making fewer changes. But either way the operation shouldn't affect the time of larger calculations much.

For example, for a large b:

In [123]: timeit np.outer(b.reshape(1,-1),b)
1 loops, best of 3: 288 ms per loop
In [124]: timeit np.outer(b[None,:],b)
1 loops, best of 3: 287 ms per loop
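
To time only the shape change rather than a larger calculation, something like the following could be used. This is a sketch with the standard timeit module; the array size and repeat count are arbitrary choices, and the absolute numbers will vary by machine and NumPy version:

```python
import timeit
import numpy as np

b = np.ones(1_000_000)
calls = 10_000

# Time only the shape change, not a calculation that uses the result.
t_reshape = timeit.timeit(lambda: b.reshape(1, -1), number=calls)
t_newaxis = timeit.timeit(lambda: b[np.newaxis, :], number=calls)

print(f"reshape: {t_reshape / calls * 1e6:.3f} us per call")
print(f"newaxis: {t_newaxis / calls * 1e6:.3f} us per call")

# Both produce the same values, so later computations see identical input.
same = np.array_equal(b.reshape(1, -1), b[np.newaxis, :])
print(same)  # True
```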

An interesting observation: b.reshape(1,4).strides returns (32, 8).

Here's my guess. .__array_interface__ displays an underlying attribute, while .strides is more like a property (though it may all be buried in C code). The default underlying value is None, and when strides are needed for a calculation (or for display with .strides), they are computed from the shape and item size. 32 is the distance in bytes to the end of the first row (4 x 8). np.ones((2,4)).strides has the same (32, 8) (and None in __array_interface__).
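
The arithmetic behind that guess can be sketched in a few lines. c_contiguous_strides is a hypothetical helper written for illustration; it reproduces the calculation for C-ordered arrays and is not NumPy's actual implementation:

```python
import numpy as np

def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-ordered array (hypothetical helper)."""
    strides = []
    step = itemsize
    # Walk the axes from last to first; each stride is the number of
    # bytes skipped by one step along that axis.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

a = np.ones((2, 4))  # float64, so itemsize is 8
print(c_contiguous_strides(a.shape, a.itemsize))  # (32, 8)
print(a.strides)                                  # (32, 8)
```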

b[None,:], on the other hand, prepares the array for broadcasting. When an array is broadcast, existing values are used repeatedly. That's what the 0 in (0, 8) does: stepping along that axis moves zero bytes, so the same row is read again.

In [147]: b1=np.broadcast_arrays(b,np.zeros((2,1)))[0]

In [148]: b1.shape
Out[148]: (2, 5)

In [149]: b1.strides
Out[149]: (0, 8)

In [150]: b1.__array_interface__
Out[150]: 
{'data': (3023336880L, False),
 'descr': [('', '<f8')],
 'shape': (2, 5),
 'strides': (0, 8),
 'typestr': '<f8',
 'version': 3}

b1 displays the same as np.ones((2,5)) but has only 5 items.

np.broadcast_arrays is a function in /numpy/lib/stride_tricks.py. It uses as_strided from the same file. These functions directly play with the shape and strides attributes.
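
A small sketch of the zero-stride idea, assuming a 5-item float64 array. It calls as_strided directly with writeable=False, since writing through overlapping strides is unsafe, and compares the result with np.broadcast_to, which is a safer way to get the same kind of read-only view:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

b = np.arange(5, dtype=np.float64)

# A stride of 0 along axis 0 makes every row reuse the same 5 items.
rows = as_strided(b, shape=(2, 5), strides=(0, b.itemsize), writeable=False)
print(rows.tolist())
# [[0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0, 4.0]]

# broadcast_to builds the same kind of read-only view without manual strides.
bt = np.broadcast_to(b, (2, 5))
print(bt.strides)               # (0, 8)
print(np.shares_memory(b, bt))  # True
```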

hpaulj answered Sep 17 '22 23:09