Numpy: use reshape or newaxis to add dimensions

Tags:

python

numpy

Either ndarray.reshape or numpy.newaxis can be used to add a new dimension to an array. They both seem to create a view. Is there any reason or advantage to use one instead of the other?

>>> b
array([ 1.,  1.,  1.,  1.])
>>> c = b.reshape((1,4))
>>> c *= 2
>>> c
array([[ 2.,  2.,  2.,  2.]])
>>> c.shape
(1, 4)
>>> b
array([ 2.,  2.,  2.,  2.])
>>> d = b[np.newaxis,...]
>>> d
array([[ 2.,  2.,  2.,  2.]])
>>> d.shape
(1, 4)
>>> d *= 2
>>> b
array([ 4.,  4.,  4.,  4.])
>>> c
array([[ 4.,  4.,  4.,  4.]])
>>> d
array([[ 4.,  4.,  4.,  4.]])
>>>

asked Feb 07 '15 18:02

wwii

2 Answers

One reason to use numpy.newaxis over ndarray.reshape is when you have more than one "unknown" dimension to operate with. So, for example, for the following array:

>>> arr.shape
(10, 5)

This works:

>>> arr[:, np.newaxis, :].shape
(10, 1, 5)

But this does not:

>>> arr.reshape(-1, 1, -1)
...
ValueError: can only specify one unknown dimension
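As a rough sketch of the point above (the array contents here are made up; only the shape matters): reshape can still produce the same result, but you have to spell out the sizes yourself, whereas newaxis only marks the position of the new axis. np.expand_dims is a third option that takes the position as an argument.

```python
import numpy as np

arr = np.zeros((10, 5))  # placeholder data; only the shape matters here

# newaxis marks where the length-1 axis goes; no sizes need to be known.
a = arr[:, np.newaxis, :]

# reshape accepts at most one -1, so the remaining sizes are written out.
b = arr.reshape(arr.shape[0], 1, arr.shape[1])

# expand_dims names the position of the new axis explicitly.
c = np.expand_dims(arr, axis=1)

print(a.shape, b.shape, c.shape)

# Two unknown dimensions are rejected by reshape.
try:
    arr.reshape(-1, 1, -1)
    raised = False
except ValueError:
    raised = True
print(raised)
```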

answered Sep 18 '22 23:09

Rafael Martins

I don't see evidence of much difference. You could do a time test on very large arrays. Basically both fiddle with the shape, and possibly the strides. __array_interface__ is a nice way of accessing this information. For example:

In [94]: b.__array_interface__
Out[94]: 
{'data': (162400368, False),
 'descr': [('', '<f8')],
 'shape': (5,),
 'strides': None,
 'typestr': '<f8',
 'version': 3}

In [95]: b[None,:].__array_interface__
Out[95]: 
{'data': (162400368, False),
 'descr': [('', '<f8')],
 'shape': (1, 5),
 'strides': (0, 8),
 'typestr': '<f8',
 'version': 3}

In [96]: b.reshape(1,5).__array_interface__
Out[96]: 
{'data': (162400368, False),
 'descr': [('', '<f8')],
 'shape': (1, 5),
 'strides': None,
 'typestr': '<f8',
 'version': 3}

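Both create a view that uses the same data buffer as the original, and both produce the same shape. The difference is in the reported strides: reshape leaves them as None, while indexing with None sets an explicit 0 stride for the new axis. reshape also lets you specify the order.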
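A quick way to check the shared buffer without reading addresses out of __array_interface__ is np.shares_memory. The sketch below assumes a small float64 array like the one above. The transposed array at the end is my own addition: it shows that reshape falls back to a copy when the requested shape cannot be expressed with strides, which indexing with newaxis never does.

```python
import numpy as np

b = np.ones(5)
via_newaxis = b[np.newaxis, :]
via_reshape = b.reshape(1, 5)

# Both results are views onto b's buffer.
print(np.shares_memory(b, via_newaxis))
print(np.shares_memory(b, via_reshape))

# reshape can also choose the element order; newaxis has no such option.
seq = np.arange(6)
c_order = seq.reshape(2, 3)             # rows filled first
f_order = seq.reshape(2, 3, order="F")  # columns filled first
print(c_order)
print(f_order)

# A transposed 2-D array cannot be flattened in C order without copying.
t = np.ones((3, 4)).T
flat = t.reshape(-1)
print(np.shares_memory(t, flat))

# Indexing with newaxis on the same array still returns a view.
print(np.shares_memory(t, t[np.newaxis]))
```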

And .flags shows differences in the C_CONTIGUOUS flag.

reshape may be faster because it is making fewer changes. But either way the operation shouldn't affect the time of larger calculations much.

For example, for a large b:

In [123]: timeit np.outer(b.reshape(1,-1),b)
1 loops, best of 3: 288 ms per loop
In [124]: timeit np.outer(b[None,:],b)
1 loops, best of 3: 287 ms per loop
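Outside IPython, the same kind of comparison can be run with the standard timeit module. This sketch times only the two view-creating expressions rather than np.outer. The array size and repeat count are arbitrary choices, and the numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

b = np.ones(1000)  # arbitrary size, chosen only for illustration

# Time just the creation of the (1, n) view, repeated many times.
t_reshape = timeit.timeit(lambda: b.reshape(1, -1), number=10000)
t_newaxis = timeit.timeit(lambda: b[None, :], number=10000)

print("reshape: %.6f s" % t_reshape)
print("newaxis: %.6f s" % t_newaxis)

# Whichever is faster, the two views hold the same values.
same = np.array_equal(b.reshape(1, -1), b[None, :])
print(same)
```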

An interesting observation: b.reshape(1,4).strides -> (32, 8)

Here's my guess. .__array_interface__ displays an underlying attribute, while .strides behaves more like a property (though it may all be buried in C code). The default underlying value is None, and when strides are needed for a calculation (or for display with .strides), they are computed from the shape and item size. 32 is the distance in bytes to the end of the first row (4 x 8). np.ones((2,4)).strides has the same (32, 8) (and None in __array_interface__).
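The arithmetic behind (32, 8) can be reproduced by hand. The helper below is a hypothetical illustration of how default C-order strides follow from the shape and item size; it is not NumPy's internal code.

```python
import numpy as np


def c_contiguous_strides(shape, itemsize):
    """Hypothetical helper: byte strides of a C-ordered array."""
    strides = []
    step = itemsize
    # Walk the axes from last to first; each stride is the byte size
    # of one full block of all the axes after it.
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))


by_hand_1x4 = c_contiguous_strides((1, 4), 8)
by_hand_2x4 = c_contiguous_strides((2, 4), 8)
from_numpy = np.ones((2, 4)).strides

print(by_hand_1x4)
print(by_hand_2x4)
print(from_numpy)
```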

b[None,:], on the other hand, is preparing the array for broadcasting. When an array is broadcast, existing values are used repeatedly. That's what the 0 in (0,8) does: stepping along the new axis moves 0 bytes, so every row reads the same data.

In [147]: b1=np.broadcast_arrays(b,np.zeros((2,1)))[0]

In [148]: b1.shape
Out[148]: (2, 5)

In [149]: b1.strides
Out[149]: (0, 8)

In [150]: b1.__array_interface__
Out[150]: 
{'data': (3023336880L, False),
 'descr': [('', '<f8')],
 'shape': (2, 5),
 'strides': (0, 8),
 'typestr': '<f8',
 'version': 3}

b1 displays the same as np.ones((2,5)) but stores only 5 items in memory.

np.broadcast_arrays is a function in /numpy/lib/stride_tricks.py. It uses as_strided from the same file. These functions directly play with the shape and strides attributes.
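In more recent NumPy versions, np.broadcast_to gives the same zero-stride view more directly. A small sketch, assuming a 5-element float64 array; the as_strided call is included only to show the shape and strides being set by hand.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

b = np.ones(5)

# Broadcast b to two rows; the row axis gets a stride of 0 bytes.
b1 = np.broadcast_to(b, (2, 5))
print(b1.shape)
print(b1.strides)

# The view looks like a full (2, 5) array but reuses b's 5 items.
print(np.array_equal(b1, np.ones((2, 5))))
print(np.shares_memory(b, b1))

# broadcast_to returns a read-only view, since rows alias each other.
print(b1.flags.writeable)

# The same layout built by setting shape and strides directly.
b2 = as_strided(b, shape=(2, 5), strides=(0, b.itemsize), writeable=False)
print(b2.strides)
```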

answered Sep 17 '22 23:09

hpaulj
