
Why is numpy.ravel returning a copy?

Tags: python, numpy

In the following example:

>>> import numpy as np
>>> a = np.arange(10)
>>> b = a[:,np.newaxis]
>>> c = b.ravel()
>>> np.may_share_memory(a,c)
False

Why is numpy.ravel returning a copy of my array? Shouldn't it just be returning a?

Edit:

I just discovered that np.squeeze doesn't return a copy.

>>> b = a[:,np.newaxis]
>>> c = b.squeeze()
>>> np.may_share_memory(a,c)
True

Why is there a difference between squeeze and ravel in this case?

Edit:

As pointed out by mgilson, newaxis marks the array as discontiguous, which is why ravel is returning a copy.

So the new question is: why does newaxis mark the array as discontiguous?

The story gets even weirder though:

>>> a = np.arange(10)
>>> b = np.expand_dims(a,axis=1)
>>> b.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> c = b.ravel()
>>> np.may_share_memory(a,c)
True

According to the documentation for expand_dims, it should be equivalent to newaxis.

user545424 asked Jul 23 '12 18:07


2 Answers

This may not be the best answer to your question, but it looks like inserting a newaxis causes numpy to view the array as non-contiguous -- probably for broadcasting purposes:

>>> a=np.arange(10)
>>> b=a[:,None]
>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> b.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

However, a reshape will not cause that:

>>> c=a.reshape(10,1) 
>>> c.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

And those arrays do share the same memory:

>>> np.may_share_memory(c.ravel(),a)
True
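
If you want to repeat this comparison on your own install, here is a minimal sketch. The helper name `describe` and the `report` dict are made up for illustration, and it assumes a NumPy recent enough to have `np.shares_memory` (the stricter sibling of `np.may_share_memory`). The flags and the copy-versus-view result for the newaxis case seem to depend on the NumPy version, so the printed values may not match the sessions in this thread.

```python
import numpy as np


def describe(arr, root):
    """Summarise how `arr` is laid out and whether ravel() still aliases `root`."""
    flat = arr.ravel()
    return {
        "strides": arr.strides,
        "c_contiguous": bool(arr.flags["C_CONTIGUOUS"]),
        # True means ravel() handed back a view onto root's buffer, not a copy.
        "ravel_shares_memory": bool(np.shares_memory(root, flat)),
    }


a = np.arange(10)

# Three ways of getting a (10, 1) array from `a`.
views = {
    "newaxis": a[:, np.newaxis],
    "reshape": a.reshape(10, 1),
    "expand_dims": np.expand_dims(a, axis=1),
}

report = {name: describe(view, a) for name, view in views.items()}
for name, row in report.items():
    print(name, row)
```

The reshape row should report shared memory either way; the newaxis row is the one worth looking at on your version.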

EDIT

np.expand_dims is actually implemented using reshape, which is why it works (I suppose this is a slight error in the documentation). Here's the source, without the docstring:

def expand_dims(a,axis):
    a = asarray(a)
    shape = a.shape
    if axis < 0:
        axis = axis + len(shape) + 1
    return a.reshape(shape[:axis] + (1,) + shape[axis:])

mgilson answered Oct 28 '22 07:10


It looks like it may have to do with the strides:

>>> c = np.expand_dims(a, axis=1)
>>> c.strides
(8, 8)

>>> b = a[:, None]
>>> b.strides
(8, 0)
>>> b.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> b.strides = (8, 8)
>>> b.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

I'm not sure what difference the stride on dimension 1 could make here, but it looks like that's what's making numpy treat the array as not contiguous.
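
One way to convince yourself that the stride on a length-1 axis cannot change which elements are read is sketched below. It builds two (10, 1) views that differ only in that stride, using `numpy.lib.stride_tricks.as_strided`. The variable names are just for illustration, and this only shows that the data addressed is the same; it says nothing about how any particular NumPy version sets the contiguity flags.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
step = a.itemsize

# Two (10, 1) views over a's buffer that differ only in the axis-1 stride.
zero_stride = as_strided(a, shape=(10, 1), strides=(step, 0))
full_stride = as_strided(a, shape=(10, 1), strides=(step, step))

# The only valid index on axis 1 is 0, so that stride is always multiplied
# by 0 when computing an element's address. Both views read the same bytes.
same_values = np.array_equal(zero_stride, full_stride)

print(zero_stride.strides, full_stride.strides)
print(same_values)
print(np.shares_memory(a, zero_stride), np.shares_memory(a, full_stride))
```

So the two layouts are equivalent as far as the data goes, and any difference in the flags comes down to how the contiguity check treats that unused stride.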

Bi Rico answered Oct 28 '22 05:10