Is there a way to check if NumPy arrays share the same data?

Tags:

numpy

My impression is that in NumPy, two arrays can share the same memory. Take the following example:

    import numpy as np
    a = np.arange(27)
    b = a.reshape((3,3,3))
    a[0] = 5000
    print(b[0,0,0])  #5000

    #Some tests:
    a.data is b.data  #False
    a.data == b.data  #True

    c = np.arange(27)
    c[0] = 5000
    a.data == c.data  #True ( Same data, not same memory storage ), False positive

So clearly b didn't make a copy of a; it just created some new metadata and attached it to the same memory buffer that a is using. Is there a way to check if two arrays reference the same memory buffer?

My first thought was to use a.data is b.data, but that returns False. I can do a.data == b.data, which returns True, but I don't think that checks whether a and b share the same memory buffer, only that the block of memory referenced by a and the one referenced by b contain the same bytes.
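One quick probe, sketched here under the assumption that comparing start addresses is good enough, is to read each array's buffer address from the __array_interface__ protocol and to look at the OWNDATA flag. This catches a view like the reshape above, but it misses views that start at a later address: a slice such as a[2:] shares a's memory yet reports a different pointer. The helper name data_pointer is hypothetical.

```python
import numpy as np

def data_pointer(arr):
    # Hypothetical helper. __array_interface__['data'] is an
    # (address, read_only) tuple; the address points at the first element.
    return arr.__array_interface__['data'][0]

a = np.arange(27)
b = a.reshape((3, 3, 3))   # a view: no copy is made
c = np.arange(27)          # separate buffer with equal contents
s = a[2:]                  # view that starts two elements into a

print(data_pointer(a) == data_pointer(b))      # True: same start address
print(data_pointer(a) == data_pointer(c))      # False: separate allocation
print(data_pointer(a) == data_pointer(s))      # False, although s shares a's memory
print(a.flags['OWNDATA'], b.flags['OWNDATA'])  # True False
```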


asked Jul 02 '12 01:07

mgilson

2 Answers

You can use the base attribute to check if an array shares the memory with another array:

    >>> import numpy as np
    >>> a = np.arange(27)
    >>> b = a.reshape((3,3,3))
    >>> b.base is a
    True
    >>> a.base is b
    False

Not sure if that solves your problem. The base attribute will be None if the array owns its own memory. Note that a view's base is the original array even when the view covers only a subset of it:

    >>> c = a[2:]
    >>> c.base is a
    True
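Comparing base directly has a gap: two views of the same array, such as b and c above, do not have each other as base. A minimal sketch, assuming it is enough to know that two arrays ultimately view the same owning object, is to follow the base chain to its end and compare the results. root_base and same_root are hypothetical helper names, not NumPy functions; recent NumPy versions may already collapse the chain so that base points straight at the owner, and the loop handles either case.

```python
import numpy as np

def root_base(arr):
    """Hypothetical helper: follow .base until reaching an object that owns
    its memory. Returns arr itself if it owns its data; otherwise the last
    object in the chain (usually an ndarray, but it may be e.g. a bytes
    object for arrays created with np.frombuffer)."""
    obj = arr
    while isinstance(obj, np.ndarray) and obj.base is not None:
        obj = obj.base
    return obj

def same_root(x, y):
    # True when both arrays ultimately view the same owning object.
    return root_base(x) is root_base(y)

a = np.arange(27)
b = a.reshape((3, 3, 3))
c = a[2:]
d = c[::2]                                 # a view of a view

print(same_root(b, c))                     # True, although b.base is not c
print(same_root(d, a))                     # True
print(same_root(a, np.arange(27)))         # False: separate buffer
print(same_root(a[::2], a[1::2]))          # True, yet no byte is shared
```

The last case shows the limit of this approach: sharing an owner does not mean the two views overlap, which is what the byte-level approach in the other answer measures.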

answered Oct 07 '22 11:10

jterrace

I think jterrace's answer is probably the best way to go, but here is another possibility.

    import numpy as np

    try:
        from numpy.lib.array_utils import byte_bounds  # NumPy >= 2.0
    except ImportError:
        from numpy import byte_bounds                  # older NumPy

    def byte_offset(a):
        """Returns a 1-d array of the byte offset, relative to the first
        element, of every byte of every element in `a`.
        Note that these will not in general be in order."""
        stride_offset = np.ix_(*map(range, a.shape))
        element_offset = sum(i*s for i, s in zip(stride_offset, a.strides))
        element_offset = np.asarray(element_offset).ravel()
        return np.concatenate([element_offset + x for x in range(a.itemsize)])

    def share_memory(a, b):
        """Returns the number of shared bytes between arrays `a` and `b`."""
        a_low, a_high = byte_bounds(a)
        b_low, b_high = byte_bounds(b)

        beg, end = max(a_low, b_low), min(a_high, b_high)

        if end - beg > 0:
            # memory overlaps; offsets are relative to each array's first
            # element, which differs from a_low/b_low when strides are negative
            amem = a.__array_interface__['data'][0] + byte_offset(a)
            bmem = b.__array_interface__['data'][0] + byte_offset(b)

            return np.intersect1d(amem, bmem).size
        else:
            return 0

Example:

    >>> a = np.arange(10)
    >>> b = a.reshape((5,2))
    >>> c = a[::2]
    >>> d = a[1::2]
    >>> e = a[0:1]
    >>> f = a[0:1]
    >>> f = f.reshape(())
    >>> share_memory(a,b)
    80
    >>> share_memory(a,c)
    40
    >>> share_memory(a,d)
    40
    >>> share_memory(c,d)
    0
    >>> share_memory(a,e)
    8
    >>> share_memory(a,f)
    8

Here is a plot showing the time for each share_memory(a,a[::2]) call as a function of the number of elements in a on my computer.

[Plot: time per share_memory(a, a[::2]) call vs. number of elements in a]
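Newer NumPy releases also ship two built-in checks that cover similar ground (np.shares_memory was added in NumPy 1.11). np.may_share_memory by default only compares the memory bounds, so it can report false positives; np.shares_memory solves the exact overlap problem, which can be slow for some stride patterns. Both return a boolean rather than a byte count. A short sketch using arrays like the ones in the example above, plus a reversed view with negative strides:

```python
import numpy as np

a = np.arange(10)
b = a.reshape((5, 2))
c = a[::2]
d = a[1::2]
r = a[::-1]                           # negative strides, same buffer

# may_share_memory only checks whether the byte ranges overlap
print(np.may_share_memory(c, d))      # True (bounds overlap)
# shares_memory checks whether any byte is actually shared
print(np.shares_memory(c, d))         # False
print(np.shares_memory(a, b))         # True
print(np.shares_memory(a, r))         # True
print(np.shares_memory(a, a.copy()))  # False
```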

answered Oct 07 '22 10:10

user545424
