NumPy Array Copy-On-Write

Question

I have a class that returns large NumPy arrays. These arrays are cached within the class. I would like the returned arrays to be copy-on-write arrays. If the caller ends up just reading from the array, no copy is ever made. This will case no extra memory will be used. However, the array is "modifiable", but does not modify the internal cached arrays.

My solution at the moment is to make any cached arrays readonly (a.flags.writeable = False). This means that if the caller of the function may have to make their own copy of the array if they want to modify it. Of course, if the source was not from cache and the array was already writable, then they would duplicate the data unnecessarily.

So, optimally I would love something like a.view(flag=copy_on_write). There seems to be a flag for the reverse of this UPDATEIFCOPY which causes a copy to update the original once deallocated.

Thanks!

maxy · Accepted Answer

Copy-on-write is a nice concept, but explicit copying seems to be "the NumPy philosophy". So personally I would keep the "readonly" solution if it isn't too clumsy.

But I admit having written my own copy-on-write wrapper class. I don't try to detect write access to the array. Instead the class has a method "get_array(readonly)" returning its (otherwise private) numpy array. The first time you call it with "readonly=False" it makes a copy. This is very explicit, easy to read and quickly understood.

If your copy-on-write numpy array looks like a classical numpy array, the reader of your code (possibly you in 2 years) may have a hard time.

HYRY · Answer

To implement copy on write, we need to modify base, data, strides of ndarray object. I think this can't be done in pure Python code. I use some Cython code to modify these attributes.

Here is the code in IPython notebook:

%load_ext cythonmagic

use Cython define copy_view():

%%cython
cimport numpy as np

np.import_array()
np.import_ufunc()

def copy_view(np.ndarray a):
    cdef np.ndarray b
    cdef object base
    cdef int i
    base = np.get_array_base(a)
    if base is None or isinstance(base, a.__class__):
        return a
    else:
        print "copy"
        b = a.copy()
        np.set_array_base(a, b)
        a.data = b.data
        for i in range(b.ndim):
            a.strides[i] = b.strides[i]

define a subclass of ndarray:

class cowarray(np.ndarray):
    def __setitem__(self, key, value):
        copy_view(self)
        np.ndarray.__setitem__(self, key, value)

    def __array_prepare__(self, array, context=None):
        if self is array:
            copy_view(self)
        return array

    def __array__(self):
        copy_view(self)
        return self

some test:

a = np.array([1.0, 2, 3, 4])
b = a.view(cowarray)
b[1] = 100 #copy 
print a, b
b[2] = 200 #no copy
print a, b

c = a[::2].view(cowarray)
c[0] = 1000 #copy
print a, c

d = a.view(cowarray)
np.sin(d, d) #copy
print a, d

the output:

copy
[ 1.  2.  3.  4.] [   1.  100.    3.    4.]
[ 1.  2.  3.  4.] [   1.  100.  200.    4.]
copy
[ 1.  2.  3.  4.] [ 1000.     3.]
copy
[ 1.  2.  3.  4.] [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]

NumPy Array Copy-On-Write

Tags:

python

numpy

copy-on-write

coderforlife

2 Answers

maxy

HYRY

Recent Activity

Donate For Us

NumPy Array Copy-On-Write

Tags:

python

numpy

copy-on-write

coderforlife

2 Answers

maxy

HYRY

Related questions

Recent Activity

Donate For Us