Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python copy-on-write behavior

I'm working on a problem where I'm instantiating many instances of an object. Most of the time the instantiated objects are identical. To reduce memory overhead, I'd like to have all the identical objects point to the same address. When I modify the object, though, I'd like a new instance to be created--essentially copy-on-write behavior. What is the best way to achieve this in Python?

The Flyweight Pattern comes close. An example (from http://codesnipers.com/?q=python-flyweights):

import weakref

class Card(object):
    _CardPool = weakref.WeakValueDictionary()
    def __new__(cls, value, suit):
        obj = Card._CardPool.get(value + suit, None)
        if not obj:
            obj = object.__new__(cls)
            Card._CardPool[value + suit] = obj
            obj.value, obj.suit = value, suit
        return obj

This behaves as follows:

>>> c1 = Card('10', 'd')
>>> c2 = Card('10', 'd')
>>> id(c1) == id(c2)
True
>>> c2.suit = 's'
>>> c1.suit
's'
>>> id(c1) == id(c2)
True

The desired behavior would be:

>>> c1 = Card('10', 'd')
>>> c2 = Card('10', 'd')
>>> id(c1) == id(c2)
True
>>> c2.suit = 's'
>>> c1.suit
'd'
>>> id(c1) == id(c2)
False

Update: I came across the Flyweight Pattern and it seemed to almost fit the bill. However, I'm open to other approaches.

like image 294
DMack Avatar asked Sep 10 '12 20:09

DMack


People also ask

Does Python use copy on write?

Python has support for shallow copying and deep copying functionality via its copy module. However it does not provide for copy-on-write semantics.

Why do we need copy () in Python?

For copying mutable objects like lists or dictionaries, we use copy() method. When invoked on any object, the copy() method creates a new object with the same data as the original object and returns a reference to it.

Does Python create a copy or reference?

In Python, Assignment statements do not copy objects, they create bindings between a target and an object. When we use the = operator, It only creates a new variable that shares the reference of the original object.


1 Answers

Do you need id(c1)==id(c2) to be identical, or is that just a demonstration, where the real objective is avoiding creating duplicated objects?

One approach would be to have each object be distinct, but hold an internal reference to the 'real' object like you have above. Then, on any __setattr__ call, change the internal reference.

I've never done __setattr__ stuff before, but I think it would look like this:

class MyObj:
    def __init__(self, value, suit):
        self._internal = Card(value, suit)

    def __setattr__(self, name, new_value):
        if name == 'suit':
            self._internal = Card(value, new_value)
        else:
            self._internal = Card(new_value, suit)

And similarly, expose the attributes through getattr.

You'd still have lots of duplicated objects, but only one copy of the 'real' backing object behind them. So this would help if each object is massive, and wouldn't help if they are lightweight, but you have millions of them.

like image 116
Brenden Brown Avatar answered Oct 24 '22 19:10

Brenden Brown