Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python dictionary that only stores changes

I have written a code for design optimization using the Inspyred library and its implementation of Genetic Algorithms. In essence, the optimization process creates a large number of variations on a single data structure, which is a nested dictionary in my case.

In order to reduce the amount of memory used in the process, I have been trying to create some kind of differential dictionary type, which only stores items which are different from the base dictionary. The reason for this is that in a typical case, 95% of the data in the data structure will not be modified in any of the variations, but any part of the data structure can contain variations. So for flexibility reasons, I would like to have a data type that behaves more or less like a dictionary, but only stores the changes.

This is the result of my attempt to create this:

#!/usr/bin/python

import unittest
import copy

global_base={}

class DifferentialDict(object):
    """
    dictionary with differential storage of changes
    all DifferentialDict objects have the same base dictionary
    """

    def __init__(self,base=None):
        global global_base

        self.changes={}

        if not base==None:
            self.set_base(base)

    def set_base(self,base):
        global global_base
        global_base=copy.deepcopy(base)

    def __copy__(self):
        return self

    def __deepcopy__(self):
        new=DifferentialDict()
        new.changes=copy.deepcopy(self.changes)
        return new

    def get(self):
        global global_base
        outdict=copy.deepcopy(global_base)
        for key in self.changes:
            outdict[key]=self.changes[key]
        return outdict

    def __setitem__(self,key,value):
        self.changes[key]=value

    def __getitem__(self,key):
        global global_base
        if key in self.changes:
            return self.changes[key]
        else:
            return global_base[key]

class TestDifferentialDict(unittest.TestCase):
    def test1(self):
        ldict={'a':{1:2,3:4},'b':{'c':[1,2,3],'d':'abc'}}
        ddict=DifferentialDict(base=ldict)

        self.assertEqual(ddict['a'],{1:2,3:4})
        ddict['a']=5
        self.assertEqual(ddict['a'],5)

    def test2(self):
        ldict={'a':{1:2,3:4},'b':{'c':[1,2,3],'d':'abc'}}
        ddict1=DifferentialDict(base=ldict)
        ddict2=DifferentialDict(base=ldict)

        ddict1['a'][3]=5
        ddict2['a'][3]=7
        self.assertEqual(ddict1['a'][3],5)
        self.assertEqual(ddict2['a'][3],7)

    def test3(self):
        ldict={'a':{1:2,3:4},'b':{'c':[1,2,3],'d':'abc'}}
        ddict1=DifferentialDict(base=ldict)
        ddict2=ddict1.__deepcopy__()

        ddict1['a'][3]=5
        ddict2['a'][3]=7
        self.assertEqual(ddict1['a'][3],5)
        self.assertEqual(ddict2['a'][3],7)



if __name__ == "__main__":
    unittest.main()

It works fine for a simple dictionary, but breaks down when new dictionaries are nested within the main dictionary. I understand that this happens because these second level dictionaries are real Python dictionaries instead of instantiations of my DifferentialDict, resulting in an overwriting of entries in global_base rather than changes in self.changes. However, they have to be because of the premise that all DifferentialDict instantiations share the same base dictionary. I could add an 'entry level' key to each DifferentialDict instantiation, but my feeling is that there is a more elegant solution which eludes me.

I would really appreciate any suggestions on how to get my differential dictionary working when nested. Thanks in advance!

like image 869
Asellus aquaticus Avatar asked Oct 18 '22 14:10

Asellus aquaticus


1 Answers

I don't have the time to try this right now (maybe a bit later), but here are two observations:

combined indexes

If you would use tuples as indexes, for example like this dict[(5,3,2)] you would not have this problem. If you base either your base dict or also the differential dicts on this, you could circumvent the problem.

Maybe you could even write some classes that rewrite dict[a][b][c]to dict[(a,b,c)] to make this internal change transparent.

global base

I don't understand why you use a global base. From my point of view this makes the code more complicated without adding anything. Why don't you just store the base as in:

def MyDict(collections.abc.MutableSequence):
    def __init__(self, base):
        self._base = base

my_global_base = dict()
d = MyDict(my_global_base)

d[2] = 'abc' # modifies self._base inside of the instance too, because it is the
             # same object

If you want to change all the content of the base, just delete all items using popitem() and then add new ones using update(). This way your code is more flexible and does not have any surprising behavior because of the global variables.

abstract base classes

When reimplementing classes like sequences, dictionaries etc. it might come in handy to use the abstract base classes provided by Python, they do some of the implementation work for you.

like image 109
Georg Schölly Avatar answered Nov 15 '22 06:11

Georg Schölly