I have written a code for design optimization using the Inspyred library and its implementation of Genetic Algorithms. In essence, the optimization process creates a large number of variations on a single data structure, which is a nested dictionary in my case.
In order to reduce the amount of memory used in the process, I have been trying to create some kind of differential dictionary type, which only stores items which are different from the base dictionary. The reason for this is that in a typical case, 95% of the data in the data structure will not be modified in any of the variations, but any part of the data structure can contain variations. So for flexibility reasons, I would like to have a data type that behaves more or less like a dictionary, but only stores the changes.
This is the result of my attempt to create this:
#!/usr/bin/python
import unittest
import copy
global_base={}
class DifferentialDict(object):
"""
dictionary with differential storage of changes
all DifferentialDict objects have the same base dictionary
"""
def __init__(self,base=None):
global global_base
self.changes={}
if not base==None:
self.set_base(base)
def set_base(self,base):
global global_base
global_base=copy.deepcopy(base)
def __copy__(self):
return self
def __deepcopy__(self):
new=DifferentialDict()
new.changes=copy.deepcopy(self.changes)
return new
def get(self):
global global_base
outdict=copy.deepcopy(global_base)
for key in self.changes:
outdict[key]=self.changes[key]
return outdict
def __setitem__(self,key,value):
self.changes[key]=value
def __getitem__(self,key):
global global_base
if key in self.changes:
return self.changes[key]
else:
return global_base[key]
class TestDifferentialDict(unittest.TestCase):
def test1(self):
ldict={'a':{1:2,3:4},'b':{'c':[1,2,3],'d':'abc'}}
ddict=DifferentialDict(base=ldict)
self.assertEqual(ddict['a'],{1:2,3:4})
ddict['a']=5
self.assertEqual(ddict['a'],5)
def test2(self):
ldict={'a':{1:2,3:4},'b':{'c':[1,2,3],'d':'abc'}}
ddict1=DifferentialDict(base=ldict)
ddict2=DifferentialDict(base=ldict)
ddict1['a'][3]=5
ddict2['a'][3]=7
self.assertEqual(ddict1['a'][3],5)
self.assertEqual(ddict2['a'][3],7)
def test3(self):
ldict={'a':{1:2,3:4},'b':{'c':[1,2,3],'d':'abc'}}
ddict1=DifferentialDict(base=ldict)
ddict2=ddict1.__deepcopy__()
ddict1['a'][3]=5
ddict2['a'][3]=7
self.assertEqual(ddict1['a'][3],5)
self.assertEqual(ddict2['a'][3],7)
if __name__ == "__main__":
unittest.main()
It works fine for a simple dictionary, but breaks down when new dictionaries are nested within the main dictionary. I understand that this happens because these second level dictionaries are real Python dictionaries instead of instantiations of my DifferentialDict, resulting in an overwriting of entries in global_base rather than changes in self.changes. However, they have to be because of the premise that all DifferentialDict instantiations share the same base dictionary. I could add an 'entry level' key to each DifferentialDict instantiation, but my feeling is that there is a more elegant solution which eludes me.
I would really appreciate any suggestions on how to get my differential dictionary working when nested. Thanks in advance!
I don't have the time to try this right now (maybe a bit later), but here are two observations:
If you would use tuples as indexes, for example like this dict[(5,3,2)]
you would not have this problem. If you base either your base dict or also the differential dicts on this, you could circumvent the problem.
Maybe you could even write some classes that rewrite dict[a][b][c]
to dict[(a,b,c)]
to make this internal change transparent.
I don't understand why you use a global base. From my point of view this makes the code more complicated without adding anything. Why don't you just store the base as in:
def MyDict(collections.abc.MutableSequence):
def __init__(self, base):
self._base = base
my_global_base = dict()
d = MyDict(my_global_base)
d[2] = 'abc' # modifies self._base inside of the instance too, because it is the
# same object
If you want to change all the content of the base, just delete all items using popitem()
and then add new ones using update()
. This way your code is more flexible and does not have any surprising behavior because of the global variables.
When reimplementing classes like sequences, dictionaries etc. it might come in handy to use the abstract base classes provided by Python, they do some of the implementation work for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With