
Ordered dictionary of ordered dictionaries in Python

I need a dictionary data structure that stores dictionaries, as shown below:

custom = {1: {'a': np.zeros(10), 'b': np.zeros(100)}, 
          2: {'c': np.zeros(20), 'd': np.zeros(200)}}

The problem is that I iterate over this data structure many times in my code, and every time the order of iteration must be the same, because all the elements in this nested structure are mapped to a 1D array (serialized, if you will). I thought about using an ordered dict of ordered dicts, but I'm not sure that's the right solution; I may be choosing the wrong data structure. What would be the most suitable solution for my case?

UPDATE

So this is what I came up with so far:

import sys
from collections import OrderedDict

class Test(list):

    def __init__(self, *args, **kwargs):

        super(Test, self).__init__(*args, **kwargs)

        for k,v in args[0].items():
            self[k] = OrderedDict(v)

        self.d = -1
        self.iterator = iter(self[-1].keys())
        self.etype = next(self.iterator)
        self.idx = 0


    def __iter__(self):
        return self

    def __next__(self):

        try:
            self.idx += 1
            return self[self.d][self.etype][self.idx-1]

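        except IndexError:

            # Current array is exhausted: move to the next key and return
            # its first element, leaving idx pointing at the second one.
            self.etype = next(self.iterator)
            self.idx = 1
            return self[self.d][self.etype][0]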

    def __call__(self, d):

        self.d = -1 - d
        self.iterator = iter(self[self.d].keys())
        self.etype = next(self.iterator)
        self.idx = 0
        return self


def main(argv=()):

    tst = Test(elements)
    for el in tst:
        print(el)
    # loop over a lower dimension
    for el in tst(-2):
        print(el)

    print(tst)


    return 0

if __name__ == "__main__":
    sys.exit(main())

I can iterate over this ordered structure as many times as I want, and I implemented __call__ so I can iterate over the lower dimensions. I don't like that it raises no error when the requested lower dimension isn't present in the list. I also suspect that evaluating self[self.d][self.etype][self.idx-1] on every step is less efficient than iterating over the dictionary directly. Is this true? How can I improve this?

aaragon asked Dec 15 '15 16:12


2 Answers

I think using OrderedDicts is the best way. They're part of the standard library (collections.OrderedDict) and relatively fast:

custom = OrderedDict([(1, OrderedDict([('a', np.zeros(10)),
                                       ('b', np.zeros(100))])),
                      (2, OrderedDict([('c', np.zeros(20)),
                                       ('d', np.zeros(200))]))])

If you want to make it easy to iterate over the contents of your data structure, you can always provide a utility function to do so:

def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            for row in v:
                yield row

Note that in Python 3.3+, which allows yield from <expression>, the last for loop can be eliminated:

def iter_over_contents(data_structure):
    for delem in data_structure.values():
        for v in delem.values():
            yield from v

With one of those you'll then be able to write something like:

for elem in iter_over_contents(custom):
    print(elem)

and hide the complexity.
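
Since the question mentions mapping everything to a 1D array, the same ordered traversal can also build that array directly. The following is only a sketch under assumptions: the names flatten and offsets are hypothetical helpers, not part of the question's code, and np.ones is used for the 'b' and 'd' arrays purely so the slices are easy to tell apart.

```python
from collections import OrderedDict
from itertools import chain

import numpy as np

custom = OrderedDict([
    (1, OrderedDict([('a', np.zeros(10)), ('b', np.ones(100))])),
    (2, OrderedDict([('c', np.zeros(20)), ('d', np.ones(200))])),
])


def flatten(data_structure):
    """Concatenate every inner array, in iteration order, into one 1D array."""
    arrays = chain.from_iterable(d.values() for d in data_structure.values())
    return np.concatenate(list(arrays))


def offsets(data_structure):
    """Map (outer_key, inner_key) to the slice that array occupies when flattened."""
    result = OrderedDict()
    start = 0
    for outer_key, inner in data_structure.items():
        for inner_key, arr in inner.items():
            result[(outer_key, inner_key)] = slice(start, start + len(arr))
            start += len(arr)
    return result


flat = flatten(custom)
index = offsets(custom)
```

Because both helpers walk the structure in the same order, flat[index[(1, 'b')]] gives back the values of custom[1]['b'].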

While you could define your own class in an attempt to encapsulate this data structure and use something like the iter_over_contents() generator function as its __iter__() method, that approach would likely be slower and wouldn't allow expressions using two levels of indexing such as the following:

custom[1]['b']

which nested dictionaries (or OrderedDefaultdicts, as shown in my other answer) do allow.
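
If you still want the traversal attached to the container, one possible middle ground is to subclass OrderedDict and add the generator as a separate method instead of replacing __iter__(). This is a sketch only: NestedArrays and iter_contents are hypothetical names, and the small arrays are made up for illustration.

```python
from collections import OrderedDict

import numpy as np


class NestedArrays(OrderedDict):
    """Hypothetical OrderedDict subclass; normal mapping behaviour is untouched."""

    def iter_contents(self):
        # Walk outer keys, then inner keys, then array elements, in order.
        for inner in self.values():
            for arr in inner.values():
                yield from arr


custom = NestedArrays([
    (1, OrderedDict([('a', np.zeros(2)), ('b', np.ones(3))])),
    (2, OrderedDict([('c', np.full(1, 2.0))])),
])

values = list(custom.iter_contents())
```

Because __iter__() is left alone, iterating over custom still yields the outer keys, and custom[1]['b'] keeps working as with plain nested dictionaries.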

martineau answered Oct 23 '22 04:10


Could you just use a list of dictionaries?

custom = [{'a': np.zeros(10), 'b': np.zeros(100)},
          {'c': np.zeros(20), 'd': np.zeros(200)}]

This could work if the outer dictionary is the only one you need in the right order. You could still access the inner dictionaries with custom[0] or custom[1] (careful, indexing now starts at 0).
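
As a side note, on Python 3.7 and later plain dicts keep insertion order as a language guarantee, so on those versions the inner dictionaries are iterated in a stable order too. A minimal sketch, assuming such an interpreter (iter_arrays is a hypothetical helper name):

```python
import sys

import numpy as np

# Assumption: Python 3.7+, where dict insertion order is guaranteed.
assert sys.version_info >= (3, 7)

custom = [{'a': np.zeros(10), 'b': np.zeros(100)},
          {'c': np.zeros(20), 'd': np.zeros(200)}]


def iter_arrays(list_of_dicts):
    """Yield (position, key, array) in list order, then key insertion order."""
    for position, inner in enumerate(list_of_dicts):
        for key, arr in inner.items():
            yield position, key, arr


order = [(position, key) for position, key, _ in iter_arrays(custom)]
```

On older interpreters the inner order is not guaranteed, and an OrderedDict would still be needed for the inner level.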

If not all of the indices are used, you could do the following:

custom = [None] * maxLength   # maxLength: largest index you expect, plus one

custom[1] = {'a': np.zeros(10), 'b': np.zeros(100)}
custom[2] = {'c': np.zeros(20), 'd': np.zeros(200)}
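
When iterating over such a sparse list, the unused None slots have to be skipped. A small sketch of that, where max_length = 4 is an arbitrary value chosen for illustration and iter_used is a hypothetical helper name:

```python
import numpy as np

max_length = 4  # hypothetical bound: largest index expected, plus one

custom = [None] * max_length
custom[1] = {'a': np.zeros(10), 'b': np.zeros(100)}
custom[2] = {'c': np.zeros(20), 'd': np.zeros(200)}


def iter_used(slots):
    """Yield (index, inner_dict) for every slot that has been filled in."""
    for index, inner in enumerate(slots):
        if inner is not None:
            yield index, inner


used = [index for index, _ in iter_used(custom)]
```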

Lisa answered Oct 23 '22 04:10