Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Merging two arbitrary data structures

Tags:

python

I am looking to efficiently merge two (fairly arbitrary) data structures: one representing a set of defaults values and one representing overrides. Example data below. (Naively iterating over the structures works, but is very slow.) Thoughts on the best approach for handling this case?


_DEFAULT   = { 'A': 1122, 'B': 1133, 'C': [ 9988, {            'E': [ { 'F': 6666,            },       ], }, ], }

_OVERRIDE1 = {            'B': 1234, 'C': [ 9876, { 'D': 2345, 'E': [ { 'F': 6789, 'G': 9876, }, 1357, ], }, ], }
_ANSWER1   = { 'A': 1122, 'B': 1234, 'C': [ 9876, { 'D': 2345, 'E': [ { 'F': 6789, 'G': 9876, }, 1357, ], }, ], }

_OVERRIDE2 = {                       'C': [ 6543, {            'E': [ {            'G': 9876, },       ], }, ], }
_ANSWER2   = { 'A': 1122, 'B': 1133, 'C': [ 6543, {            'E': [ { 'F': 6666, 'G': 9876, },       ], }, ], }

_OVERRIDE3 = {            'B': 3456, 'C': [ 1357, { 'D': 4567, 'E': [ { 'F': 6677, 'G': 9876, }, 2468, ], }, ], }
_ANSWER3   = { 'A': 1122, 'B': 3456, 'C': [ 1357, { 'D': 4567, 'E': [ { 'F': 6677, 'G': 9876, }, 2468, ], }, ], }

This is an example of how to run the tests: (The dictionary update doesn't work, just an stub function.)


    import itertools

    def mergeStuff( default, override ):
        # This doesn't work
        result = dict( default )
        result.update( override )
        return result

    def main():
        for override, answer in itertools.izip( _OVERRIDES, _ANSWERS ):
           result = mergeStuff(_DEFAULT, override)
           print('ANSWER:   %s'   % (answer) )
           print('RESULT:   %s\n' % (result) )

like image 640
Colin Edwards Avatar asked Sep 05 '25 02:09

Colin Edwards


2 Answers

You cannot do that by "iterating", you'll need a recursive routine like this:

def merge(a, b):
    if isinstance(a, dict) and isinstance(b, dict):
        d = dict(a)
        d.update({k: merge(a.get(k, None), b[k]) for k in b})
        return d

    if isinstance(a, list) and isinstance(b, list):
        return [merge(x, y) for x, y in itertools.izip_longest(a, b)]

    return a if b is None else b
like image 74
georg Avatar answered Sep 06 '25 22:09

georg


If you want your code to be fast, don't copy like crazy

You don't really need to merge two dicts. You can just chain them.

A ChainMap class is provided for quickly linking a number of mappings so they can be treated as a single unit. It is often much faster than creating a new dictionary and running multiple update() calls.

class ChainMap(UserDict.DictMixin):
"""Combine multiple mappings for sequential lookup"""

    def __init__(self, *maps):
        self._maps = maps

    def __getitem__(self, key):
        for mapping in self._maps:
            try:
                return mapping[key]
            except KeyError:
                pass
        raise KeyError(key)

def main():
    for override, answer in itertools.izip( _OVERRIDES, _ANSWERS ):
       result = ChainMap(override, _DEFAULT)
  • http://docs.python.org/dev/library/collections#chainmap-objects
  • http://code.activestate.com/recipes/305268/
like image 28
user278064 Avatar answered Sep 06 '25 20:09

user278064