Python fold/reduce composition of multiple dictionaries

I want to compose (merge) an arbitrary number of dictionaries against a 'seed' (root) dictionary, so that the final result accumulates both the unchanged and the updated values:

seed = {
    'update': False,
    'data': {
        'subdata': {
            'field1': 5,
            'field2': '2018-01-30 00:00:00'
        },
        'field3': 2,
        'field4': None
    },
    'data_updates': {},
    'subdata_updates': {},
    'diffs': {}
}

update_1 = {
    'update': True,
    'data': {
        'subdata': {
            'field1': 6,
            'field2': '2018-01-30 00:00:00'
        },
        'field3': 2,
        'field4': None
    },
    'data_updates': {},
    'subdata_updates': {'field1': 6},
    'diffs': {
        'field1': {
            'field': 'field1',
            'before': 5,
            'after': 6
        }
    }
}

update_2 = {
    'update': True,
    'data': {
        'subdata': {
            'field1': 5,
            'field2': '2018-01-30 00:00:00',
        },
        'field3': 2,
        'field4': 1
    },
    'data_updates': {'field4': 1},
    'subdata_updates': {},
    'diffs': {
        'field4': {
            'field': 'field4',
            'before': None,
            'after': 1
        }
    }
}

# I want to be able to pass in an arbitrary number of updates.
assert reduce_maps(seed, *[update_1, update_2]) == {
    'update': True,
    'data': {
        'subdata': {
            'field1': 6,
            'field2': '2018-01-30 00:00:00',
        },
        'field3': 2,
        'field4': 1
    },
    'data_updates': {'field4': 1},
    'subdata_updates': {'field1': 6},
    'diffs': {
        'field1': {
            'field': 'field1',
            'before': 5,
            'after': 6
        },
        'field4': {
            'field': 'field4',
            'before': None,
            'after': 1
        }
    }
}

You can assume the data will always be in this shape. You can also assume that each payload only ever updates one field and that no two updates will ever update the same field.

I can dimly perceive an analogue of fold lurking in the background here, building up the data in passes around seed.

asked Jun 28 '18 14:06

jhrr

1 Answer

Here you go:

from pprint import pprint


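def merge_working(pre, post):
    """Recursively merge post into pre; on conflicting leaves, post wins."""
    if not (isinstance(pre, dict) and isinstance(post, dict)):
        return post

    new = pre.copy()  # values for unique keys of pre will be preserved
    for key, post_value in post.items():
        new[key] = merge_working(new.get(key), post_value)

    return new


def merge_simplest(pre, post):
    """Merge assuming pre and post have exactly the same nested keys."""
    if not isinstance(pre, dict):
        return post
    return {key: merge_simplest(pre[key], post[key])
            for key in pre}


# Switch this to merge_simplest to see it fail on the data below.
merge = merge_working


def reduce_maps(*objects):
    """Left fold: same as functools.reduce(merge, objects) for non-empty input."""
    new = objects[0]
    for post in objects[1:]:
        new = merge(new, post)
    return new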


seed = {
    'update': False,
    'data': {
        'subdata': {
            'field1': 5,
            'field2': '2018-01-30 00:00:00'
        },
        'field3': 2,
        'field4': None
    },
    'data_updates': {},
    'subdata_updates': {},
    'diffs': {}
}

update_1 = {
    'update': True,
    'data': {
        'subdata': {
            'field1': 6,
            'field2': '2018-01-30 00:00:00'
        },
        'field3': 2,
        'field4': None
    },
    'data_updates': {},
    'subdata_updates': {'field1': 6},
    'diffs': {
        'field1': {
            'field': 'field 1',
            'before': 5,
            'after': 6
        }
    }
}

update_2 = {
    'update': True,
    'data': {
        'subdata': {
            'field1': 5,
            'field2': '2018-01-30 00:00:00',
        },
        'field3': 2,
        'field4': 1
    },
    'data_updates': {'field4': 1},
    'subdata_updates': {},  # was subdata_update
    'diffs': {
        'field4': {
            'field': 'field 4',
            'before': None,
            'after': 1
        }
    }
}

result = reduce_maps(*[seed, update_1, update_2])

golden = {
    'update': True,
    'data': {
        'subdata': {
            'field1': 5,  # was 6
            'field2': '2018-01-30 00:00:00',
        },
        'field3': 2,
        'field4': 1
    },
    'data_updates': {'field4': 1},
    'subdata_updates': {'field1': 6},  # was subdata_update
    'diffs': {
        'field1': {
            'field': 'field 1',
            'before': 5,
            'after': 6
        },
        'field4': {
            'field': 'field 4',
            'before': None,
            'after': 1
        }
    }
}

pprint(result)
pprint(golden)

assert result == golden

I've fixed what I think were typos in your data (see comments in the code).
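
If the 6 in the question's expected output was intended rather than a typo, the merge would have to treat a value that merely repeats the seed as "no change". Below is a minimal sketch of that variant, not a definitive implementation. It assumes, as the question states, that no two updates touch the same field. The names merge_against_seed and _MISSING are made up for this illustration, and the data is trimmed down from the dictionaries above.

```python
from functools import reduce

_MISSING = object()  # hypothetical sentinel: "key not present"


def merge_against_seed(seed, acc, post):
    """Merge post into acc; a leaf of post wins only if it differs from seed."""
    if isinstance(post, dict):
        seed = seed if isinstance(seed, dict) else {}
        acc = acc if isinstance(acc, dict) else {}
        new = dict(acc)  # keep everything accumulated so far
        for key, value in post.items():
            new[key] = merge_against_seed(
                seed.get(key, _MISSING), acc.get(key, _MISSING), value)
        return new
    # Leaf: take post if nothing is accumulated yet or if it changes the seed.
    if acc is _MISSING or post != seed:
        return post
    return acc


def reduce_maps(seed, *updates):
    # The fold: seed is both the initial accumulator and the reference point.
    return reduce(lambda acc, post: merge_against_seed(seed, acc, post),
                  updates, seed)


seed = {'data': {'subdata': {'field1': 5}, 'field4': None}, 'diffs': {}}
update_1 = {'data': {'subdata': {'field1': 6}, 'field4': None},
            'diffs': {'field1': {'before': 5, 'after': 6}}}
update_2 = {'data': {'subdata': {'field1': 5}, 'field4': 1},
            'diffs': {'field4': {'before': None, 'after': 1}}}

result = reduce_maps(seed, update_1, update_2)
print(result['data'])  # {'subdata': {'field1': 6}, 'field4': 1}
```

The trade-off is that an update which deliberately sets a field back to its seed value would be ignored, so this only fits if the "one field per payload, never the same field twice" assumption really holds.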

Note that merge may need tweaking depending on the exact merging rules and the data you expect. To see what I mean, set merge = merge_simplest and work out why the assertion fails. It wouldn't fail if the "data-agnostic" shape (the dictionary tree, disregarding the values of the leaves) were really the same in every dictionary.
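
A trimmed-down illustration of that failure, using dictionaries cut down from the ones above: merge_simplest only iterates over the keys of pre, so anything that exists only in post is silently dropped, and a key that post lacks raises KeyError.

```python
def merge_simplest(pre, post):
    # Same function as above, repeated so the snippet runs on its own.
    if not isinstance(pre, dict):
        return post
    return {key: merge_simplest(pre[key], post[key])
            for key in pre}


pre = {'update': False, 'diffs': {}}
post = {'update': True, 'diffs': {'field1': {'before': 5, 'after': 6}}}

# 'diffs' is empty in pre, so the nested 'field1' entry of post never gets visited.
merged = merge_simplest(pre, post)
print(merged)  # {'update': True, 'diffs': {}}

# A key that post lacks is a hard error rather than a silent loss.
try:
    merge_simplest({'field3': 2}, {})
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```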

answered Sep 28 '22 06:09

Kirill Bulygin

Tags: python, dictionary, reduce, fold