
Python Merge 2 or more Dicts using a value to handle duplicate keys

I am merging dictionaries that have some duplicate keys. The values will be different and I want to ignore the lower value record.

dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}

If the keys are equal, I want the entry whose value[1] (the second element of the list) is greatest to be in the new dict. The output of merging these 2 dicts should be:

dict3 = {1: ["out", 2], 2: ["out", 1], 3: ["in", 1]}

The only way I know to solve this is to run a loop with a condition to determine which one to add into the merged dict. Is there a more pythonic way of doing it?

Duplicate keys will be few and far between (less than 1%), if that makes any difference to the solution.

fairywings78 asked Feb 12 '23 09:02



1 Answer

A pythonic solution should rely heavily on the Python standard library and the available syntactic constructs, not only to simplify the code but also to gain performance.

In your case you can benefit from the fact that fewer than 1% of the keys occur in both dictionaries:

 conflictKeys = set(dict1) & set(dict2)      # all keys that are in both dictionaries
 solvedConflicts = { key: dict1[key]
                          if dict1[key][1] > dict2[key][1]
                          else dict2[key]
                     for key in conflictKeys }  # conflicting keys only, each with the wanted value

 result = dict1.copy()                       # start with all entries of dict1
 result.update(dict2)                        # add entries of dict2 (overwrites conflicting keys)
 result.update(solvedConflicts)              # set conflicting keys to the wanted value

This solution avoids running the "slow" Python interpreter for every key of the two dictionaries and instead uses fast Python library routines (which are written in C). These are:

  • dict.update() to merge both dictionaries
  • set.intersection() (synonym for set1 & set2) to get all conflicts
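As a small illustration of these two routines, here is a sketch using the sample data from the question. It assumes Python 3, where dictionary key views are set-like, so dict1.keys() & dict2.keys() can be used as an alternative to set(dict1) & set(dict2) without building two intermediate sets. The names conflict_keys and merged are only illustrative.

```python
dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}

# Key views are set-like in Python 3, so they can be intersected directly.
conflict_keys = dict1.keys() & dict2.keys()
print(conflict_keys)  # {1, 2}

# dict.update() overwrites existing keys, so on its own dict2 always wins.
merged = dict1.copy()
merged.update(dict2)
print(merged)
```

Note that update() alone only gives the wanted result here because dict2 happens to hold the greater (or equal) value for every shared key; in general the conflicts still have to be resolved separately.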

Only for resolving the conflicting keys does the Python interpreter need to loop over the entries. Even here you can gain some performance from the pythonic construct "dict comprehension" compared to an imperative for loop. In CPython the comprehension adds each item to solvedConflicts with specialized bytecode instead of a general item assignment on every iteration, which usually makes it somewhat faster. The dictionary itself still grows as items are added, so the comprehension does not avoid memory reallocations. With so few conflicting keys, either form costs little.
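Since the title asks about two or more dictionaries, the same idea can probably be extended by merging them one after another. The following is a minimal sketch, not a tuned implementation; merge_keep_greatest is a hypothetical helper name, dict3 is made-up sample data, and ties go to the later dictionary, as in the two-dictionary version above.

```python
def merge_keep_greatest(*dicts):
    """Merge dicts; on duplicate keys keep the value whose second item is greatest.

    Hypothetical helper: ties go to the later dictionary.
    """
    result = {}
    for d in dicts:
        # Only keys already present in result need a comparison.
        conflicts = result.keys() & d.keys()
        # Earlier values that beat the new ones must survive the bulk update.
        kept = {k: result[k] for k in conflicts if result[k][1] > d[k][1]}
        result.update(d)      # bulk merge, the newer dict wins by default
        result.update(kept)   # restore earlier values that were greater
    return result


dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}
dict3 = {3: ["out", 0], 4: ["in", 5]}  # made-up third dictionary

print(merge_keep_greatest(dict1, dict2))
print(merge_keep_greatest(dict1, dict2, dict3))
```

The per-key Python code still runs only for the conflicting keys; everything else goes through update(). The input dictionaries are not modified, but the merged result shares the value lists with them.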

mrh1997 answered Feb 16 '23 04:02
