Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge two lists of dicts of different lengths using a single key in Python

I want to merge two lists of dictionaries on a single key, when the two lists are different lengths (using Python 3.6). For example, if we have a list of dicts called l1:

l1 = [{'pcd_sector': 'ABDC', 'coverage_2014': '100'},
       {'pcd_sector': 'DEFG', 'coverage_2014': '0'}]

and another list of dicts called l2:

l2 = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs'},
      {'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd'},
      {'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je'},
      {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js'},
      {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

How would one merge them using pcd_sector to get this(?):

result = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs', 'coverage_2014': '100'},
          {'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd', 'coverage_2014': '100'},
          {'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je', 'coverage_2014': '0'},
          {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js', 'coverage_2014': '0'},
          {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

What I have tried so far

I've used the following code to merge the two lists, but I end up with a short version unfortunately, not the desired complete data structure.

import pprint
grouped = {}
for d in l1 + l2:
    grouped.setdefault(d['pcd_sector'], {'asset':0, 'asset_id':0, 'coverage_2014':0}).update(d)
result = [d for d in grouped.values()]
pprint.pprint(result)

So when I run the code, I end up with this short output:

result = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs', 'coverage_2014': '100'},
         {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js', 'coverage_2014': '0'},
         {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]
like image 409
Thirst for Knowledge Avatar asked Jul 24 '17 09:07

Thirst for Knowledge


People also ask

How do I merge two Dicts together?

Python 3.9 has introduced the merge operator (|) in the dict class. Using the merge operator, we can combine dictionaries in a single line of code. We can also merge the dictionaries in-place by using the update operator (|=).

How do I merge a list of Dicts into a single dict?

To merge multiple dictionaries, the most Pythonic way is to use dictionary comprehension {k:v for x in l for k,v in x. items()} to first iterate over all dictionaries in the list l and then iterate over all (key, value) pairs in each dictionary.


2 Answers

Problem

The problem in your approach is that your data is put in a grouped dict with 'pcd_sector' as keys but your l2 has multiple dicts with the same 'pcd_sector'. You could use a tuple of 'pcd_sector', 'asset' as key for l2, but it wouldn't work for l1 anymore. So you need to do the processing in two steps instead of iterating on l1 + l2 directly.

Theory

If pcd_sector keys are unique in l1, you can create a big dict instead of a list of small dicts:

>>> d1 = {d['pcd_sector']:d for d in l1}
>>> d1
{'ABDC': {'pcd_sector': 'ABDC', 'coverage_2014': '100'}, 'DEFG': {'pcd_sector': 'DEFG', 'coverage_2014': '0'}}

Then, you simply need to merge the dicts that have the same pcd_sector keys:

>>> [dict(d, **d1.get(d['pcd_sector'], {})) for d in l2]
[{'asset_id': '2gs', 'coverage_2014': '100', 'pcd_sector': 'ABDC', 'asset': '3G'}, {'asset_id': '7jd', 'coverage_2014': '100', 'pcd_sector': 'ABDC', 'asset': '4G'}, {'asset_id': '3je', 'coverage_2014': '0', 'pcd_sector': 'DEFG', 'asset': '3G'}, {'asset_id': '8js', 'coverage_2014': '0', 'pcd_sector': 'DEFG', 'asset': '4G'}, {'asset_id': '4jd', 'pcd_sector': 'CDEF', 'asset': '3G'}]

Complete code

Putting it all together, the code becomes:

l1 = [{'pcd_sector': 'ABDC', 'coverage_2014': '100'},
       {'pcd_sector': 'DEFG', 'coverage_2014': '0'}]

l2 = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs'},
      {'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd'},
      {'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je'},
      {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js'},
      {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

d1 = {d['pcd_sector']:d for d in l1}
result = [dict(d, **d1.get(d['pcd_sector'], {})) for d in l2]

import pprint
pprint.pprint(result)
#   [{'asset': '3G',
#     'asset_id': '2gs',
#     'coverage_2014': '100',
#     'pcd_sector': 'ABDC'},
#    {'asset': '4G',
#     'asset_id': '7jd',
#     'coverage_2014': '100',
#     'pcd_sector': 'ABDC'},
#    {'asset': '3G',
#     'asset_id': '3je',
#     'coverage_2014': '0',
#     'pcd_sector': 'DEFG'},
#    {'asset': '4G',
#     'asset_id': '8js',
#     'coverage_2014': '0',
#     'pcd_sector': 'DEFG'},
#    {'asset': '3G', 'asset_id': '4jd', 'pcd_sector': 'CDEF'}]
like image 111
Eric Duminil Avatar answered Oct 07 '22 01:10

Eric Duminil


You can create a lookup dictionary based on pcd_sector and just update your original list of dicts based on that:

>>> import copy
>>> lookup = { x['pcd_sector'] : x for x in l1 }
>>> result = copy.deepcopy(l2)
>>> for d in result:
...     d.update(lookup.get(d['pcd_sector'], {})) # golfed courtesy Ashwini Chaudhary
... 
>>> result
[{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs', 'coverage_2014': '100'}, 
{'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd', 'coverage_2014': '100'}, 
{'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je', 'coverage_2014': '0'}, 
{'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js', 'coverage_2014': '0'},
{'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]
like image 21
cs95 Avatar answered Oct 07 '22 01:10

cs95