Python: Combining unique values in list of dicts where keys are the same?

I'm not sure if I am asking the question in the right way, but this is my issue:

I have a list of dicts in the following format:

[
{'user': 'joe', 'IndexUsed': 'a'}, 
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'}, 
{'user': 'admin', 'IndexUsed': 'a'}, 
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
...
]

I want my final result to look like this:

[
{'user': 'joe', 'IndexUsed': ['a', 'b']}, 
{'user': 'admin', 'IndexUsed': ['a', 'c']}, 
{'user': 'hugo', 'IndexUsed': ['a', 'd']},
]

In essence, I want to deduplicate the values in IndexUsed and end up with only one dict per user.

I have looked into reducers and dict comprehensions and searched StackOverflow, but I have trouble finding use cases that involve strings. Most of the examples I have found combine integers into a final int/float, whereas I want to combine the strings into a single list of unique values. Could you help me understand how to approach this problem?

SynchronDEV asked Jan 12 '21 14:01



2 Answers

from collections import defaultdict


data = [{'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'b', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'admin'},
 {'IndexUsed': 'c', 'user': 'admin'},
 {'IndexUsed': 'a', 'user': 'hugo'},
 {'IndexUsed': 'd', 'user': 'hugo'}]

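# Map each user to the set of indexes they used; the set drops duplicates.
indexes_used = defaultdict(set)
for d in data:
    indexes_used[d['user']].add(d['IndexUsed'])

# Build one dict per user, with the unique indexes as a sorted list.
result = []
for k, v in indexes_used.items():
    result.append({'user': k, 'IndexUsed': sorted(v)})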

print(*result)

Outputs:

{'user': 'joe', 'IndexUsed': ['a', 'b']} {'user': 'admin', 'IndexUsed': ['a', 'c']} {'user': 'hugo', 'IndexUsed': ['a', 'd']}

Note: for the unaware, defaultdict calls the passed function (set in this case) as a factory to create the value for a missing key. So every key of indexes_used maps to a set filled with the indexes that user used. Because the values are sets, duplicates are ignored. In the end each set is converted to a sorted list while creating the required IndexUsed key.
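
To make the factory behavior described in the note concrete, here is a minimal sketch. The sample values are made up for illustration, and the plain-dict variant using setdefault is only a rough equivalent shown for comparison, not part of the answer above.

```python
from collections import defaultdict

# defaultdict calls the factory (here: set) the first time a key is looked up.
indexes_used = defaultdict(set)

indexes_used['joe'].add('a')  # 'joe' is missing, so set() creates its value
indexes_used['joe'].add('a')  # adding the same element again has no effect
indexes_used['joe'].add('b')

# Rough equivalent with a plain dict and setdefault (no import needed).
plain = {}
for user, index in [('joe', 'a'), ('joe', 'a'), ('joe', 'b')]:
    plain.setdefault(user, set()).add(index)

# Both approaches end up with one set of unique indexes per user.
assert plain == dict(indexes_used)
```

Sets are unordered, which is why the answer sorts each set before putting it in the result.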

progmatico answered Oct 26 '22 23:10



If the dictionaries are guaranteed to be grouped together by user (that is, all entries for the same user are adjacent in the list), then you could use itertools.groupby to process each group of dictionaries separately:

from itertools import groupby
from operator import itemgetter

data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'},
]

merged_data = [
    {
        "user": key,
        "IndexUsed": list({i: None for i in map(itemgetter("IndexUsed"), group)}),
    }
    for key, group in groupby(data, key=itemgetter("user"))
]
for d in merged_data:
    print(d)

Output:

{'user': 'joe', 'IndexUsed': ['a', 'b']}
{'user': 'admin', 'IndexUsed': ['a', 'c']}
{'user': 'hugo', 'IndexUsed': ['a', 'd']}
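
The grouping requirement matters because itertools.groupby only merges consecutive items that share a key. A minimal sketch of what happens otherwise, using a hypothetical ungrouped list (not the data from the question), and of sorting by the same key first as one possible workaround:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input where joe's records are not adjacent.
ungrouped = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
]

get_user = itemgetter('user')

# groupby starts a new group whenever the key changes, so 'joe' appears twice.
keys_as_is = [key for key, _ in groupby(ungrouped, key=get_user)]
print(keys_as_is)  # ['joe', 'admin', 'joe']

# Sorting by the same key first makes each user's records adjacent.
# sorted() is stable, so records keep their relative order within a user.
by_user = sorted(ungrouped, key=get_user)
keys_sorted = [key for key, _ in groupby(by_user, key=get_user)]
print(keys_sorted)  # ['admin', 'joe']
```

Note that sorting changes the order of the users in the result (alphabetical instead of first seen), which may or may not matter for your use case.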

This was just the first thing I came up with, but I don't like it for several reasons. First, like I said, it assumes that the original dictionaries are grouped together by the key user. Second, long list comprehensions are not readable and should be avoided. Third, the merged IndexUsed list is generated by creating a temporary dictionary that maps unique entries to None (ew, gross; a dictionary is used rather than a set because sets don't preserve insertion order). It also assumes you're using Python 3.7+, where dictionaries are guaranteed to preserve insertion order (you could be more explicit by using collections.OrderedDict, but that's one more import). Finally, you shouldn't have to hardcode the "user" and "IndexUsed" key literals. Someone please suggest a better answer.
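
One possible way to address some of these objections, offered as a sketch rather than as a definitive answer: a small helper that takes the key names as parameters and does not require the input to be grouped. The function name merge_unique and its parameter names are made up for illustration, and the sample records are a shuffled subset of the question's data. It still uses a dict as an ordered set, so it relies on the insertion-order guarantee of Python 3.7+.

```python
def merge_unique(records, group_key, value_key):
    """Collect the unique values of value_key for each distinct group_key.

    Hypothetical helper; groups and values are kept in first-seen order.
    """
    # Outer dict: group -> inner dict whose keys are the unique values.
    # Plain dicts preserve insertion order (guaranteed from Python 3.7).
    grouped = {}
    for record in records:
        grouped.setdefault(record[group_key], {})[record[value_key]] = None
    return [
        {group_key: group, value_key: list(values)}
        for group, values in grouped.items()
    ]


# Records deliberately not grouped by user.
records = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
]

merged = merge_unique(records, 'user', 'IndexUsed')
for d in merged:
    print(d)
# {'user': 'joe', 'IndexUsed': ['a', 'b']}
# {'user': 'admin', 'IndexUsed': ['a', 'c']}
```

If the order of the values does not matter, the inner dict could be replaced by a set, as in the defaultdict answer.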

Paul M. answered Oct 27 '22 00:10
