I'm not sure if I am asking the question in the right way, but this is my issue:
I have a list of dicts in the following format:
[
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'},
{'user': 'admin', 'IndexUsed': 'a'},
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
...
]
I want my final result to look like this:
[
{'user': 'joe', 'IndexUsed': ['a', 'b']},
{'user': 'admin', 'IndexUsed': ['a', 'c']},
{'user': 'hugo', 'IndexUsed': ['a', 'd']},
]
In essence, combining/deduplicating the unique fields in IndexUsed and reducing them to only one dict per user.
I have looked into using reducers, dict comprehension, and searched on StackOverflow but I have some trouble finding use cases using strings. The majority of examples I have found are using integers to combine them into a final int/float, but here I rather want to combine it into a single final string. Could you help me understand how to approach this problem?
The straight answer is NO. You can not have duplicate keys in a dictionary in Python.
Since Python 3.9, the | operator can be used to merge two dictionaries. It is a very convenient way to merge them.
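As a small illustration of that operator, here is a sketch that assumes Python 3.9 or newer. The two dictionaries are made-up sample data, not taken from the question.

```python
# Hypothetical sample dictionaries, used only for illustration.
defaults = {"user": "joe", "IndexUsed": "a"}
overrides = {"IndexUsed": "b", "active": True}

# With |, the right-hand operand wins when both dicts share a key.
merged = defaults | overrides
print(merged)
```

Because a shared key keeps only the right-hand value, this kind of merge would not by itself collect several IndexUsed values for one user.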
You can't. Keys have to be unique.
There are a few ways to get a list of unique values in Python. This article will show you how. Using a set is one way to go about it. A set is useful because it contains only unique elements. You can use a set to get the unique elements, then turn the set into a list.
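A minimal sketch of the set approach, using a made-up list of values. Note that a set does not remember insertion order, so the second variant (dict.fromkeys, which relies on dictionaries keeping insertion order in Python 3.7+) is shown as an alternative when order matters.

```python
# Hypothetical sample values with duplicates.
values = ["b", "a", "b", "a", "a"]

# A set drops duplicates; sorting gives a predictable list.
unique_sorted = sorted(set(values))

# dict keys are unique and keep first-seen order (Python 3.7+).
unique_in_order = list(dict.fromkeys(values))

print(unique_sorted)
print(unique_in_order)
```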
Because of the importance of retrieving web data, being able to combine dictionaries in Python is an important skill to understand. Python dictionaries use a key:value mapping to store data. Keys must be unique and must be immutable objects (such as strings or tuples).
Explanation: The set of unique keys is {"X", "Y", "Z"}. Approach using itertools.chain: the problem can be solved using set(), the keys() method, and itertools.chain.
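The input for that explanation is not shown here, so the following sketch assumes two made-up dictionaries whose keys happen to give the same set of unique keys.

```python
from itertools import chain

# Hypothetical input dictionaries; the original input is not shown.
dicts = [{"X": 1, "Y": 2}, {"Y": 3, "Z": 4}]

# Chain the key views of all dictionaries together, then deduplicate with set().
unique_keys = set(chain.from_iterable(d.keys() for d in dicts))

# Sorted only so the printed order is predictable.
print(sorted(unique_keys))
```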
Items in Python can be unpacked using either the * or the ** characters. For dictionaries, to access both the key and value, you need to use the ** characters. Let’s see how we can use this to merge two dictionaries in Python:
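A short sketch of merging with ** unpacking, using made-up dictionaries:

```python
# Hypothetical sample dictionaries, used only for illustration.
first = {"user": "joe", "IndexUsed": "a"}
second = {"IndexUsed": "b"}

# Unpack both dictionaries into a new one; later entries overwrite earlier ones.
merged = {**first, **second}
print(merged)
```

As with any plain merge, only one value survives per key, so collecting several IndexUsed values per user needs a grouping step like the ones in the answers that follow.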
from collections import defaultdict
data = [{'IndexUsed': 'a', 'user': 'joe'},
{'IndexUsed': 'a', 'user': 'joe'},
{'IndexUsed': 'a', 'user': 'joe'},
{'IndexUsed': 'b', 'user': 'joe'},
{'IndexUsed': 'a', 'user': 'admin'},
{'IndexUsed': 'c', 'user': 'admin'},
{'IndexUsed': 'a', 'user': 'hugo'},
{'IndexUsed': 'd', 'user': 'hugo'}]
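# Map each user to the set of indexes they used; the set drops duplicates.
indexes_used = defaultdict(set)
for d in data:
    indexes_used[d['user']].add(d['IndexUsed'])
# Build one dict per user, with the indexes as a sorted list.
result = []
for k, v in indexes_used.items():
    result.append({'user': k, 'IndexUsed': sorted(v)})
print(*result)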
Outputs:
{'user': 'joe', 'IndexUsed': ['a', 'b']} {'user': 'admin', 'IndexUsed': ['a', 'c']} {'user': 'hugo', 'IndexUsed': ['a', 'd']}
Note: for the unaware, defaultdict uses the passed function (set in this case) as a factory to create the value for a missing key. So every key of indexes_used maps to a set filled with the used indexes. Using a set also ignores duplicates. In the end the set is converted to a sorted list while creating the required key IndexUsed.
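To make the factory behaviour concrete, here is a small sketch that fills one key by hand and shows the plain-dict equivalent using setdefault. The name plain is made up for this illustration.

```python
from collections import defaultdict

indexes_used = defaultdict(set)

# 'joe' is missing, so defaultdict calls set() and stores the new empty set
# before .add() runs on it.
indexes_used['joe'].add('a')
indexes_used['joe'].add('a')  # duplicate, ignored by the set
indexes_used['joe'].add('b')

# The same thing with a plain dict needs setdefault to create the set.
plain = {}
plain.setdefault('joe', set()).add('a')
plain.setdefault('joe', set()).add('b')

print(sorted(indexes_used['joe']))
print(sorted(plain['joe']))
```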
If the dictionaries are guaranteed to be grouped together by name, then you could use itertools.groupby to process each group of dictionaries separately:
from itertools import groupby
from operator import itemgetter
data = [
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'},
{'user': 'admin', 'IndexUsed': 'a'},
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
]
merged_data = [
    {
        "user": key,
        # A temporary dict keeps unique keys in insertion order, so this deduplicates.
        "IndexUsed": list({i: None for i in map(itemgetter("IndexUsed"), group)}),
    }
    for key, group in groupby(data, key=itemgetter("user"))
]
for d in merged_data:
    print(d)
Output:
{'user': 'joe', 'IndexUsed': ['a', 'b']}
{'user': 'admin', 'IndexUsed': ['a', 'c']}
{'user': 'hugo', 'IndexUsed': ['a', 'd']}
This was just the first thing I came up with, but I don't like it for several reasons. First, like I said, it assumes that the original dictionaries are grouped together by the key user. In addition, long list comprehensions are not readable and should be avoided. The merged IndexUsed list is generated by creating a temporary dictionary which maps unique entries to None (ew, gross - a dictionary is used rather than a set, because sets don't preserve insertion order). It also assumes you're using Python 3.7+, where dictionaries are guaranteed to preserve insertion order (you could be more explicit by using collections.OrderedDict, but that's one more import). Finally, you shouldn't have to hardcode the "user" and "IndexUsed" key literals. Someone please suggest a better answer.
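One possible sketch that addresses those points: it does not require the input to be grouped, avoids the long comprehension, and takes the key names as parameters. The function name merge_by_key and its parameter names are made up for this example, and it still relies on dictionaries preserving insertion order (Python 3.7+). The sample data is a shortened, deliberately ungrouped version of the data above.

```python
def merge_by_key(records, group_key, value_key):
    """Collect the unique value_key values for each distinct group_key value."""
    grouped = {}
    for record in records:
        # The inner dict acts as an insertion-ordered set of values.
        grouped.setdefault(record[group_key], {})[record[value_key]] = None
    return [
        {group_key: key, value_key: list(values)}
        for key, values in grouped.items()
    ]


# Ungrouped sample data: 'joe' and 'admin' entries are interleaved.
data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
]

for d in merge_by_key(data, 'user', 'IndexUsed'):
    print(d)
```

Users and their indexes come out in first-seen order. If sorted output is preferred, list(values) could be swapped for sorted(values).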