
Python: Combining unique values in list of dicts where keys are the same?

Tags:

python

list

python-3.x

dictionary-comprehension

I'm not sure if I am asking the question in the right way, but this is my issue:

I have a list of dicts in the following format:

[
{'user': 'joe', 'IndexUsed': 'a'}, 
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'}, 
{'user': 'admin', 'IndexUsed': 'a'}, 
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
...
]

I want my final result to look like this:

[
{'user': 'joe', 'IndexUsed': ['a', 'b']}, 
{'user': 'admin', 'IndexUsed': ['a', 'c']}, 
{'user': 'hugo', 'IndexUsed': ['a', 'd']},
]

In essence, I want to combine the IndexUsed values of each user, drop the duplicates, and reduce the list to only one dict per user.

I have looked into reducers and dict comprehensions, and searched on Stack Overflow, but I have trouble finding examples that work with strings. Most of the examples I found combine integers into a final int/float, whereas here I want to combine the strings into a single list per user. Could you help me understand how to approach this problem?
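Since the question mentions reducers: a reduce does not have to end in a number, because the accumulator can be any object, such as a dict that maps each user to a list of strings. Below is a minimal sketch, assuming Python 3.7+ (so the accumulator dict keeps users in first-seen order). The helper name `add_record` is hypothetical, and the sample records are the ones from the question without the trailing "...".

```python
from functools import reduce

# Sample records from the question (the trailing "..." omitted).
data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'},
]


def add_record(acc, record):
    """Hypothetical reducer step: fold one record into a {user: [indexes]} dict."""
    indexes = acc.setdefault(record['user'], [])
    # Append the string only if this user has not used it before.
    if record['IndexUsed'] not in indexes:
        indexes.append(record['IndexUsed'])
    return acc


# The initial accumulator is an empty dict instead of 0.
merged = reduce(add_record, data, {})
result = [{'user': user, 'IndexUsed': indexes} for user, indexes in merged.items()]
print(result)
```

The `not in` check scans the user's list, which is fine when each user has only a handful of indexes; for many values per user, a set-based approach like the one in the answers scales better.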


asked Jan 12 '21 14:01

SynchronDEV

2 Answers

from collections import defaultdict


data = [{'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'b', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'admin'},
 {'IndexUsed': 'c', 'user': 'admin'},
 {'IndexUsed': 'a', 'user': 'hugo'},
 {'IndexUsed': 'd', 'user': 'hugo'}]

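# Missing keys are created on first access, with set() as their value
indexes_used = defaultdict(set)
for d in data:
    indexes_used[d['user']].add(d['IndexUsed'])  # the set drops duplicates

result = []
for k, v in indexes_used.items():
    # sorted() accepts any iterable, so the set needs no list() conversion
    result.append({'user': k, 'IndexUsed': sorted(v)})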

print(*result)

Outputs:

{'user': 'joe', 'IndexUsed': ['a', 'b']} {'user': 'admin', 'IndexUsed': ['a', 'c']} {'user': 'hugo', 'IndexUsed': ['a', 'd']}

Note: for those unfamiliar with it, defaultdict calls the function it was given (set in this case) as a factory to create a value whenever a missing key is accessed. So each key of indexes_used (a user) maps to a set of the indexes that user used, and because it is a set, duplicates are ignored. Finally, each set is converted to a sorted list and stored under the required key IndexUsed.
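To see that factory behaviour in isolation, here is a minimal sketch using only the standard library; the keys and values are just sample ones borrowed from the question.

```python
from collections import defaultdict

# set is passed uncalled; defaultdict calls set() for each missing key
indexes_used = defaultdict(set)

indexes_used['joe'].add('a')  # 'joe' is missing, so an empty set is created first
indexes_used['joe'].add('a')  # duplicate, so the set is unchanged
indexes_used['joe'].add('b')

print(sorted(indexes_used['joe']))  # ['a', 'b']
print('admin' in indexes_used)      # False: a membership test does not create the key
print(len(indexes_used['admin']))   # 0: reading a missing key does create an empty set
print('admin' in indexes_used)      # True
```

The last three lines show a common pitfall: merely reading a missing key inserts it. The answer's loop only reads keys in order to add to them, so no empty entries end up in the result.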

answered Oct 26 '22 23:10

progmatico

If the dictionaries are guaranteed to be grouped together by user (all records for the same user are adjacent), then you could use itertools.groupby to process each group of dictionaries separately:

from itertools import groupby
from operator import itemgetter

data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'},
]

merged_data = [{"user": key, "IndexUsed": list({i: None for i in map(itemgetter("IndexUsed"), group)})} for key, group in groupby(data, key=itemgetter("user"))]
for d in merged_data:
    print(d)

Output:

{'user': 'joe', 'IndexUsed': ['a', 'b']}
{'user': 'admin', 'IndexUsed': ['a', 'c']}
{'user': 'hugo', 'IndexUsed': ['a', 'd']}
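If the input is not guaranteed to be grouped, one common workaround is to sort by user before calling groupby, since groupby only merges adjacent items with equal keys. The following is a sketch of that approach, not the answer's own code. Note that the users come out in alphabetical order rather than in the question's order, and that the indexes are de-duplicated with a set and then sorted instead of relying on dict insertion order.

```python
from itertools import groupby
from operator import itemgetter

data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'},
]

by_user = itemgetter('user')
merged_data = []
# groupby only merges *adjacent* equal keys, so sort by user first.
for user, group in groupby(sorted(data, key=by_user), key=by_user):
    # The set comprehension drops duplicates; sorted() turns it into a list.
    indexes = sorted({record['IndexUsed'] for record in group})
    merged_data.append({'user': user, 'IndexUsed': indexes})

for d in merged_data:
    print(d)
```

Sorting costs O(n log n), whereas a single dict-based pass (as in the defaultdict answer) is linear, so the sort is only worth it if you specifically want groupby.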

This was just the first thing I came up with, but I don't like it for several reasons. First, as I said, it assumes that the original dictionaries are grouped together by the key user. In addition, long list comprehensions are hard to read and should be avoided. The merged IndexUsed list is generated by creating a temporary dictionary that maps each unique entry to None (ew, gross; a dictionary is used rather than a set because sets don't preserve insertion order). It also assumes you're using Python 3.7 or later, where dictionaries are guaranteed to preserve insertion order (you could be more explicit by using collections.OrderedDict, but that's one more import). Finally, you shouldn't have to hardcode the "user" and "IndexUsed" key literals. Someone please suggest a better answer.
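One way to address those points, offered as a sketch rather than a definitive answer, is a small function (the name merge_unique is hypothetical) that takes the key names as parameters, does not require the records to be grouped, and keeps first-seen order by pairing a list with a set instead of a dict of None values. It assumes every record contains both keys and that their values are hashable. The sample data below is the question's eight records, interleaved instead of grouped.

```python
from typing import Any, Dict, Iterable, List


def merge_unique(records: Iterable[Dict[str, Any]], group_key: str,
                 value_key: str) -> List[Dict[str, Any]]:
    """Merge records that share group_key, collecting unique value_key values.

    Groups and values keep first-seen order; the input need not be grouped.
    """
    merged: Dict[Any, List[Any]] = {}  # group -> unique values, first-seen order
    seen: Dict[Any, set] = {}          # group -> values already appended
    for record in records:
        group = record[group_key]
        value = record[value_key]
        if value not in seen.setdefault(group, set()):
            seen[group].add(value)
            merged.setdefault(group, []).append(value)
    return [{group_key: g, value_key: values} for g, values in merged.items()]


# The question's eight records, interleaved instead of grouped by user.
data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'd'},
    {'user': 'joe', 'IndexUsed': 'a'},
]

for d in merge_unique(data, 'user', 'IndexUsed'):
    print(d)
```

This prints the three merged dicts from the question. The list keeps the order, and the set makes each duplicate check O(1), so the whole pass is linear in the number of records.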

answered Oct 27 '22 00:10

Paul M.
