How do I remove a duplicate dict in list, ignoring a dict key?

I have a list of dictionaries. Each dictionary has several key-value pairs, plus a single arbitrary (but important) key-value pair. For example:

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

I would like to remove the duplicate dictionaries such that the "ignore_key" values are ignored when comparing. I have seen a related question on SO, but it only considers entirely identical dicts. Is there a way to remove the near-duplicates such that the data above becomes

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}
]

It doesn't matter which of the duplicates is removed. How can I do this?

user4467853 asked Feb 10 '23 02:02



2 Answers

Keep a set of the (key, k2) value pairs already seen and remove any dict whose pair is already in the set:

st = set()

for d in thelist[:]:
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

If duplicates are always adjacent in the list, you can group on the key and k2 values with groupby and take the first dict from each group:

from itertools import groupby
from operator import itemgetter
thelist[:] = [next(v) for _, v in groupby(thelist,itemgetter("key","k2"))]
print(thelist)

[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}, 
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
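
groupby only merges adjacent items, so if duplicates may be scattered through the list, one option is to sort on the same key first. A minimal sketch, assuming the compared values are sortable and that losing the original order is acceptable; dedupe_sorted and sample are names made up for illustration:

```python
from itertools import groupby
from operator import itemgetter


def dedupe_sorted(dicts, keys=("key", "k2")):
    """Sort by the compared keys, then keep the first dict of each group."""
    getter = itemgetter(*keys)
    # sorted() is stable, so dicts with equal keys keep their input order
    ordered = sorted(dicts, key=getter)
    return [next(group) for _, group in groupby(ordered, key=getter)]


# Hypothetical input where the duplicates are NOT adjacent
sample = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]
result = dedupe_sorted(sample)
print(result)
```

The sort makes this O(n log n) and reorders the output, so the set-based versions are probably the better fit when order matters.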

Or using a generator similar to DSM's answer to modify the original list without copying:

def filt(l):
    st = set()
    for d in l:
        vals = d["key"],d["k2"]
        if vals not in st:
            yield d
        st.add(vals)


thelist[:] = filt(thelist)

print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'}, 
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

If you don't care which dupe is removed, just use reversed, which avoids the copy and keeps the last of each set of duplicates:

st = set()

for d in reversed(thelist):
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)
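
The same "keep the last occurrence" behaviour can also be had without removing from the list while iterating over it. This is only a sketch, assuming the compared keys are key and k2 as above; keep_last and data are hypothetical names:

```python
def keep_last(dicts, keys=("key", "k2")):
    """Return a new list keeping the last dict for each combination of keys."""
    seen = set()
    kept = []
    for d in reversed(dicts):
        vals = tuple(d[k] for k in keys)
        if vals not in seen:
            seen.add(vals)
            kept.append(d)
    kept.reverse()  # restore the original relative order
    return kept


data = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]
last_kept = keep_last(data)
print(last_kept)
```

Building a new list avoids the repeated list.remove calls, each of which scans the list from the start.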

To compare on every key except ignore_key using groupby (again assuming duplicates are adjacent):

from itertools import groupby

# sort the (key, value) pairs so the comparison doesn't depend on dict key order
thelist[:] = [next(v) for _, v in groupby(thelist, lambda d:
                sorted((k, val) for k, val in d.items() if k != "ignore_key"))]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
 {'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
Padraic Cunningham answered Feb 15 '23 11:02



You could cram things into a line or two, but I think it's cleaner just to write a function:

def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        index = frozenset((k,v) for k,v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)

which gives

>>> list(f(thelist, ["ignore_key"]))
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}, 
 {'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}]

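This assumes your values are hashable. (If they're not, the frozenset itself will raise a TypeError, so index has to become something that doesn't hash its contents, such as a dict of the non-ignored items, together with seen = [] and seen.append(index). That works, although it'll have bad performance for long lists.)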
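
For values that aren't hashable (lists, nested dicts), a rough sketch of the list-based variant follows. Here index is built as a plain dict rather than a frozenset, since a frozenset would still try to hash the values; f_unhashable and records are hypothetical names:

```python
def f_unhashable(seq, ignore_keys):
    """Like f above, but compares by equality so values need not be hashable."""
    seen = []  # a list: membership tests use ==, at O(n) cost per lookup
    for elem in seq:
        index = {k: v for k, v in elem.items() if k not in ignore_keys}
        if index not in seen:
            seen.append(index)
            yield elem


# Hypothetical data with list values, which cannot go into a set
records = [
    {"key": [1, 2], "ignore_key": "a"},
    {"key": [1, 2], "ignore_key": "b"},
    {"key": [3], "ignore_key": "c"},
]
unique = list(f_unhashable(records, ["ignore_key"]))
print(unique)
```

Each membership test scans seen, so the whole thing is quadratic in the number of distinct dicts.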

DSM answered Feb 15 '23 11:02
