I have a list of dictionaries. Each dictionary has several key-value pairs, plus a single arbitrary (but important) key-value pair. For example:
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"}
]
I would like to remove the duplicate dictionaries, where two dictionaries count as duplicates if they match on every key except "ignore_key". I have seen a related question on SO, but it only considers entirely identical dicts. Is there a way to remove the almost-duplicates so that the data above becomes
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"}
]
It doesn't matter which of the duplicates is removed. How can I do this?
Keep a set of the values already seen for key and k2, and remove any dict whose values have been seen before:
st = set()
for d in thelist[:]:  # iterate over a copy so we can remove from the original
    vals = d["key"], d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)
[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]
If the duplicates are always grouped together (adjacent in the list), you can use the values of key and k2 to group and take the first dict from each group:
from itertools import groupby
from operator import itemgetter
thelist[:] = [next(v) for _, v in groupby(thelist, itemgetter("key", "k2"))]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
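The "always grouped" condition matters because groupby only merges consecutive items. A minimal sketch of what happens otherwise, using a made-up records list (not the question's data) in which the duplicates are separated, and sorting on the same key as one possible workaround:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample in which the two duplicates are NOT adjacent.
records = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

keyfunc = itemgetter("key", "k2")

# groupby starts a new group whenever the key changes, so nothing is merged here.
ungrouped = [next(g) for _, g in groupby(records, keyfunc)]

# Sorting by the same key first makes equal items adjacent (at O(n log n) cost,
# and the original order is lost).
deduped = [next(g) for _, g in groupby(sorted(records, key=keyfunc), keyfunc)]

print(len(ungrouped), len(deduped))
```

If the original order has to be preserved, the set-based loop is probably the simpler choice.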
Or using a generator similar to DSM's answer to modify the original list without copying:
def filt(l):
    st = set()
    for d in l:
        vals = d["key"], d["k2"]
        if vals not in st:
            yield d
        st.add(vals)

thelist[:] = filt(thelist)
print(thelist)
[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]
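If you don't care which dupe is removed, just use reversed, which avoids copying the list (on the sample data this keeps the last dict of each group of duplicates):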
st = set()
for d in reversed(thelist):
    vals = d["key"], d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)
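Another option when it doesn't matter which duplicate survives is to key a dict on the compared values. This is only a sketch, not part of the approaches above; the name unique is made up for illustration, and it relies on dicts preserving insertion order (Python 3.7+):

```python
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

unique = {}  # hypothetical helper: maps (key, k2) -> first dict seen with those values
for d in thelist:
    # setdefault only stores d if this (key, k2) pair has not been seen yet.
    unique.setdefault((d["key"], d["k2"]), d)

# Slice assignment accepts any iterable, so the original list object is updated in place.
thelist[:] = unique.values()
print(thelist)
```

Replacing setdefault with a plain assignment (unique[...] = d) would keep the last dict of each group instead of the first.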
To compare on every key except ignore_key using groupby (again assuming the duplicates are adjacent):
from itertools import groupby
thelist[:] = [next(v) for _, v in groupby(thelist, lambda d:
              [val for k, val in d.items() if k != "ignore_key"])]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
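One caveat with grouping on a list of values: the list follows each dict's own key order, so two dicts holding the same items in a different order would not be grouped together. A minimal sketch of a key function that sorts the (key, value) pairs first; key_without and rows are made-up names, and it assumes the dict keys are mutually orderable (for example, all strings):

```python
from itertools import groupby

def key_without(d, ignored="ignore_key"):
    # Sort the remaining (key, value) pairs so the dict's key order does not matter.
    # Keys are unique within a dict, so the sort never needs to compare the values.
    return sorted((k, v) for k, v in d.items() if k != ignored)

# Hypothetical rows: same items, different key order.
rows = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"k2": "va2", "key": "value2", "ignore_key": "arb113"},
]

# Grouping on the bare values treats these as different...
by_values = [next(g) for _, g in groupby(
    rows, lambda d: [v for k, v in d.items() if k != "ignore_key"])]

# ...while grouping on the sorted items treats them as duplicates.
by_items = [next(g) for _, g in groupby(rows, key_without)]

print(len(by_values), len(by_items))
```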
You could cram things into a line or two, but I think it's cleaner just to write a function:
def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        # hashable, order-independent snapshot of the items we compare on
        index = frozenset((k, v) for k, v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)
which gives
>>> list(f(thelist, ["ignore_key"]))
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'},
{'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}]
This assumes your values are hashable. (If they're not, a frozenset can't hold them either, so build index as a plain dict of the non-ignored items and use seen = [] with seen.append(index), although it'll have bad performance for long lists.)
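For unhashable values, the list-based fallback could look roughly like this. It is a sketch under assumptions: dedupe_unhashable and rows are made-up names, and index is built as a plain dict rather than a frozenset, because a frozenset cannot contain unhashable values such as lists:

```python
def dedupe_unhashable(seq, ignore_keys):
    seen = []  # a list instead of a set: membership tests use ==, not hashing
    for elem in seq:
        # A plain dict of the compared items; dict equality ignores key order.
        index = {k: v for k, v in elem.items() if k not in ignore_keys}
        if index not in seen:  # linear scan, so O(n^2) overall
            seen.append(index)
            yield elem

# Hypothetical data whose compared values are lists (unhashable).
rows = [
    {"key": [1, 2], "ignore_key": "a"},
    {"key": [1, 2], "ignore_key": "b"},
    {"key": [3], "ignore_key": "c"},
]
result = list(dedupe_unhashable(rows, ["ignore_key"]))
print(result)
```

The linear scan over seen is what makes this slow for long lists, so it is probably only worth using when the values really can't be hashed.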