I have a list of dictionaries, each describing a file (file format, filename, filesize, ... and a full path to the file, which is always unique). The goal is to drop all but one of the dictionaries that describe copies of the same file: I want a single dict (entry) per file, no matter how many copies there are.
In other words: if two or more dicts differ only in a single key (i.e. path), keep only one of them.
For example, here is the source list:
src_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/mydir'},
{'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/'},
{'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]
The result should look like this:
dst_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
{'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]
Use another dictionary that maps each dict from the list, stripped of the "ignored" keys, to the original dict. Because the keys of a dictionary are unique, only one dict of each kind is retained. Dicts are not hashable, so they cannot serve as keys directly; use a tuple of (key, value) pairs, sorted by key, instead.
src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
{'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
{'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
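ignored_keys = ["path"]
# Key: tuple of (key, value) pairs sorted by key, without the ignored keys.
# Dict keys are unique, so each tuple maps to exactly one dict.
filtered = {tuple((k, d[k]) for k in sorted(d) if k not in ignored_keys): d for d in src_list}
dst_list = list(filtered.values())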
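The result keeps the last copy of each file, because a later dict replaces an earlier one that produces the same tuple: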
[{'path': 'C:/mydir', 'filetype': '.txt', 'filename': 'abc'},
{'path': 'C:/mydir2', 'filetype': '.zip', 'filename': 'def'}]
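If you would rather keep the first copy of each file instead of the last, the same idea can be sketched with dict.setdefault. This is only a minimal sketch: the function name dedupe_dicts and its ignored_keys parameter are made up for illustration, and it assumes all dict keys are strings, all values are hashable, and Python 3.7+ (so that dicts preserve insertion order).

```python
def dedupe_dicts(dicts, ignored_keys=("path",)):
    """Keep the first dict of each group that matches on all non-ignored keys.

    Hypothetical helper: assumes string keys and hashable values.
    """
    ignored = set(ignored_keys)
    seen = {}
    for d in dicts:
        # Hashable fingerprint: (key, value) pairs sorted by key,
        # with the ignored keys left out.
        fingerprint = tuple(sorted((k, v) for k, v in d.items() if k not in ignored))
        # setdefault stores d only if this fingerprint is new,
        # so the first copy wins.
        seen.setdefault(fingerprint, d)
    return list(seen.values())


src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]

dst_list = dedupe_dicts(src_list)
print(dst_list)
# [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
#  {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'}]
```

Which copy survives is the only difference from the dict comprehension; if any one copy will do, either version works.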