
Remove duplicates from the list of dictionaries (with a unique value)

I have a list of dictionaries, each describing a file (file format, filename, filesize, ... and a full path to the file, which is always unique). The goal is to drop all but one of the dictionaries that describe copies of the same file: I want a single dict (entry) per file, no matter how many copies there are.

In other words: if two (or more) dicts differ only in a single key (i.e. path), keep only one of them.

For example, here is the source list:

src_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]

The result should look like this:

dst_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]

Vasily asked Feb 07 '23 20:02


1 Answer

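Use another dictionary whose keys are the dictionaries from the list with the "ignored" keys stripped out, and whose values are the original dictionaries. Because dictionary keys are unique, only one entry of each kind is retained. Dicts are not hashable, so each key has to be a tuple of (key, value) pairs instead, sorted by key so that the order in which a dict was built does not matter.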

src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
# Keys that should not take part in the comparison.
ignored_keys = ["path"]
# Map each hashable signature (sorted key/value pairs minus the ignored keys) to its dict.
filtered = {tuple((k, d[k]) for k in sorted(d) if k not in ignored_keys): d for d in src_list}
dst_list = list(filtered.values())

Result (for each file, the copy that appears last in src_list is the one that survives):

[{'path': 'C:/mydir', 'filetype': '.txt', 'filename': 'abc'}, 
 {'path': 'C:/mydir2', 'filetype': '.zip', 'filename': 'def'}]
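
If you would rather keep the first copy of each file instead of the last, one possible variation is sketched below. The helper name dedupe_keep_first is hypothetical (it is not part of the original answer). The sketch assumes that all non-ignored values are hashable and that dicts preserve insertion order (Python 3.7+).

```python
def dedupe_keep_first(dicts, ignored_keys=("path",)):
    """Return one dict per group of dicts that are equal once ignored_keys are dropped.

    Hypothetical helper for illustration; keeps the FIRST dict of each group.
    """
    ignored = set(ignored_keys)
    seen = {}
    for d in dicts:
        # Sorted tuple of the remaining (key, value) pairs acts as a hashable signature.
        signature = tuple((k, d[k]) for k in sorted(d) if k not in ignored)
        # setdefault stores d only if this signature has not been seen before.
        seen.setdefault(signature, d)
    return list(seen.values())


src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
dst_list = dedupe_keep_first(src_list)
```

With the sample list this should keep the two entries whose path is 'C:/', in their original order. If some values are unhashable (for example lists), the signature would need to convert them first, e.g. to tuples.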

tobias_k answered Feb 09 '23 10:02