Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to obtain a set of dictionaries?

I have a list of dictionaries, but some of them are duplicates and I want to remove them (the duplicates).

The keys of the dict are a sequential number.

An example is the following:

[{1: {a:[1,2,3], b: 4}},
{2: {a:[4,5,6], d: 5}},
{3: {a:[1,2,3], b: 4}},
.....,
{1000: {a:[2,5,1], b: 99}},
]

Considering the previous example I would like to obtain:

[{1: {a:[1,2,3], b: 4}},
{2: {a:[4,5,6], d: 5}},
.....,
{1000: {a:[2,5,1], b: 99}},
]

In fact the dictionaries with keys 1 and 3 are identically in their values.

I tried with a set, but since dict is a not hashable type I am not able to do so.

How can i fix the problem?

EDIT

In my case the number of items inside the dict is not fix, so I can have:

[{1: {a:[1,2,3], b: 4}},
{2: {a:[4,5,6], d: 5}},
.....,
{1000: {a:[2,5,1], b: 99, c:["a","v"]}},
]

where the dict with keys 100 has three elements inside insted of two as the other shown.

like image 615
A1010 Avatar asked Jan 02 '23 00:01

A1010


2 Answers

To get around the limitation of @jdehesa's solution, where [1, 2] would be treated as a duplicate as (1, 2), you can preserve the data types by using pprint.pformat instead to serialize the data structure. Since pprint.pformat sorts dicts by keys and sets by items, {1: 2, 3: 4} would be properly considered the same as {3: 4, 1: 2}, but [1, 2] would not be considered a duplicate to (1, 2):

from pprint import pformat
lst = [
    {1: {'a': [1, 2, 3], 'b': 4}},
    {2: {'a': [4, 5, 6], 'd': 5}},
    {3: {'b': 4, 'a': [1, 2, 3]}},
    {4: {'a': (4, 5, 6), 'd': 5}},
]
seen = set()
output = []
for d in lst:
    for k, v in d.items():
        signature = pformat(v)
        if signature not in seen:
            seen.add(signature)
            output.append({k: v})

output becomes:

[{1: {'a': [1, 2, 3], 'b': 4}},
 {2: {'a': [4, 5, 6], 'd': 5}},
 {4: {'a': (4, 5, 6), 'd': 5}}]
like image 132
blhsing Avatar answered Jan 03 '23 14:01

blhsing


You can maybe use a function like this to turn your objects into something hashable:

def make_hashable(o):
    if isinstance(o, dict):
        return frozenset((k, make_hashable(v)) for k, v in o.items())
    elif isinstance(o, list):
        return tuple(make_hashable(elem) for elem in o)
    elif isinstance(o, set):
        return frozenset(make_hashable(elem) for elem in o)
    else:
        return o

Then you keep a set of seen objects and keep only the keys of each dictionary containing objects that you did not see before:

lst = [
    {1: {'a':[1,2,3], 'b': 4}},
    {2: {'a':[4,5,6], 'd': 5}},
    {3: {'a':[1,2,3], 'b': 4}},
]

seen = set()
result_keys = []
for elem in lst:
    keep_keys = []
    for k, v in elem.items():
        v_hashable = make_hashable(v)
        if v_hashable not in seen:
            seen.add(v_hashable)
            keep_keys.append(k)
    result_keys.append(keep_keys)
result = [{k: elem[k] for k in keys} for elem, keys in zip(lst, result_keys) if keys]
print(result)
# [{1: {'a': [1, 2, 3], 'b': 4}}, {2: {'a': [4, 5, 6], 'd': 5}}]

Note that, as blhsing notes in the comments, this has some limitations, such as considering (1, 2) and [1, 2] equals, as well as {1: 2} and {(1, 2)}. Also, some types may not be convertible to an equivalent hashable type.

EDIT: As a_guest suggests, you can work around the type ambiguity by returning the type itself along with the hashable object in make_hashable:

def make_hashable(o):
    t = type(o)
    if isinstance(o, dict):
        o = frozenset((k, make_hashable(v)) for k, v in o.items())
    elif isinstance(o, list):
        o = tuple(make_hashable(elem) for elem in o)
    elif isinstance(o, set):
        o = frozenset(make_hashable(elem) for elem in o)
    return t, o

If you don't need to look into the hashable object, this will easily provide strict type comparison. Note in this case even things like {1, 2} and frozenset({1, 2}) will be different.

like image 32
jdehesa Avatar answered Jan 03 '23 13:01

jdehesa