I have a dict that looks like this:
{
    "group-1": ["a.b", "c.d", "group-2"],
    "group-2": ["e.f", "c.d"],
    "group-3": ["group-1", "group-2"],
}
The dict is big, but fits fine in memory (a couple of thousand items).
I am trying to resolve the groups, so that each group gets the list of all its members.
So in this case, the "solution"-dict would be:
{
    "group-1": ["a.b", "c.d", "e.f"],
    "group-2": ["e.f", "c.d"],
    "group-3": ["a.b", "c.d", "e.f"],
}
because each group is resolved down to the items it (directly or indirectly) contains. A group name never contains a ".", but items always do. I'm not sure how to go about this without being incredibly inefficient.
Something like this is what I've got so far; it doesn't work, and is probably the wrong direction:
from pprint import pprint
from collections import defaultdict

def normalize(data):
    group_map = defaultdict(set)
    found_some = True
    while found_some:
        found_some = False
        for k, v in data.items():
            for i in v:
                if "." in i:
                    # i is a plain item (items always contain a ".")
                    if i not in group_map[k]:
                        group_map[k].add(i)
                        found_some = True
                else:
                    # i is a reference to another group...
                    ...
    return group_map
You could try a recursive function to keep resolving elements:
def resolve(d, key):
    for x in d[key]:
        if x in d:
            # x is itself a group: recurse and yield its members
            yield from resolve(d, x)
        else:
            # x is a plain item
            yield x
Or in a single line:
def resolve(d, key):
    return (y for x in d[key] for y in (resolve(d, x) if x in d else [x]))
Applied to your example:
d = {
    "group-1": ["a.b", "c.d", "group-2"],
    "group-2": ["e.f", "c.d"],
    "group-3": ["group-1", "group-2"],
}
r = {k: sorted(set(resolve(d, k))) for k in d}
# {'group-1': ['a.b', 'c.d', 'e.f'],
#  'group-2': ['c.d', 'e.f'],
#  'group-3': ['a.b', 'c.d', 'e.f']}
Note that if your dict is very big, you should probably add the @functools.lru_cache(None) decorator to add memoization to the function. In that case you will have to remove the non-hashable d parameter (and use the d from the surrounding scope instead); a sketch follows below. Depending on the "depth" of the references you might also have to increase the recursion limit. And of course, this will not work if there are cyclic dependencies (but I think the same would be true for any other approach).
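Here is a minimal sketch of that memoized variant, with one change that is not in the original: the cached function builds and returns a frozenset instead of being a generator, because lru_cache would otherwise cache a generator object that is exhausted after its first iteration:

import functools

d = {
    "group-1": ["a.b", "c.d", "group-2"],
    "group-2": ["e.f", "c.d"],
    "group-3": ["group-1", "group-2"],
}

@functools.lru_cache(maxsize=None)
def resolve(key):
    # d comes from the surrounding scope: dicts are not hashable,
    # so they cannot be passed through lru_cache as arguments.
    result = set()
    for x in d[key]:
        if x in d:
            # Union with the cached result of the sub-group;
            # each group is only ever resolved once.
            result |= resolve(x)
        else:
            result.add(x)
    # frozenset (not a generator) so the cached value can be reused.
    return frozenset(result)

r = {k: sorted(resolve(k)) for k in d}
# {'group-1': ['a.b', 'c.d', 'e.f'],
#  'group-2': ['c.d', 'e.f'],
#  'group-3': ['a.b', 'c.d', 'e.f']}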