Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Resolving items in list that also are grouped

I have a dict that looks like this:

{
    "group-1": ["a.b", "c.d", "group-2"],
    "group-2": ["e.f", "c.d"],
    "group-3": ["group-1", "group-2"],
}

The dict is big, but fits fine in memory (a couple of thousand items).

I am trying to resolve the groups, so that each group got the list of all it's members.

So in this case, the "solution"-dict would be:

{
    "group-1": ["a.b", "c.d", "e.f"],
    "group-2": ["e.f", "c.d"],
    "group-3": ["a.b", "c.d", "e.f"],
}

because each group

  • have a list of all it's members
  • have resolved the group-in-group problem
  • Groups does not contain ., but items always does..

I'm not sure how to go about this, without being incredible inefficient.


Something like this is what I got so far, doesnt work, and is probably the wrong direction:

from pprint import pprint
from collections import defaultdict

def normalize(data):
    group_map = defaultdict(set)

    found_some = True
    while found_some:
        found_some = False
        for k, v in data.items():
            for i in v:
                if "." in i:
                    if i not in group_map[k]:
                        group_map[k].add(i)
                        found_some = True
                else:
                    ....

    return group_map
like image 440
xeor Avatar asked Mar 04 '23 07:03

xeor


1 Answers

You could try a recursive function to keep resolving elements:

def resolve(d, key):
    for x in d[key]:
        if x in d:
            yield from resolve(d, x)
        else:
            yield x

Or in a single line:

def resolve(d, key):
    return (y for x in d[key] for y in (resolve(d, x) if x in d else [x]))

Applied to your example:

d = {
    "group-1": ["a.b", "c.d", "group-2"],
    "group-2": ["e.f", "c.d"],
    "group-3": ["group-1", "group-2"],
}
r = {k: sorted(set(resolve(d, k))) for k in d}
# {'group-1': ['a.b', 'c.d', 'e.f'],
#  'group-2': ['c.d', 'e.f'],
#  'group-3': ['a.b', 'c.d', 'e.f']}

Note that if your dict is very big, you should probably add the @functools.lru_cache(None) decorator to add memoization to the function. In this case, you will have to remove the non-hashable d parameter (and use the d from the surrounding score instead). Depending on the "depth" of the references you might also have to increase the recursion limit. And of course, this will not work if there are cyclic dependencies (but I think the same will be true for any other appraoch).

like image 72
tobias_k Avatar answered Mar 16 '23 16:03

tobias_k