
Merging overlapping (str) objects

Tags:

python

The problem is the following:

I want to go from having this set

{'A/B', 'B/C', 'C/D', 'D/E', ..., 'U/V', 'V/W', ..., 'X/Y', ..., 'Z', ...}

to this set

{'A/B/C/D/E', ..., 'U/V/W', ..., 'X/Y', ..., 'Z', ...}

where the objects A, B, C, ... are just strings of characters. The output should be independent of the order in which the objects appear (i.e. if you shuffle the objects in the set, the result should always be the same).

In other words I want to merge overlapping objects.

Inputs of the following forms cannot occur:

{"A/B", "B/C", "B/D"}
{"A/B", "B/C", "C/A"}

There can be objects with no '/' in them.

Here is a partial solution I've come up with:

    import itertools

    example={'A/B', 'B/C', 'C/D', 'D/E','U/V', 'V/W','X/Y'}
    
    def ext_3plus(unit):
        for couple in list(itertools.combinations(list(unit),2)):
            if '/' in couple[0] and '/' in couple[1]:
                if couple[0].split('/')[0]==couple[1].split('/')[1]:
                    unit.remove(couple[0])
                    unit.remove(couple[1])
                    unit.add(couple[1].split('/')[0]+'/'+couple[0])
                if couple[0].split('/')[1]==couple[1].split('/')[0]:
                    unit.remove(couple[0])
                    unit.remove(couple[1])
                    unit.add(couple[0]+'/'+couple[1].split('/')[1])
            else:  # the input can contain objects without a '/'
                continue

There are two problems. First, it performs only one pass; the result on {'A/B', 'B/C', 'C/D', 'D/E','U/V', 'V/W','X/Y'} is:

{'A/B/C', 'C/D/E', 'U/V/W', 'X/Y'}

Second, if I include an object containing no '/', so that the input is {'A/B', 'B/C', 'C/D', 'D/E','U/V', 'V/W','X/Y','Z'}, the result differs from the previous one:

{'A/B', 'B/C/D', 'D/E', 'U/V/W', 'X/Y', 'Z'}

So it seems the merge should be applied again (recursively) to the result of the first pass, and so on until nothing changes. How should this be done?
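
One way to read "apply it again" is as a loop that restarts the scan after every merge and stops once no pair overlaps. The following is only a rough sketch of that idea, assuming the excluded inputs above (branches and cycles) never occur; the name `merge_until_stable` is made up for illustration. Scanning in sorted order keeps the result independent of set iteration order.

```python
def merge_until_stable(units):
    """Join 'x/.../k' with 'k/.../y' repeatedly until no pair overlaps."""
    units = set(units)  # work on a copy so the caller's set is untouched
    changed = True
    while changed:
        changed = False
        for left in sorted(units):
            for right in sorted(units):
                # strings without '/' are never merged, as in the original code
                if left == right or "/" not in left or "/" not in right:
                    continue
                # last segment of `left` must equal first segment of `right`
                if left.rsplit("/", 1)[1] == right.split("/", 1)[0]:
                    units -= {left, right}
                    units.add(left + "/" + right.split("/", 1)[1])
                    changed = True
                    break
            if changed:
                break  # the set was modified, so restart the scan
    return units


example = {'A/B', 'B/C', 'C/D', 'D/E', 'U/V', 'V/W', 'X/Y', 'Z'}
print(sorted(merge_until_stable(example)))
# ['A/B/C/D/E', 'U/V/W', 'X/Y', 'Z']
```

Restarting after each merge avoids touching elements that were already removed, at the cost of roughly cubic time in the worst case, which should be acceptable only for small sets.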

rawtoine asked Feb 26 '26 14:02


1 Answer

If I understood correctly, this can be seen as a graph problem and solved as such:

import networkx as nx

example = {'A/B', 'B/C', 'C/D', 'D/E', 'U/V', 'V/W', 'X/Y', "Z"}

# convert each string to an edge
# the pattern on each side of the / is a node
edges = [tuple(s.split("/")) for s in example if "/" in s]

nodes = [s for s in example if "/" not in s]

# create directed graph from edges
g = nx.from_edgelist(edges, create_using=nx.DiGraph)
g.add_nodes_from(nodes)

# find each path using topological sort
runs, current = [], []
for e in nx.topological_sort(g):
    # start a new path each time we reach a node with in-degree 0,
    # since in-degree 0 means the node is the start of a path
    if g.in_degree(e) == 0:
        if current:
            runs.append(current)
            current = []
    current.append(e)

if current:
    runs.append(current)

# format the result
result = ["/".join(run) for run in runs]
print(result)

Output

['Z', 'U/V/W', 'X/Y', 'A/B/C/D/E']

If I'm not mistaken, the overall complexity of this approach is O(n), where n is the number of strings. More on topological sorting can be found here.

UPDATE

In networkx 2.6.4, use lexicographical_topological_sort in place of topological_sort.
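
Both sort calls rely on the sort emitting the nodes of each chain consecutively, which, as far as I can tell, a topological order does not guarantee in general (it only has to respect edge direction). A minimal dependency-free sketch that instead follows each node's single successor from every chain start is given below. The names `merge_chains`, `successor` and `has_predecessor` are made up here, and the code assumes the restrictions from the question: no branching and no cycles.

```python
def merge_chains(units):
    """Merge overlapping 'a/b' strings by walking successor links."""
    successor = {}           # node -> the node that directly follows it
    has_predecessor = set()  # nodes that appear after a '/'
    nodes = set()
    for unit in units:
        parts = unit.split("/")
        nodes.update(parts)  # a string without '/' is a lone node
        for a, b in zip(parts, parts[1:]):
            successor[a] = b
            has_predecessor.add(b)

    result = set()
    # a chain starts at every node that nothing points to
    # (assumption: at most one successor and predecessor per node, no cycles)
    for start in nodes - has_predecessor:
        chain = [start]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        result.add("/".join(chain))
    return result


print(sorted(merge_chains({'A/Z', 'B/C', 'Z/Q'})))
# ['A/Z/Q', 'B/C']
```

Each string is visited once and each node is appended to exactly one chain, so this should stay linear in the input size, and returning a set keeps the result independent of input order.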

Dani Mesejo answered Mar 01 '26 03:03


