Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Networkx: Find all minimal cuts consisting of only nodes from one set in a bipartite graph

In the networkx python package, is there a way to find all node cuts of minimal size consisting of only nodes from one set in a bipartite graph? For example, if the two sides of a bipartite graph are A and B, how might I go about finding all minimal node cuts consisting of nodes entirely from set B? The following code I have works but it's extremely slow:

def get_one_sided_cuts(G, A, B):
    #get all cuts that consist of nodes exclusively from B which disconnect
    #nodes from A
    one_sided_cuts = []
    seen = []

    l = list(combinations(A, 2))

    for x in l:
        s = x[0]
        t = x[1]

        cut = connectivity.minimum_st_node_cut(G, s, t)
        if set(cut).issubset(B) and (cut not in seen):
            one_sided_cuts.append(cut)
        seen.append(cut)

    #find minimum cut size
    cur_min = float("inf")
    for i in one_sided_cuts:
        if len(i) < cur_min:
            cur_min = len(i)

    one_sided_cuts = [x for x in one_sided_cuts if len(x) == cur_min]

    return one_sided_cuts

Note that this actually only checks if there is a minimal cut which, if removed, would disconnect two nodes in A only. If your solution does this (instead of finding a cut that will separate any two nodes) that's fine too. Any ideas on how to do this more efficiently?

like image 332
kjakeb Avatar asked Nov 06 '17 22:11

kjakeb


1 Answers

As stated in the comment, there are a couple of interpretations of “all node cuts of minimal size consisting of only nodes from one set in a bipartite graph”. It either means

  1. All node cuts of minimum size when restricting cuts to be in one set of the bipartite graph, or
  2. All node cuts in an unconstrained sense (consisting of nodes from A or B) that happen to completely lie in B.

From your code example you are interested in 2. According to the docs, there is a way to speed up this calculation, and from profile results it helps a bit. There are auxiliary structures built, per graph, to determine the minimum node cuts. Each node is replaced by 2 nodes, additional directed edges are added, etc. according to the Algorithm 9 in http://www.cse.msu.edu/~cse835/Papers/Graph_connectivity_revised.pdf We can reuse these structures instead of reconstructing them inside a tight loop:

Improvement for Case 2:

from networkx.algorithms.connectivity import (
    build_auxiliary_node_connectivity)
from networkx.algorithms.flow import build_residual_network

from networkx.algorithms.flow import edmonds_karp

def getone_sided_cuts_Case2(G, A, B):
    # build auxiliary networks
    H = build_auxiliary_node_connectivity(G)
    R = build_residual_network(H, 'capacity')


    # get all cutes that consist of nodes exclusively from B which disconnet
    # nodes from A
    one_sided_cuts = []
    seen           = []

    l = list(combinations(A,2))

    for x in l:
        s = x[0]
        t = x[1]

    cut = minimum_st_node_cut(G, s, t, auxiliary=H, residual=R)
    if set(cut).issubset(B):
        if cut not in seen:
            one_sided_cuts.append(cut)
    seen.append(cut)

    # Find minimum cut size
    cur_min = float('inf')
    for i in one_sided_cuts:
        if len(i) < cur_min:
            curr_min = len(i)

    one_sided_cuts = [x for x in one_sided_cuts if len(x) == cur_min]

    return one_sided_cuts

For profiling purposes, you might use the following, or one of the built-in bipartite graph generators in Networkx:

def create_bipartite_graph(size_m, size_n, num_edges):
    G = nx.Graph()

    edge_list_0 = list(range(size_m))
    edge_list_1 = list(range(size_m,size_m+size_n))
    all_edges = []

    G.add_nodes_from(edge_list_0, bipartite=0)
    G.add_nodes_from(edge_list_1, bipartite=1)

    all_edges = list(product(edge_list_0, edge_list_1))
    num_all_edges = len(all_edges)

    edges = [all_edges[i] for i in random.sample(range(num_all_edges), num_edges)]
    G.add_edges_from(edges)

    return G, edge_list_0, edge_list_1

Using %timeit, the second version runs about 5-10% faster.

For Case 1, the logic is a little more involved. We need to consider minimal cuts from nodes only inside B. This requires a change to minimum_st_node_cut in the following way. Then replace all occurences of minimum_st_node_cut to rest_minimum_st_node_cut in your solution or the Case 2 solution I gave above, noting that the new function also requires specification of the sets A, B, necessarily:

def rest_build_auxiliary_node_connectivity(G,A,B):
    directed = G.is_directed()

    H = nx.DiGraph()

    for node in A:
        H.add_node('%sA' % node, id=node)
        H.add_node('%sB' % node, id=node)
        H.add_edge('%sA' % node, '%sB' % node, capacity=1)

    for node in B:
        H.add_node('%sA' % node, id=node)
        H.add_node('%sB' % node, id=node)
        H.add_edge('%sA' % node, '%sB' % node, capacity=1)        

    edges = []
    for (source, target) in G.edges():
        edges.append(('%sB' % source, '%sA' % target))
        if not directed:
            edges.append(('%sB' % target, '%sA' % source))
    H.add_edges_from(edges, capacity=1)

    return H

def rest_minimum_st_node_cut(G, A, B, s, t, auxiliary=None, residual=None, flow_func=edmonds_karp):

    if auxiliary is None:
        H = rest_build_auxiliary_node_connectivity(G, A, B)
    else:
        H = auxiliary

    if G.has_edge(s,t) or G.has_edge(t,s):
        return []
    kwargs = dict(flow_func=flow_func, residual=residual, auxiliary=H)

    for node in [x for x in A if x not in [s,t]]:
        edge = ('%sA' % node, '%sB' % node)
        num_in_edges = len(H.in_edges(edge[0]))
        H[edge[0]][edge[1]]['capacity'] = num_in_edges

    edge_cut = minimum_st_edge_cut(H, '%sB' % s, '%sA' % t,**kwargs)

    node_cut = set([n for n in [H.nodes[node]['id'] for edge in edge_cut for node in edge] if n not in A])

    return node_cut - set([s,t])

We then have, for example:

In [1]: G = nx.Graph()
        # A = [0,1,2,3], B = [4,5,6,7]
In [2]: G.add_edges_from([(0,4),(0,5),(1,6),(1,7),(4,1),(5,1),(6,3),(7,3)])
In [3]: minimum_st_node_cut(G, 0, 3)
           {1}
In [4]: rest_minimum_st_node_cut(G,A,B,0,3)
           {6, 7}

Finally note that the minimum_st_edge_cut() function returns [] if two nodes are adjacent. Sometimes the convention is to return a set of n-1 nodes in this case, all nodes except the source or sink. Anyway, with the empty list convention, and since your original solution to Case 2 loops over node pairs in A, you will likely get [] as a return value for most configurations, unless no nodes in A are adjacent, say.

EDIT

The OP encountered a problem with bipartite graphs for which the sets A, B contained a mix of integers and str types. It looks to me like the build_auxiliary_node_connectivity converts those str nodes to integers causing collisions. I rewrote things above, I think that takes care of it. I don't see anything in the networkx docs about this, so either use all integer nodes or use the rest_build_auxiliary_node_connectivity() thing above.

like image 104
Charles Pehlivanian Avatar answered Oct 20 '22 22:10

Charles Pehlivanian