Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove small components from a graph

I'm new to networkx and could do with some help please.

I have a set of data which I've processed to generate the nodes and edges. There are around 5000 groups of nodes that have more than 2 links within them (up to 10 nodes in the group in total). But the problem is that there are also several thousand pairs of nodes that only have 1 edge between them, i.e node a is linked to node b but neither are linked to any other node.

I want to remove these paired nodes from the chart.

Is there a way to filter these out?

like image 419
A Rob4 Avatar asked Jul 11 '16 13:07

A Rob4


1 Answers

So our goal is to remove all nodes from components with less than 3 nodes (this includes isolated nodes if they exist).

for component in list(nx.connected_components(G)):
    if len(component)<3:
        for node in component:
            G.remove_node(node)

A small warning is in order when using nx.connected_components. It returns a generator of components. If I didn't put list around it, it would generate one at a time, and then perform the steps for the given component. Once all that is done, it would generate the next component. But because G has been modified, python can't be sure that this behaves well. So it would die (complaining that a dictionary changed size --- the number of nodes in G changed). By turning it into a list, the components are all found before it starts the loop. So the graph won't be changing while the components are being found.

like image 141
Joel Avatar answered Oct 15 '22 05:10

Joel