Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Graph modularity in python networkx

I have created a graph in python lib NetorwkX and I want to implement a modularity algorithm in order to cluster the nodes of my graph. I came across the following code:

import community
import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()

G = nx.read_weighted_edgelist('graphs/fashionGraph_1.edgelist')
nx.transitivity(G)

# Find modularity
part = community.best_partition(G)
mod = community.modularity(part,G)

# Plot, color nodes using community structure
values = [part.get(node) for node in G.nodes()]
nx.draw_spring(G, cmap=plt.get_cmap('jet'), node_color = values, node_size=30, with_labels=False)
plt.show()

My graph has 4267 and 3692 edges. The resulting plot is this:

enter image description here

I am a bit confused on how the nodes of the graph is clustered. Which are exactly the logic of the colors?

like image 809
snake plissken Avatar asked Apr 27 '15 13:04

snake plissken


2 Answers

From the documentation:

Node color. Can be a single color format string, or a sequence of colors with the same length as nodelist. If numeric values are specified they will be mapped to colors using the cmap and vmin,vmax parameters. See matplotlib.scatter for more details.

part = community.best_partition(G) assigns a community to each node - part is a dict, and part[node] is the community the node belongs to (each is assigned an integer). Later values = [part.get(node) for node in G.nodes()] creates a list with the community number for each node in the order the nodes appear in G.nodes().

Then in the plotting command, it will use those community numbers to determine the color of the nodes. All nodes which have been assigned to the same community will be given the same color.

The physical locations of the nodes are assigned by the spring layout. You can see that the spring layout seems to be putting the nodes into positions that suggest some communities that are different from what community.best_partition finds. This is perhaps mildly surprising, but certainly nothing prevents it. It does make me think that the algorithm you've used doesn't appropriately account for all of the structure in the network. The documentation for best_partition gives some explanation of the underlying algorithm.

like image 125
Joel Avatar answered Sep 27 '22 22:09

Joel


Roughly speaking, nodes are grouped into communities, such that the ratio of intra community connections to inter communities connections (modularity measure) is optimized.

A precise definition of the modularity from wikipedia:

Modularity is the fraction of the edges that fall within the given groups minus the expected such fraction if edges were distributed at random. The value of the modularity lies in the range [−1/2,1). It is positive if the number of edges within groups exceeds the number expected on the basis of chance. For a given division of the network's vertices into some modules, modularity reflects the concentration of edges within modules compared with random distribution of links between all nodes regardless of modules.

The algorithm implemented by the community package finds approximated solution (separation to communities) using iterative process which at the beginning define each node as a community and keeps merging them till modularity is optimized.

More accurate info can be found in the paper describing the algorithm:

Fast unfolding of communities in large networks. VD Blondel, JL Guillaume, R Lambiotte, E Lefebvre Journal of statistical mechanics: theory and experiment 2008 (10), P10008

(I was able to retrieve and install it on windows from https://pypi.python.org/pypi/python-louvain)

like image 35
schiffli Avatar answered Sep 27 '22 21:09

schiffli