 

Optimizing vectorized code for graph adjacency

I am writing some machine learning code in MATLAB. I represent a graph with an adjacency matrix A, and a clustering of the graph with a matrix Z, defined as follows.

A: a_ij is 1 if there is an edge between node i and node j, and 0 otherwise.
Z: z_ij is 1 if node j is in cluster i, and 0 otherwise.

I am computing a matrix N that counts the edges between clusters, defined as follows:

N: n_ij is the number of edges between nodes in cluster i and nodes in cluster j. n_ii is the number of edges inside cluster i.

N can be computed by:

N = ZAZ'

where Z' is the transpose of Z.
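
For concreteness, here is a tiny MATLAB example (the graph and clustering are made up for illustration):

% Toy graph: 4 nodes, 2 clusters.
A = [0 1 1 0;
     1 0 0 1;
     1 0 0 1;
     0 1 1 0];       % a_ij = 1 iff there is an edge between node i and node j
Z = [1 1 0 0;
     0 0 1 1];       % z_ij = 1 iff node j is in cluster i

N = Z * A * Z';      % n_ij sums a_kl over nodes k in cluster i and l in cluster j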

If I have a lot of nodes, then computing this takes some time, but that is not the problem. The problem is that I move nodes from cluster to cluster many times, and each time I want to recompute N.

So the problem is the following: given that I know N, and that I then move node i from cluster c_1 to cluster c_2, how can I update N efficiently?

asked Mar 03 '12 by utdiscant

1 Answer

To go from Z to Z + U, where U = Z_new − Z_old is zero everywhere except for a −1 in entry (c_1, i) and a +1 in entry (c_2, i), expand (Z + U)A(Z + U)' and update

N ← N + Z(AU') + (UA)Z' + UAU'
Z ← Z + U.
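
A sketch of this update in MATLAB, assuming node i moves from cluster c1 to cluster c2 (the variable names are illustrative):

% U = Z_new - Z_old: zero except for the two entries encoding the move.
[k, n] = size(Z);                             % k clusters, n nodes
U = sparse([c1, c2], [i, i], [-1, 1], k, n);

N = N + Z * (A * U') + (U * A) * Z' + U * A * U';   % update N first, using the old Z
Z = Z + U;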

Z and U (and A, if it makes sense for your graph) should have sparse representations. In theory, at least, this does more or less in compiled code what I would do in C: scan the neighbors of i, decrementing the edge counts to and from i's old cluster and incrementing the edge counts to and from i's new cluster. In practice, you may need to transpose Z to align it properly with MATLAB's sparse matrix representation, and execute the update Z ← Z + U by replacing two whole rows so that the newly zeroed entries are not treated as nonzero.
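
For comparison, the neighbor scan described above might look like this in MATLAB (a sketch assuming an undirected graph with no self-loops, and cluster labels kept in a vector c so that c(v) is the cluster of node v):

c1 = c(i);                             % old cluster of node i; c2 is the new one
for v = find(A(i, :))                  % scan the neighbors of i
    N(c1, c(v)) = N(c1, c(v)) - 1;     % edges (i, v) leave the old cluster...
    N(c(v), c1) = N(c(v), c1) - 1;     % ...counting both a_iv and a_vi
    N(c2, c(v)) = N(c2, c(v)) + 1;     % ...and join the new cluster
    N(c(v), c2) = N(c(v), c2) + 1;
end
c(i) = c2;

Both variants yield the same N; the loop just makes the cost explicit, which is proportional to the degree of node i.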

answered Oct 10 '22 by foobar