Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there an efficient way to cluster a graph according to Jaccard similarity?

Is there an efficient way to cluster nodes in a graph using Jaccard similarity such that each cluster has at least K nodes?

Jaccard similarity between nodes i and j:
Let S be the set of neighbours of i and T be the set of neighbours of j. Then the similarity between i and j is given by |(S ⋂ T)| / |(S ⋃ T)|.

like image 323
HHH Avatar asked Dec 20 '13 21:12

HHH


1 Answers

Have you tried implementing some algorithm yourself?

Compute all pairwise non-zero similarities (i.e. when they have at least one neighbor in common; this makes the candidate set much smaller than a squared matrix).

Sort them by similarity, and process pairs in decreasing similarity. Initially, each object is their own cluster.

When A and B are not yet in the same cluster, and either cluster has less than k members, join the two clusters. Repeat until all similarities have been processed.

Note that you may still end up having clusters with less than k members. For example, if your data set has less than k nodes total, or there are small subgraphs that are not connected etc.

You really should accept clusters of less than k nodes, i.e. unclustered nodes. Why would everything cluster? There always will be outliers and noise in real data.

like image 129
Has QUIT--Anony-Mousse Avatar answered Dec 13 '22 03:12

Has QUIT--Anony-Mousse