Is there an efficient way to cluster a graph according to Jaccard similarity?

Question

Is there an efficient way to cluster nodes in a graph using Jaccard similarity such that each cluster has at least K nodes?

Jaccard similarity between nodes i and j:
Let S be the set of neighbours of i and T be the set of neighbours of j. Then the similarity between i and j is given by |(S ⋂ T)| / |(S ⋃ T)|.

Has QUIT--Anony-Mousse · Accepted Answer

Have you tried implementing some algorithm yourself?

Compute all pairwise non-zero similarities (i.e. when they have at least one neighbor in common; this makes the candidate set much smaller than a squared matrix).

Sort them by similarity, and process pairs in decreasing similarity. Initially, each object is their own cluster.

When A and B are not yet in the same cluster, and either cluster has less than k members, join the two clusters. Repeat until all similarities have been processed.

Note that you may still end up having clusters with less than k members. For example, if your data set has less than k nodes total, or there are small subgraphs that are not connected etc.

You really should accept clusters of less than k nodes, i.e. unclustered nodes. Why would everything cluster? There always will be outliers and noise in real data.

Is there an efficient way to cluster a graph according to Jaccard similarity?

Tags:

algorithm

cluster-analysis

graph-algorithm

hierarchical-clustering

HHH

1 Answers

Has QUIT--Anony-Mousse

Recent Activity

Donate For Us

Is there an efficient way to cluster a graph according to Jaccard similarity?

Tags:

algorithm

cluster-analysis

graph-algorithm

hierarchical-clustering

HHH

1 Answers

Has QUIT--Anony-Mousse

Related questions

Recent Activity

Donate For Us