I have a table:
   x  y  z
A  2  0  3
B  0  3  0
C  0  0  4
D  1  4  0
I want to calculate the Jaccard similarity in Matlab between the vectors A, B, C and D. The formula is:
jaccard(x, y) = |x intersect y| / (|x| + |y| - |x intersect y|)
In this formula, |x| and |y| indicate the number of items that are not zero. For example, |A| is 2, |B| and |C| are each 1, and |D| is 2.
|x intersect y| indicates the number of items that are nonzero in both vectors. |A intersect B| is 0. |A intersect D| is 1, because only the value of x is nonzero in both.
For example, jaccard(A,D) = 1/(2 + 2 - 1) = 1/3 ≈ 0.33.
How can I implement this in Matlab?
The Jaccard similarity is calculated by dividing the number of observations in both sets by the number of observations in either set. In other words, the Jaccard similarity can be computed as the size of the intersection divided by the size of the union of two sets.
Jaccard distance is commonly used to calculate an n × n matrix for clustering and multidimensional scaling of n sample sets. This distance is a metric on the collection of all finite sets.
Typically, the Jaccard similarity coefficient (or index) is used to compare the similarity between two sets. For two sets A and B, the Jaccard index is defined as the ratio of the size of their intersection to the size of their union: J(A,B) = |A ∩ B| / |A ∪ B|
Matlab has a built-in function that computes the Jaccard distance: pdist (part of the Statistics Toolbox).
Here is some code:
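X = rand(2,100);          % two random row vectors of length 100
X(X>0.5) = 1;             % binarize: values above 0.5 become 1 ...
X(X<=0.5) = 0;            % ... and the rest become 0
JD = pdist(X,'jaccard')   % Jaccard distance between the two rows
JI = 1 - JD;              % Jaccard index (similarity)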
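Applied to the table in the question, the same approach might look like the sketch below. It assumes the Statistics Toolbox is available for pdist and squareform. The variable names T, X, JD and JI are only illustrative. The table is binarized first because the question counts an item as shared when it is nonzero in both rows, whereas pdist's 'jaccard' option compares the actual values, so 2 and 1 in column x would otherwise count as differing.

```matlab
% Rows are the vectors A, B, C, D from the question
T = [2 0 3;
     0 3 0;
     0 0 4;
     1 4 0];

X  = double(T ~= 0);                    % 1 where an item is nonzero, 0 elsewhere
JD = squareform(pdist(X, 'jaccard'));   % 4x4 matrix of pairwise Jaccard distances
JI = 1 - JD;                            % pairwise Jaccard similarities

JI(1,4)                                 % similarity between A and D
```

JI(i,j) then holds the similarity between rows i and j, and the diagonal is 1 because squareform puts zeros on the diagonal of the distance matrix.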
EDIT
A calculation that does not require the Statistics Toolbox:
a = X(1,:);
b = X(2,:);
JD = 1 - sum(a & b)/sum(a | b)
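For the four vectors in the question, the toolbox-free calculation can be extended to all pairs at once. This is a minimal sketch rather than a definitive implementation. The names T, X, inter, n, uni and JI are illustrative, and it uses the formula |x intersect y| / (|x| + |y| - |x intersect y|) from the question.

```matlab
% Rows are the vectors A, B, C, D from the question
T = [2 0 3;
     0 3 0;
     0 0 4;
     1 4 0];

X     = double(T ~= 0);                   % 1 where an item is nonzero, 0 elsewhere
inter = X * X';                           % inter(i,j): items nonzero in both row i and row j
n     = sum(X, 2);                        % n(i): number of nonzero items in row i
uni   = bsxfun(@plus, n, n') - inter;     % |x| + |y| - |x intersect y|
JI    = inter ./ uni;                     % pairwise Jaccard similarities

JI(1,4)                                   % similarity between A and D, 1/3
```

If a row contains only zeros, its entries in JI come out as NaN (0/0), so that case would need separate handling.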