
Jaccard Distance

I am having trouble calculating the Jaccard distance for sets represented as bit vectors:

p1 = 10111;

p2 = 10011.

Size of intersection = 3; (How could we find it out?)

Size of union = 4, (How could we find it out?)

Jaccard similarity = (intersection/union) = 3/4.

Jaccard Distance = 1 - (Jaccard similarity) = (1 - 3/4) = 1/4.

But I don't understand how we could find the "intersection" and "union" of the two vectors.

Please help me.

Thanks a lot.

Visitor asked Jan 20 '23 23:01


2 Answers

Size of intersection = 3; (How could we find it out?)

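It is the number of set bits in p1 & p2 (bitwise AND) = 10011, which has 3 set bits.

Size of union = 4, (How could we find it out?)

It is the number of set bits in p1 | p2 (bitwise OR) = 10111, which has 4 set bits.

A vector here means a binary array in which the i-th bit indicates whether the i-th element is present in the set.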
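To make the AND/OR counting concrete, here is a minimal Python sketch. The function names are my own (hypothetical, not from any library), and it assumes the bit vectors are given as strings of 0s and 1s of equal length. Returning 0.0 when both sets are empty is a convention I picked, since the ratio is undefined in that case.

```python
def intersection_size(p1: str, p2: str) -> int:
    """Number of positions where both bit vectors have a 1 (bitwise AND)."""
    return bin(int(p1, 2) & int(p2, 2)).count("1")


def union_size(p1: str, p2: str) -> int:
    """Number of positions where at least one bit vector has a 1 (bitwise OR)."""
    return bin(int(p1, 2) | int(p2, 2)).count("1")


def jaccard_distance(p1: str, p2: str) -> float:
    """Jaccard distance = 1 - |intersection| / |union|."""
    union = union_size(p1, p2)
    if union == 0:
        # Assumption: two empty sets are treated as identical (distance 0).
        return 0.0
    return 1 - intersection_size(p1, p2) / union


p1, p2 = "10111", "10011"
print(intersection_size(p1, p2))  # 3
print(union_size(p1, p2))         # 4
print(jaccard_distance(p1, p2))   # 0.25
```

For the vectors in the question this gives an intersection of 3, a union of 4, and a distance of 1/4.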

Andrey answered Jan 23 '23 13:01


If p1 = 10111 and p2 = 10011,

Count the attributes (bit positions) for each combination of values in p1 and p2:

  • M11 = total number of attributes where both p1 and p2 have the value 1,
  • M01 = total number of attributes where p1 has the value 0 and p2 has the value 1,
  • M10 = total number of attributes where p1 has the value 1 and p2 has the value 0,
  • M00 = total number of attributes where both p1 and p2 have the value 0.

For these vectors, M11 = 3, M01 = 0, M10 = 1 and M00 = 1.

Jaccard similarity coefficient = J = intersection/union = M11/(M01 + M10 + M11) = 3 / (0 + 1 + 3) = 3/4,

Jaccard distance = J' = 1 - J = 1 - 3/4 = 1/4, or equivalently J' = 1 - (M11/(M01 + M10 + M11)) = (M01 + M10)/(M01 + M10 + M11) = (0 + 1)/(0 + 1 + 3) = 1/4.
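The same counts can be tallied position by position. This is only a sketch under my own assumptions: the helper names `match_counts` and `jaccard_from_counts` are hypothetical, and the vectors are taken to be equal-length strings of 0s and 1s.

```python
from collections import Counter


def match_counts(p1: str, p2: str) -> dict:
    """Count positions for each (p1 bit, p2 bit) combination."""
    if len(p1) != len(p2):
        raise ValueError("bit vectors must have the same length")
    pairs = Counter(zip(p1, p2))
    return {
        "M11": pairs[("1", "1")],
        "M10": pairs[("1", "0")],
        "M01": pairs[("0", "1")],
        "M00": pairs[("0", "0")],
    }


def jaccard_from_counts(m: dict) -> tuple:
    """Return (similarity, distance) using M11 / (M01 + M10 + M11)."""
    denom = m["M01"] + m["M10"] + m["M11"]
    if denom == 0:
        # Assumption: with no 1 bits anywhere, treat the sets as identical.
        return 1.0, 0.0
    similarity = m["M11"] / denom
    distance = (m["M01"] + m["M10"]) / denom
    return similarity, distance


m = match_counts("10111", "10011")
print(m)                       # {'M11': 3, 'M10': 1, 'M01': 0, 'M00': 1}
print(jaccard_from_counts(m))  # (0.75, 0.25)
```

Note that M00 does not appear in either formula: positions where both vectors are 0 belong to neither set, so they do not affect the Jaccard similarity or distance.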

handaru answered Jan 23 '23 11:01