Calculating the similarity of two lists

I have two lists:

e.g. a = [1,8,3,9,4,9,3,8,1,2,3] and b = [1,8,1,3,9,4,9,3,8,1,2,3]

Both contain ints. There is no meaning behind the ints (e.g. 1 is not 'closer' to 3 than it is to 8).

I'm trying to devise an algorithm to calculate the similarity between two ORDERED lists. Ordered is the key word here (so I can't just take the set of both lists and calculate their set difference percentage). Some numbers repeat (for example 3, 8, and 9 above), and I cannot ignore the repeats.

For the example above, the function should tell me that a and b are ~90% similar. How can I do that? Edit distance came to mind. I know how to use it with strings, but I'm not sure how to use it with a list of ints. Thanks!

asked Jul 15 '11 15:07 by aerain



1 Answer

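You can use the difflib module from the standard library. Its SequenceMatcher class works on any sequences of hashable elements, not just strings, so it handles lists of ints directly. From the documentation: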

ratio()
Return a measure of the sequences’ similarity as a float in the range [0, 1].

Which gives:

    >>> import difflib
    >>> s1 = [1,8,3,9,4,9,3,8,1,2,3]
    >>> s2 = [1,8,1,3,9,4,9,3,8,1,2,3]
    >>> sm = difflib.SequenceMatcher(None, s1, s2)
    >>> sm.ratio()
    0.9565217391304348
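To see where that number comes from: ratio() is documented as 2.0*M / T, where M is the number of matched elements and T is the total number of elements in both sequences. The sketch below recomputes it from get_matching_blocks(), and, since the question mentions edit distance, also shows a plain Levenshtein distance that works on lists of ints. The helper names (manual_ratio, edit_distance, edit_similarity) are made up for illustration, and normalising the distance by the longer list's length is just one possible convention, not something difflib does.

```python
import difflib


def manual_ratio(a, b):
    """Recompute SequenceMatcher.ratio() as 2*M/T from the matching blocks."""
    sm = difflib.SequenceMatcher(None, a, b)
    # Each block is (i, j, size); the final sentinel block has size 0.
    matches = sum(block.size for block in sm.get_matching_blocks())
    total = len(a) + len(b)
    return 2.0 * matches / total if total else 1.0


def edit_distance(a, b):
    """Levenshtein distance between two sequences (elements compared with ==)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


s1 = [1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3]
s2 = [1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3]

print(manual_ratio(s1, s2))     # same value as sm.ratio() above
print(edit_distance(s1, s2))    # 1: s2 is s1 with one extra element inserted
print(edit_similarity(s1, s2))
```

Both measures respect order and repeats. They differ in scale: here all 11 elements of s1 are matched, so ratio() is 2*11/23, while the edit-distance version gives 1 - 1/12.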
answered Oct 17 '22 08:10 by kraymer