I have two lists:
eg. a = [1,8,3,9,4,9,3,8,1,2,3] and b = [1,8,1,3,9,4,9,3,8,1,2,3]
Both contain ints. There is no meaning behind the ints (eg. 1 is not 'closer' to 3 than it is to 8).
I'm trying to devise an algorithm to calculate the similarity between two ORDERED lists. Ordered is keyword right here (so I can't just take the set of both lists and calculate their set_difference percentage). Sometimes numbers do repeat (for example 3, 8, and 9 above, and I cannot ignore the repeats).
In the example above, the function I would call would tell me that a and b are ~90% similar for example. How can I do that? Edit distance was something which came to mind. I know how to use it with strings but I'm not sure how to use it with a list of ints. Thanks!
Using sum() ,zip() and len() This method first compares each element of the two lists and store those as summation of 1, which is then compared with the length of the other list. For this method, we have to first check if the lengths of both the lists are equal before performing this computation.
We use the below formula to compute the cosine similarity. where A and B are vectors: A.B is dot product of A and B: It is computed as sum of element-wise product of A and B. ||A|| is L2 norm of A: It is computed as square root of the sum of squares of elements of the vector A.
Method 2:Using Set's intersection property Convert the list to set by conversion. Use the intersection function to check if both sets have any elements in common. If they have many elements in common, then print the intersection of both sets.
You can use the difflib module
ratio()
Return a measure of the sequences’ similarity as a float in the range [0, 1].
Which gives :
>>> s1=[1,8,3,9,4,9,3,8,1,2,3] >>> s2=[1,8,1,3,9,4,9,3,8,1,2,3] >>> sm=difflib.SequenceMatcher(None,s1,s2) >>> sm.ratio() 0.9565217391304348
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With