I'm trying to figure out an algorithm that gives a measure of similarity between two lists, each having n distinct elements. The two lists are basically different arrangements of the same n elements.
Method: using the "|" and "&" operators with set(). A common way to calculate the similarity between two lists is to find the common elements and the distinct elements, and compute the quotient of their counts. The result is then multiplied by 100 to get a percentage.
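A minimal sketch of that set-based method (the function name and sample lists are illustrative): `&` gives the common elements, `|` gives all distinct elements, and the similarity is the ratio of their sizes as a percentage.

```python
def list_similarity(a, b):
    """Jaccard-style similarity between two lists, as a percentage."""
    sa, sb = set(a), set(b)
    common = sa & sb    # elements present in both lists
    distinct = sa | sb  # all distinct elements across both lists
    return len(common) / len(distinct) * 100

# 3 common elements out of 5 distinct ones -> 60.0
print(list_similarity([1, 2, 3, 4], [2, 3, 4, 5]))
```

Note that for two lists that are rearrangements of the same n elements (as in the question), this always returns 100, since it ignores order entirely.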
To calculate the similarity between two examples, you need to combine all the feature data for those two examples into a single numeric value. For instance, consider a shoe data set with only one feature: shoe size. You can quantify how similar two shoes are by calculating the difference between their sizes.
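The one-feature shoe example above can be sketched as follows (function name and sizes are illustrative); a smaller difference means the two shoes are more similar.

```python
def shoe_similarity(size_a, size_b):
    """Distance between two shoes with a single numeric feature: size."""
    return abs(size_a - size_b)

print(shoe_similarity(9.0, 9.5))  # 0.5
```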
We use the formula below to compute the cosine similarity, where A and B are vectors: cosine(A, B) = A·B / (||A|| ||B||). Here A·B is the dot product of A and B, computed as the sum of the element-wise products of A and B, and ||A|| is the L2 norm of A, computed as the square root of the sum of the squares of the elements of A.
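That formula translates directly into a short pure-Python sketch (names are illustrative; libraries such as NumPy or SciPy provide optimized equivalents):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 3], [1, 2, 3]))  # 1.0 for identical vectors
print(cosine_similarity([1, 0], [0, 1]))        # 0.0 for orthogonal vectors
```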
One way would be to calculate an edit distance, i.e. the minimum number of modification steps to transform one list to the other. This would basically be the same as a Levenshtein or Damerau-Levenshtein distance, but instead of a string of characters, you're comparing a list of elements.
http://en.wikipedia.org/wiki/Levenshtein_distance
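The edit-distance idea above can be sketched as the standard Levenshtein dynamic program, run over list elements instead of characters (this is a generic sketch, not tied to any particular library):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn list a into list b."""
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Swapping two adjacent elements costs 2 under plain Levenshtein;
# Damerau-Levenshtein would count it as a single transposition.
print(levenshtein([1, 2, 3, 4], [2, 1, 3, 4]))  # 2
```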