I have a dict containing lists under its keys:
dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}
What is the best way to check whether the lists all have the same length?
This is my solution:
import itertools
len(set(itertools.imap(len, dct.viewvalues()))) == 1
This gives True if the lengths are all the same and False if not.
UPD: Following @RaymondHettinger's advice, I replaced map with itertools.imap.
The usual way to calculate the similarity between lists is to find their distinct elements and their common elements and compute the quotient of the two. The result is then multiplied by 100 to get a percentage.
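One possible reading of that description is the number of common elements divided by the number of distinct elements overall (a Jaccard-style ratio). The sketch below assumes that reading and Python 3; the function name similarity_percent is hypothetical, and treating two empty lists as 100% similar is an arbitrary choice.

```python
def similarity_percent(a, b):
    """Percentage of distinct elements that the two lists have in common."""
    set_a, set_b = set(a), set(b)
    distinct = set_a | set_b   # every distinct element across both lists
    if not distinct:
        # Assumption: two empty lists count as identical.
        return 100.0
    common = set_a & set_b     # elements present in both lists
    return 100 * len(common) / len(distinct)


print(similarity_percent([1, 2, 3], [2, 3, 4]))  # 50.0
```

Converting to sets discards duplicates and order, so this measures overlap of values only.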
Use the zip() function to pair up the elements of the two lists and count the positions that differ, then add the difference in length. sum() can add up the True and False values because Python's bool type is a subclass of int: False equals 0 and True equals 1.
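A minimal sketch of that counting approach, assuming Python 3 and two plain lists; the function name count_differences is hypothetical.

```python
def count_differences(a, b):
    """Count positions where a and b differ, plus the difference in length."""
    # zip() stops at the shorter list; each comparison yields True (1) or False (0),
    # so sum() counts the mismatching positions.
    mismatches = sum(x != y for x, y in zip(a, b))
    # Elements beyond the shorter list have no partner, so count them as differences.
    return mismatches + abs(len(a) - len(b))


print(count_differences([1, 2, 3], [1, 5, 3, 4]))  # 2
```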
sort() and the == operator: the list.sort() method sorts the two lists, and the == operator then compares them item by item, so they are equal only if they have equal items at equal positions. Because both lists are sorted first, this checks whether they contain the same item values regardless of the original order of the elements.
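As a rough illustration of the sort-then-compare idea, the sketch below uses sorted() rather than list.sort() so the inputs are not modified in place; that substitution and the function name same_items are choices made here, not part of the description above.

```python
def same_items(a, b):
    """Return True if a and b hold the same items, ignoring their order."""
    # sorted() returns new lists, leaving the inputs unchanged (unlike list.sort()).
    return sorted(a) == sorted(b)


print(same_items([3, 1, 2], [1, 2, 3]))  # True
print(same_items([1, 2, 2], [1, 2]))     # False
```

Duplicates still matter here: the second call is False because the first list contains 2 twice.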
To convert this distance metric into a similarity metric, we can divide each distance by the maximum distance and subtract the result from 1, which gives a similarity score between 0 and 1. We will look at the example after discussing the cosine metric.
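The normalization itself can be sketched in a few lines, assuming Python 3 and a non-empty list of non-negative distances. The function name distances_to_similarities is hypothetical, and returning 1.0 for every entry when all distances are zero is an assumption made to avoid dividing by zero.

```python
def distances_to_similarities(distances):
    """Map non-negative distances to similarity scores between 0 and 1."""
    max_d = max(distances)
    if max_d == 0:
        # Assumption: if every distance is zero, all objects are fully similar.
        return [1.0 for _ in distances]
    # Divide by the maximum distance, then subtract from 1.
    return [1 - d / max_d for d in distances]


print(distances_to_similarities([0, 2, 4]))  # [1.0, 0.5, 0.0]
```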
Your solution looks fine.
If you want to tweak it a bit, use itertools.imap() instead of map(). That avoids building an intermediate list of lengths, so the extra memory for the mapping step drops from O(n) to O(1).
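For readers on Python 3, where map() is already lazy and itertools.imap and dict.viewvalues() no longer exist, the same check might look like the sketch below. The helper name same_lengths is hypothetical, and using <= 1 instead of == 1 is a deliberate departure from the original so that an empty dict counts as "all the same".

```python
def same_lengths(dct):
    """Return True if every list in dct has the same length."""
    # map() is lazy in Python 3, so no intermediate list of lengths is built.
    # An empty dict yields an empty set, hence <= 1 rather than == 1.
    return len(set(map(len, dct.values()))) <= 1


dct = {'a': [1, 2, 3],
       'b': [1, 2, 3, 4],
       'c': [1, 2]}

print(same_lengths(dct))                         # False
print(same_lengths({'x': [1, 2], 'y': [3, 4]}))  # True
```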