Percentage Overlap of Two Lists

This is more of a math problem than anything else. Let's assume I have two lists of different sizes in Python:

listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]

I want to find out by what percentage these two lists overlap. Order within the lists is not important. Finding the overlap itself is easy, and I've seen other posts on how to do that, but I can't quite work out how to turn it into a percentage. If I compared the lists in the opposite order, would the result come out differently? What would be the best way of doing this?

OneManRiot asked Apr 28 '15 20:04



2 Answers

In principle, I'd say there are three sensible questions you might be asking:

  1. What percentage is the overlap relative to the first list? I.e. how big is the common part in comparison to the first list?
  2. The same thing relative to the second list.
  3. What percentage is the overlap relative to the "universe" (i.e. the union of both lists)?

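Other definitions are certainly possible, and there are many of them. Which one is right depends on the problem you're trying to solve. Note that the first two results swap when you swap the lists, while the third does not depend on the order in which you compare them.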

From a programming point of view, the solution is easy:

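listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]

# Sets ignore order and collapse duplicates
setA = set(listA)
setB = set(listB)

overlap = setA & setB   # items present in both lists
universe = setA | setB  # all distinct items from either list

# float() keeps the division exact under Python 2 as well
result1 = float(len(overlap)) / len(setA) * 100      # relative to the first list
result2 = float(len(overlap)) / len(setB) * 100      # relative to the second list
result3 = float(len(overlap)) / len(universe) * 100  # relative to the union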
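
To see how the order of comparison affects each of the three measures, the calculation can be wrapped in a small helper. This is only a sketch: the function name `overlap_percentages` and the choice to return 0.0 for an empty denominator are my own assumptions, not part of the question.

```python
def overlap_percentages(first, second):
    """Return (vs_first, vs_second, vs_union) overlap percentages.

    Hypothetical helper; an empty denominator yields 0.0 (an assumption).
    """
    set_first, set_second = set(first), set(second)
    common = len(set_first & set_second)

    def pct(denominator):
        # Guard against division by zero when a list (or both) is empty
        return 100.0 * common / denominator if denominator else 0.0

    return (
        pct(len(set_first)),
        pct(len(set_second)),
        pct(len(set_first | set_second)),
    )


listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]

print(overlap_percentages(listA, listB))  # (100.0, 75.0, 75.0)
print(overlap_percentages(listB, listA))  # (75.0, 100.0, 75.0)
```

Swapping the arguments swaps the first two values, while the union-based value stays the same.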
geckon answered Oct 12 '22 14:10


>>> len(set(listA)&set(listB)) / float(len(set(listA) | set(listB))) * 100
75.0

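I would calculate the number of common items as a fraction of the total number of distinct items.

len(set(listA)&set(listB)) returns the number of common items (3 in your example).

len(set(listA) | set(listB)) returns the total number of distinct items (4).

Divide the first by the second, multiply by 100, and you get the percentage. Because both set operations are symmetric, the result is the same whichever list you put first.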
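
One caveat: converting to sets discards repeated items. If duplicates in your lists should count (the question does not say either way, so this is an assumption), a possible variant uses `collections.Counter`, whose `&` and `|` operators take the minimum and maximum count per item. The function name `multiset_overlap_percentage` is hypothetical.

```python
from collections import Counter


def multiset_overlap_percentage(first, second):
    """Overlap as a percentage of the union, counting repeated items.

    Hypothetical helper; returns 0.0 when both lists are empty (an assumption).
    """
    count_first, count_second = Counter(first), Counter(second)
    # Counter & Counter keeps the minimum count of each item,
    # Counter | Counter keeps the maximum count of each item
    common = sum((count_first & count_second).values())
    total = sum((count_first | count_second).values())
    return 100.0 * common / total if total else 0.0


listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]

print(multiset_overlap_percentage(listA, listB))                      # 75.0
print(multiset_overlap_percentage(["a", "a", "b"], ["a", "b", "b"]))  # 50.0
```

For lists without repeated items this gives the same result as the set-based expression; the second call shows where they differ, since the set-based version would report 100.0 there.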

Ofiris answered Oct 12 '22 15:10