I have an array of values called MyFruits like:
[apple, orange, banana, apple, pear]
Then I have a list of arrays like:
[apple, orange]
[blueberry, watermelon, pear]
[grape, orange, grape, orange]
[]
[cantaloupe]
For each array in the list, I want the number of its elements that also appear in the MyFruits array, divided by the total number of elements in that array. So the output would be:
2 / 2 = 1
1 / 3 = 0.33333
2 / 4 = 0.5
0 / 0 = (in this case 0)
0 / 1 = 0
essentially:
[1, 0.33333, 0.5, 0, 0]
I've been doing this in Python with for loops, but the data set is huge and it's incredibly slow. Someone suggested using numpy, but I'm having difficulty understanding how to apply it here.
Suppose you have two lists, one of length M and another of length N. With straightforward linear searches, it takes O(M * N) string comparisons to find which elements are in both lists.
You can improve on that using Python sets. Convert the lists to Python sets and use set intersection (&) to find their common elements. Because set lookups are hash-based and take constant time on average, the complexity drops to O(M + N) on average.
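Here is a minimal sketch of that idea applied to the question's data. The function name intersect_ratios is just an illustrative choice, and it assumes the values are hashable (strings here). One caveat: a plain set intersection counts each distinct value once, so it would give 1 / 4 for [grape, orange, grape, orange]. To match the 2 / 4 in the question, this sketch converts only MyFruits to a set and tests each element of the other array against it, which keeps the same average cost.

```python
def intersect_ratios(my_fruits, arrays):
    """Fraction of each array's elements that also appear in my_fruits."""
    fruit_set = set(my_fruits)  # built once; membership tests are O(1) on average
    ratios = []
    for arr in arrays:
        if not arr:
            # Empty array: the question defines 0 / 0 as 0.
            ratios.append(0.0)
            continue
        # Count every matching element, including repeats.
        hits = sum(1 for item in arr if item in fruit_set)
        ratios.append(hits / len(arr))
    return ratios


my_fruits = ["apple", "orange", "banana", "apple", "pear"]
arrays = [
    ["apple", "orange"],
    ["blueberry", "watermelon", "pear"],
    ["grape", "orange", "grape", "orange"],
    [],
    ["cantaloupe"],
]

ratios = intersect_ratios(my_fruits, arrays)
print([round(r, 5) for r in ratios])
```

If duplicates should count only once, replacing the counting line with len(set(arr) & fruit_set) gives the distinct-value version instead.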