
Python 3 - counting matches in two lists (including duplicates)

First of all, I'm new to programming and Python. I've looked here but can't find a solution, so please forgive me if this is a stupid question!

I have two lists, and I'm trying to determine how many times items in the second list appear in the first list.

I have the following solution:

    list1 = ['black','red','yellow']
    list2 = ['the','big','black','dog']
    list3 = ['the','black','black','dog']
    p = set(list1)&set(list2)
    print(len(p))

It works fine except when the second list contains duplicates.

For example, list1 and list2 above return 1, but so do list1 and list3, when ideally that should return 2.

Can anyone suggest a solution to this? Any help would be appreciated!

Thanks,

Adam

AdamDynamic asked Nov 10 '12 16:11


3 Answers

You're seeing this problem because you're using sets as your collection type. Sets have two characteristics: they're unordered (which doesn't matter here), and their elements are unique. So you lose the duplicates in the lists when you convert them to sets, before you even find their intersection:

>>> p = ['1', '2', '3', '3', '3', '3', '3']
>>> set(p)  # element order in the output may vary
{'1', '2', '3'}
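
To see the same effect on the lists from the question, here is a minimal sketch. The list names follow the question; common_2 and common_3 are names made up for illustration.

```python
list1 = ['black', 'red', 'yellow']
list2 = ['the', 'big', 'black', 'dog']
list3 = ['the', 'black', 'black', 'dog']

# Converting to a set collapses the two 'black' entries in list3 into one,
# so both intersections contain exactly one element.
common_2 = set(list1) & set(list2)
common_3 = set(list1) & set(list3)

print(len(common_2))         # 1
print(len(common_3))         # 1, although 'black' occurs twice in list3
print(list3.count('black'))  # 2, the duplicates are still present in the list
```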

There are several ways you can do what you're looking to do here, but you'll want to start by looking at the list count method. I would do something like this:

>>> list1 = ['a', 'b', 'c']
>>> list2 = ['a', 'b', 'c', 'c', 'c']
>>> results = {}
>>> for i in list1:
...     results[i] = list2.count(i)
...
>>> results
{'a': 1, 'b': 1, 'c': 3}

This approach creates a dictionary (results), and for each element in list1, creates a key in results, counts the times it occurs in list2, and assigns that to the key's value.

Edit: As Lattyware points out, that approach answers a slightly different question from the one you asked. A really fundamental solution would look like this:

>>> words = ['red', 'blue', 'yellow', 'black']
>>> list1 = ['the', 'black', 'dog']
>>> list2 = ['the', 'blue', 'blue', 'dog']
>>> results1 = 0
>>> results2 = 0
>>> for w in words:
...     results1 += list1.count(w)
...     results2 += list2.count(w)
...
>>> results1
1
>>> results2
2

This works in a similar way to my first suggestion: it iterates through each word in your main list (here I use words), adds the number of times that word appears in list1 to the counter results1, and adds the number of times it appears in list2 to results2.

If you need more information than just the number of duplicates, you'll want to use a dictionary or, even better, the specialized Counter type in the collections module. Counter is built to make everything I did in the examples above easy.

>>> from collections import Counter
>>> results3 = Counter()
>>> for w in words:
...     results3[w] = list2.count(w)
...
>>> results3
Counter({'blue': 2, 'red': 0, 'yellow': 0, 'black': 0})
>>> sum(results3.values())
2
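
As a possible alternative, a Counter can also be built directly from a list, which tallies every element in one pass instead of calling count once per word. The sketch below assumes the words and list2 values from the example above; counts and total_matches are just illustrative names.

```python
from collections import Counter

words = ['red', 'blue', 'yellow', 'black']
list2 = ['the', 'blue', 'blue', 'dog']

# Tally every element of list2 in a single pass.
counts = Counter(list2)

# A Counter returns 0 for missing keys, so words that never occur add nothing.
total_matches = sum(counts[w] for w in words)

print(total_matches)  # 2
```

Note that if words itself contained a repeated entry, that entry's count would be added once per repetition.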
toxotes answered Sep 28 '22 01:09


Shouldn't list 1 and list 2 return 0? Or did you mean

list1 = ['black', 'red', 'yellow']

What you want, I think, is

print(len([w for w in list2 if w in list1]))

The trouble with using sets is that a set has no duplicates. In fact, the usual reason for using a set is to eliminate duplicates. That's just what you don't want here, of course.
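
For longer lists, one possible variation (a sketch, not something the question requires) is to keep the second list as a list so its duplicates survive, and turn only the first list into a set so each membership test is fast. The names lookup and matches are made up for illustration.

```python
list1 = ['black', 'red', 'yellow']
list3 = ['the', 'black', 'black', 'dog']

# Only the lookup side becomes a set; list3 keeps its duplicates.
lookup = set(list1)

# Count every item of list3 that is also present in list1.
matches = sum(1 for w in list3 if w in lookup)

print(matches)  # 2
```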

saulspatz answered Sep 27 '22 23:09


I know this is an old question, but if anyone is wondering how to get the matches, or the number of matches, across two or more lists, you can do this as well.

a = [1,2,3]
b = [2,3,4]
c = [2,4,5]

To get the matches between two lists, say a and b:

d = [value for value in a if value in b]  # [2, 3]

For all three lists:

d = [value for value in a if value in b and value in c]  # [2]
len(d)  # 1, the number of matches

Also, if you want each duplicate value to be counted only once, convert the lists to sets beforehand, e.g.

a = set(a)  # and so on for b and c
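
Once the lists are sets, the same matching can also be written with the set intersection operator. A minimal sketch using the a, b, and c lists from above (in_two and in_three are illustrative names); note that it treats repeated values as a single match, so it does not count duplicates:

```python
a = [1, 2, 3]
b = [2, 3, 4]
c = [2, 4, 5]

# Values present in both a and b.
in_two = set(a) & set(b)

# Values present in all three lists.
in_three = set(a) & set(b) & set(c)

print(sorted(in_two))  # [2, 3]
print(len(in_three))   # 1
```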
Chidi answered Sep 28 '22 01:09