First of all, I'm new to programming and Python. I've looked here but can't find a solution, so please forgive me if this is a stupid question!
I have two lists and I'm trying to determine how many times items in the second list appear in the first list.
I have the following solution:
list1 = ['black','red','yellow']
list2 = ['the','big','black','dog']
list3 = ['the','black','black','dog']
p = set(list1)&set(list2)
print(len(p))
It works fine except when the second list contains duplicates.
For example, list1 and list2 above return 1, but so do list1 and list3, when ideally that should return 2.
Can anyone suggest a solution to this? Any help would be appreciated!
Thanks,
Adam
You're seeing this problem because you're using sets as your collection type. Sets have two characteristics: they're unordered (which doesn't matter here), and their elements are unique. So you're losing the duplicates in the lists when you convert them to sets, before you even find their intersection:
>>> p = ['1', '2', '3', '3', '3', '3', '3']
>>> set(p)
set(['1', '2', '3'])
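To see the same effect on the question's own lists, here is a short Python 3 sketch using list1 and list3 from the question. It compares the list length with the set length and shows what the intersection is left with:

```python
list1 = ['black', 'red', 'yellow']
list3 = ['the', 'black', 'black', 'dog']

# Converting list3 to a set drops the second 'black' before anything is compared.
print(len(list3), len(set(list3)))   # 4 3

# So the intersection can only ever contain 'black' once.
common = set(list1) & set(list3)
print(common)        # {'black'}
print(len(common))   # 1
```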
There are several ways you can do what you're looking to do here, but you'll want to start by looking at the list count method. I would do something like this:
>>> list1 = ['a', 'b', 'c']
>>> list2 = ['a', 'b', 'c', 'c', 'c']
>>> results = {}
>>> for i in list1:
...     results[i] = list2.count(i)
...
>>> results
{'a': 1, 'c': 3, 'b': 1}
This approach creates a dictionary (results), and for each element in list1, creates a key in results, counts the times it occurs in list2, and assigns that to the key's value.
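The same loop can be written as a dict comprehension (Python 2.7 and later), and summing the values gives the single total the question asks for. A minimal sketch using the lists from the session above; the printed key order assumes Python 3.7+, where dicts keep insertion order:

```python
list1 = ['a', 'b', 'c']
list2 = ['a', 'b', 'c', 'c', 'c']

# One key per element of list1, valued by how often it occurs in list2.
results = {i: list2.count(i) for i in list1}
print(results)                 # {'a': 1, 'b': 1, 'c': 3}

# Total number of list2 items that match something in list1.
print(sum(results.values()))   # 5
```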
Edit: As Lattyware points out, that approach solves a slightly different question than the one you asked. A more basic solution would look like this:
>>> words = ['red', 'blue', 'yellow', 'black']
>>> list1 = ['the', 'black', 'dog']
>>> list2 = ['the', 'blue', 'blue', 'dog']
>>> results1 = 0
>>> results2 = 0
>>> for w in words:
...     results1 += list1.count(w)
...     results2 += list2.count(w)
...
>>> results1
1
>>> results2
2
This works in a similar way to my first suggestion: it iterates through each word in your main list (here I use words), adds the number of times it appears in list1 to the counter results1, and the number of times it appears in list2 to results2.
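The two counters can also be computed with sum() over a generator expression. This is a sketch rather than a drop-in replacement: it still calls list.count once per word, so each list is scanned len(words) times, and a word that appears twice in words would be counted twice.

```python
words = ['red', 'blue', 'yellow', 'black']
list1 = ['the', 'black', 'dog']
list2 = ['the', 'blue', 'blue', 'dog']

# Add up, for every word of interest, how often it occurs in each list.
results1 = sum(list1.count(w) for w in words)
results2 = sum(list2.count(w) for w in words)
print(results1, results2)   # 1 2
```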
If you need more information than just the number of duplicates, you'll want to use a dictionary or, even better, the specialized Counter type in the collections module. Counter is built to make everything I did in the examples above easy.
>>> from collections import Counter
>>> results3 = Counter()
>>> for w in words:
...     results3[w] = list2.count(w)
...
>>> results3
Counter({'blue': 2, 'black': 0, 'yellow': 0, 'red': 0})
>>> sum(results3.values())
2
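Building the Counter directly from list2 avoids calling count in a loop: Counter(list2) tallies every item in a single pass, and looking up a key that isn't there returns 0 instead of raising KeyError. A minimal sketch with the same lists as above (the printed dict order assumes Python 3.7+):

```python
from collections import Counter

words = ['red', 'blue', 'yellow', 'black']
list2 = ['the', 'blue', 'blue', 'dog']

# Tally every item of list2 once.
counts = Counter(list2)

# Missing keys such as 'red' count as 0, so no membership check is needed.
per_word = {w: counts[w] for w in words}
total = sum(counts[w] for w in words)
print(per_word)   # {'red': 0, 'blue': 2, 'yellow': 0, 'black': 0}
print(total)      # 2
```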
Shouldn't list1 and list2 return 0? Or did you mean list1 = ['black', 'red', 'yellow']?
What you want, I think, is
print(len([w for w in list2 if w in list1]))
The trouble with using sets is that a set has no duplicates. In fact, the usual reason for using a set is to eliminate duplicates. That's just what you don't want here, of course.
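The list comprehension above walks list2, so duplicates in list2 are counted, which is what the question wants. A set can still help on the other side: turning list1 into a set only speeds up the in test (average constant time rather than a scan of list1) and leaves list2's duplicates alone. A sketch with the question's lists; count_matches is a hypothetical helper name:

```python
def count_matches(targets, items):
    """Count how many entries of items appear in targets, keeping duplicates in items."""
    lookup = set(targets)   # used only for fast membership tests
    return sum(1 for item in items if item in lookup)

list1 = ['black', 'red', 'yellow']
list2 = ['the', 'big', 'black', 'dog']
list3 = ['the', 'black', 'black', 'dog']

print(count_matches(list1, list2))   # 1
print(count_matches(list1, list3))   # 2
```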
I know this is an old question, but if anyone is wondering how to get the matches, or the number of matches, from two or more lists, you can do this as well.
a = [1,2,3]
b = [2,3,4]
c = [2,4,5]
To get the matches in two lists, say a and b:
d = [value for value in a if value in b] # 2,3
For all three lists:
d = [value for value in a if value in b and value in c] # 2
len(d) # to get the number of matches
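The same idea extends to any number of lists with all(). A sketch; common_values is a hypothetical helper name, and like the comprehensions above it keeps any duplicates from the first list:

```python
def common_values(first, *others):
    """Return the items of first that also appear in every list in others."""
    return [value for value in first if all(value in other for other in others)]

a = [1, 2, 3]
b = [2, 3, 4]
c = [2, 4, 5]

print(common_values(a, b))           # [2, 3]
print(common_values(a, b, c))        # [2]
print(len(common_values(a, b, c)))   # 1
```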
Also, if you want duplicates counted only once, it's a matter of converting the list to a set beforehand, e.g.
a = set(a) # and so on
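set.intersection accepts several iterables at once, so only the first list has to be converted. A sketch contrasting the two behaviours; the duplicate 2 in a is added here purely for illustration:

```python
a = [1, 2, 2, 3]   # duplicate 2 added for illustration
b = [2, 3, 4]
c = [2, 4, 5]

# The list comprehension keeps both 2s from a ...
print(len([value for value in a if value in b]))   # 3

# ... while the set version counts each shared value once.
print(len(set(a).intersection(b)))   # 2
print(set(a).intersection(b, c))     # {2}
```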