
Python: count the strings in a list that contain a substring from another list, without duplicates

I have two lists:

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

I want to count how many strings in main_list contain a string from master_list, without counting the same item twice.

Example: for the two lists above, the result of my function should be 4. 'Smith' is found 3 times in main_list. 'Roger' is found 2 times, but since 'Roger-Smith' was already counted for 'Smith', it doesn't count again, so 'Roger' only counts once, which makes 4 in total.

The function I wrote for now is below, but I think there is a faster way to do it:

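def string_detection(master_list, main_list):
    remaining = list(main_list)  # work on a copy so the caller's list is not modified
    count = 0
    for substring in master_list:
        # iterate over a snapshot, because matches are removed from remaining
        for string in list(remaining):
            if substring in string:
                remaining.remove(string)  # a matched string is counted only once
                count += 1
    return count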
erwanlc asked Feb 16 '17 10:02


4 Answers

A one-liner:

>>> sum(any(m in L for m in master_list) for L in main_list)
4

Iterate over main_list and check whether any of the values from master_list is in that string. This yields one bool value per string. any() stops as soon as it finds a match, so each string adds at most one to the count. Conveniently, sum counts all the Trues to give you the total.
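For reuse, the one-liner can be wrapped in a function. This is only a minimal sketch: the name count_matches is made up for illustration and is not part of the question or of any library.

```python
def count_matches(master_list, main_list):
    """Count the strings in main_list that contain at least one entry of master_list."""
    # any() short-circuits on the first hit, so each string contributes at most 1.
    # sum() then adds up the booleans (True counts as 1, False as 0).
    return sum(any(m in s for m in master_list) for s in main_list)


main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
result = count_matches(master_list, main_list)
print(result)  # 4
```

Unlike a remove-based loop, this never modifies main_list, so the same list can be counted again afterwards.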

Paul Rooney answered Oct 03 '22 09:10


You can use pandas (which provides fast vectorized operations) with str.contains and sum():

import pandas as pd
main_list = pd.Series(['Smith', 'Smith', 'Roger', 'Roger-Smith', '42'])
master_list = ['Smith', 'Roger']
count = main_list.str.contains('|'.join(master_list)).sum()
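One thing to keep in mind: str.contains treats its pattern as a regular expression by default, so a master string containing characters such as '.' or '+' would not be matched literally. The same alternation idea can be sketched with only the standard library re module, escaping each entry first. The function name count_matches_regex is hypothetical, chosen just for this example.

```python
import re


def count_matches_regex(master_list, main_list):
    """Count the strings in main_list matching any master_list entry literally."""
    # An empty alternation would compile to '' and match every string, so bail out.
    if not master_list:
        return 0
    # re.escape keeps regex metacharacters in the master strings literal.
    pattern = re.compile('|'.join(re.escape(m) for m in master_list))
    # search() finds the pattern anywhere in the string, like the `in` operator.
    return sum(1 for s in main_list if pattern.search(s))


main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
result = count_matches_regex(master_list, main_list)
print(result)  # 4
```

Compiling the pattern once means each string in main_list is scanned a single time, whatever the length of master_list.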
Yuval Atzmon answered Oct 03 '22 09:10


You can do it the other way around: create a list that contains only the elements of main_list that contain a substring from master_list.

temp_list = [string for string in main_list if any(substring in string for substring in master_list)]

Now temp_list looks like this:

['Smith', 'Smith', 'Roger', 'Roger-Smith']

So the length of temp_list is your answer.

Yevhen Kuzmovych answered Oct 03 '22 08:10


What about this:

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

print(len([word for word in main_list if any(mw in word for mw in master_list)]))
Elmex80s answered Oct 03 '22 07:10