I am new to Python and have a question regarding matching strings in a list to a column in a df.
When I run the following commands, I would like a new column named "Match" to be created, and if there is a match between the string in the list and the string in the column the value in the "Match" column and corresponding row should be True, if no match, then False. The desired outcome would be False, False, True, False, False. Since the string "Honda" is not an exact match of "Honda Civic" it should not be True. Same with "Toy" is not the exact match of "Toyota Corolla".
Creating df:
Cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4', np.nan],
'Price': [22000,25000,27000,35000, 29000],
'Liscence Plate': ['ABC 123', 'XYZ 789', 'CBA 321', 'ZYX 987', 'DEF 456']}
df = DataFrame(Cars,columns= ['Brand', 'Price', 'Liscence Plate'])
I then create a list of the values I would like to search for, joined by |.
search_for_these_values = ['Honda', 'Toy', 'Ford Focus', 'Audi A4 2019']
pattern = '|'.join(search_for_these_values)
Here I have tried the str.match command and are given True, True, True, False, False.
df['Match'] = df["Brand"].str.match(pattern, na=False)
Here I have created a loop using the == operator and are given False, False, False, False, False.
for i in range(0,len(pattern)):
df['Match'] = df['Brand'] == pattern[i]
Thank you for the help!
If need match values in list, use Series.isin
:
df['Match'] = df["Brand"].isin(search_for_these_values)
print (df)
Brand Price Liscence Plate Match
0 Honda Civic 22000 ABC 123 False
1 Toyota Corolla 25000 XYZ 789 False
2 Ford Focus 27000 CBA 321 True
3 Audi A4 35000 ZYX 987 False
4 NaN 29000 DEF 456 False
Solution with match
is used for check substrings, so different output.
Alternative solution for match substrings with Series.str.contains
and parameter na=False
:
df['Match'] = df["Brand"].str.contains(pattern, na=False)
print (df)
Brand Price Liscence Plate Match
0 Honda Civic 22000 ABC 123 True
1 Toyota Corolla 25000 XYZ 789 True
2 Ford Focus 27000 CBA 321 True
3 Audi A4 35000 ZYX 987 False
4 NaN 29000 DEF 456 False
EDIT:
For test values in substrings is possible use list comprehension with loop by values in search_for_these_values
and test match by in
with any
for return at least one True
:
df['Match'] = [any(x in z for z in search_for_these_values)
if x == x
else False
for x in df["Brand"]]
print (df)
Brand Price Liscence Plate Match
0 Honda Civic 22000 ABC 123 False
1 Toyota Corolla 25000 XYZ 789 False
2 Ford Focus 27000 CBA 321 True
3 Audi A4 35000 ZYX 987 True
4 NaN 29000 DEF 456 False
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With