
Test if exact string appears in a Pandas Series

I have a pandas Series, i.e. a DataFrame column such as df['company_name'].

If I use

df['company_name'].str.contains('ABC').any()

I get True if an entry is "ABC".

But it also returns a false-positive True if some other entry in the Series is "ABC PTY LTD".

I only want a match if there is an entry that is exactly "ABC".

I've checked about 50 similar questions but none answer this one.

I tried a Regex

rec_df['recruiters'].str.match( r'^ABC$').any()

It works, but I want to pass the 'ABC' part into the regex as a variable and I can't work out how.

Any help for a noob who is trying to learn, please?

Any solution that matches a record with exactly 'ABC', and not a longer string like 'ABC Pty Ltd' or a substring like 'AB', would be ideal.

Axle Max asked Aug 07 '18 13:08



2 Answers

You can do

df['company_name'].eq('ABC').any()  # equivalent to (df['company_name'] == 'ABC').any()
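
To make the difference concrete, here is a minimal sketch comparing the substring test from the question with the equality test. The sample rows are made up for illustration, and the `isin` line is an extra suggestion rather than part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"company_name": ["ABC PTY LTD", "AB", "XYZ"]})

# Substring test: "ABC PTY LTD" contains "ABC", so this reports a match.
has_substring = bool(df["company_name"].str.contains("ABC").any())

# Equality test: no entry is exactly "ABC", so this reports no match.
has_exact = bool(df["company_name"].eq("ABC").any())

# isin extends the equality test to several candidate names at once.
has_any_exact = bool(df["company_name"].isin(["ABC", "XYZ"]).any())

print(has_substring, has_exact, has_any_exact)
```

Because `eq` compares whole values, the name can come from a variable without any regex handling.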
BENY answered Oct 03 '22 01:10


Thanks to @Wen for the answer. I also worked out the Regex approach in case anyone needs it.

company_name = 'ABC'

item = r'^' + company_name + '$' 

df['company'].str.match(item).any()
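
One caveat with building the pattern by concatenation: if the name contains regex metacharacters (for example `+` or `.`), they are interpreted as regex syntax. A possible variant, sketched below, escapes the variable with `re.escape` and uses `Series.str.fullmatch`, which I believe requires pandas 1.1 or later. The sample rows and the name "A+B" are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data; one name contains the regex metacharacter "+".
df = pd.DataFrame({"company": ["ABC", "ABC PTY LTD", "A+B"]})

company_name = "A+B"

# Unescaped: "^A+B$" means "one or more A, then B", so "A+B" is not matched.
unescaped = bool(df["company"].str.match("^" + company_name + "$").any())

# Escaped: re.escape turns "+" into a literal, and fullmatch requires the
# whole string to match, so no explicit ^ and $ anchors are needed.
pattern = re.escape(company_name)
escaped = bool(df["company"].str.fullmatch(pattern).any())

print(unescaped, escaped)
```

For a plain exact-match check, the equality comparison in the other answer is simpler. The regex form is mainly useful when the pattern needs more than a literal name.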
Axle Max answered Oct 03 '22 00:10