
Exact match of a string in pandas (Python)

I have a column in a data frame, for example df:

  A
0 Good to 1. Good communication EI : [email protected]
1 SAP ECC Project System  EI: [email protected]
2 EI : ravikumar.swarna  Role:SSE  Minimum Skill  

I have a list of strings:

ls=['[email protected]','[email protected]']

Now I want to filter the data frame by each of them:

for i in range(len(ls)):
    df1 = df[df['A'].str.contains(ls[i])]
    if len(df1.columns != 0):
        print(ls[i])

I get the output

[email protected] 
[email protected]

But I need only [email protected]

How can this be achieved? As you can see, I've tried str.contains, but I need something that does an exact match.

Abul asked May 30 '17 06:05


3 Answers

You could simply use ==

string_a == string_b

It returns True if the two strings are equal. However, this does not solve your issue, because each cell in column A contains more text than just the address.
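To see why plain equality is not enough here, consider a minimal sketch. The cell text and the address are made-up placeholders (the real addresses are not shown in the question), so treat the names as assumptions:

```python
# Hypothetical cell text and address, invented for illustration only.
cell = "SAP ECC Project System  EI: jane.doe@example.com"
address = "jane.doe@example.com"

# Equality compares the whole cell with the address, so it is False here.
whole_cell_equal = (cell == address)

# A substring test is True for the intended cell ...
substring_found = (address in cell)

# ... but it is also True for a longer address that merely ends
# with the same text, which is the false positive described in the question.
longer = "EI: mary.jane.doe@example.com"
false_positive = (address in longer)
```

Neither test is what the question asks for: the address has to be found inside the cell, but only as a complete token.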

Edit 2: You should use len(df1.index) instead of len(df1.columns). Indeed, len(df1.columns) will give you the number of columns, and not the number of rows.
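The difference between the two lengths can be checked with a short sketch. The frame below is a hypothetical stand-in for df, not the asker's data:

```python
import pandas as pd

# Hypothetical one-column frame standing in for df.
df = pd.DataFrame({"A": ["first row", "second row", "third row"]})

# A filter that matches nothing removes every row but keeps the column.
df1 = df[df["A"].str.contains("no such text", regex=False)]

n_columns = len(df1.columns)  # number of columns, unaffected by the filter
n_rows = len(df1.index)       # number of rows that survived the filter
is_empty = df1.empty          # True when no rows are left
```

Here n_columns stays at 1 even though nothing matched, so a test based on the columns passes for every string. Checking df1.empty is an alternative to comparing len(df1.index) with zero.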

Edit 3: After reading your second post, I've understood your problem. The solution you propose could lead to some errors. For instance, if you have:

ls=['[email protected]','[email protected]', '[email protected]']

the first and the third elements will both match str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]), which is unwanted behaviour.

You could add a check on the end of the string: str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]+r'(?:\s|$)')

Like this:

for i in range(len(ls)):
    df1 = df[df['A'].str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]+r'(?:\s|$)')]
    if len(df1.index) != 0:
        print(ls[i])

(Remove the parentheses around the print argument if you use Python 2.7.)
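For reference, here is a self-contained sketch of the same idea. The addresses and cell texts are hypothetical, since the real ones are not shown in the question. It also passes each string through re.escape, which is an addition to the approach above: without it, the dots in an address are treated as "any character" by the regex engine.

```python
import re
import pandas as pd

# Hypothetical data: the addresses in the question are not shown.
df = pd.DataFrame({"A": [
    "Good communication EI : mary.jane.doe@example.com",
    "SAP ECC Project System  EI: jane.doe@example.com",
    "EI : someone.else  Role:SSE  Minimum Skill",
]})
ls = ["jane.doe@example.com", "john.roe@example.com"]

found = []
for address in ls:
    # re.escape makes "." match a literal dot rather than any character.
    # The prefix and suffix groups require a boundary on both sides.
    pattern = r"(?:\s|^|Ei:|EI:|EI-)" + re.escape(address) + r"(?:\s|$)"
    mask = df["A"].str.contains(pattern)
    if mask.any():
        found.append(address)
```

With this data only the second row matches the first address. The first row is rejected because the address there is preceded by "mary." rather than by whitespace or one of the listed prefixes.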

L. Perez answered Nov 10 '22 09:11


Thanks for the help, but it seems I have found a solution that works for now.

Using str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]) seems to solve the problem.

Thanks to @IsaacDj for his help, though.

Abul answered Nov 10 '22 09:11


Why not just use:

df1 = df[df['A'].str.match(ls[i])]

It's the equivalent of a regex match (re.match), which anchors the pattern at the start of each string.
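Because str.match only looks at the beginning of each string, it behaves differently from str.contains when the address sits in the middle of a cell. A small sketch with invented data (the address and cell texts are assumptions) shows the difference:

```python
import re
import pandas as pd

# Hypothetical data, invented for illustration.
s = pd.Series([
    "jane.doe@example.com is the contact",
    "Contact EI: jane.doe@example.com",
])
pattern = re.escape("jane.doe@example.com")

starts_with = s.str.match(pattern).tolist()   # anchored at the start
anywhere = s.str.contains(pattern).tolist()   # found at any position
```

In this sketch str.match accepts only the first string, so it would miss cells like those in the question, where the address follows other text.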

misantroop answered Nov 10 '22 10:11