 

How to do exact string match while filtering from pandas dataframe

Tags: python, pandas

I have a dataframe df:

   indx   pids
    A    181718,
    B     31718,
    C      1718, 
    D    1235,3456
    E    890654,

I want to return a row that matches 1718 exactly.

I tried this, but as expected it also returns rows where 1718 is only a substring of the value:

group_df = df.loc[df['pids'].astype(str).str.contains('{},'.format(1718)), 'pids']

   indx   pids
    A    181718,
    B     31718,
    C      1718, 

When I try an exact comparison like this, it returns an empty result:

cham_geom = df.loc[df['pids'] == '1718', 'pids']

Expected output:

 indx   pids
  C      1718, 

Can anyone help me with it?

Atihska asked Jan 27 '23 15:01


1 Answer

You can try:

df[df.pids.replace(r'\D', '', regex=True).eq('1718')]

  indx   pids
2    C  1718,

'\D' matches any character that is not a digit from 0 to 9, so the replace strips the trailing comma before the comparison.
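
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The sample frame is rebuilt from the question, and it assumes pids is stored as strings. It uses Series.str.replace rather than Series.replace, which should behave the same for this case. Note that stripping every non-digit also glues multiple ids together ("1235,3456" becomes "12353456"), so this variant only suits rows holding a single id.

```python
import pandas as pd

# Sample data reconstructed from the question; pids is assumed to hold strings.
df = pd.DataFrame({
    "indx": ["A", "B", "C", "D", "E"],
    "pids": ["181718,", "31718,", "1718,", "1235,3456", "890654,"],
})

# Remove every non-digit character, then compare what is left for equality.
digits_only = df["pids"].str.replace(r"\D", "", regex=True)
exact = df[digits_only.eq("1718")]
print(exact)
```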

EDIT: Consider the df below, where 1718 can also appear as one of several comma-separated ids:

  indx       pids
0    A    181718,
1    B     31718,
2    C      1718,
3    D  1235,3456
4    E    890654,
5    F  3220,1718

executing:

df[df.pids.str.split(",").apply(lambda x: '1718' in x)]
# if the separator is not only a comma: df[df.pids.str.split(r"\D").apply(lambda x: '1718' in x)]

Gives:

  indx       pids
2    C      1718,
5    F  3220,1718
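
As an alternative to splitting, a regex with digit lookarounds might also work. This is only a sketch, not part of the original approach: rows_with_pid is a hypothetical helper name, the frame is rebuilt from the example, and pids is again assumed to be a string column.

```python
import re
import pandas as pd

# Sample data reconstructed from the example; pids is assumed to hold strings.
df = pd.DataFrame({
    "indx": ["A", "B", "C", "D", "E", "F"],
    "pids": ["181718,", "31718,", "1718,", "1235,3456", "890654,", "3220,1718"],
})

def rows_with_pid(frame, pid):
    """Hypothetical helper: keep rows whose pids field contains pid as a whole number."""
    # (?<!\d) and (?!\d) reject a match that has a digit directly before or after it.
    pattern = r"(?<!\d)" + re.escape(str(pid)) + r"(?!\d)"
    return frame[frame["pids"].str.contains(pattern, regex=True)]

matches = rows_with_pid(df, 1718)
print(matches)
```

Because the lookarounds only care about neighbouring digits, this should not depend on which separator is used between ids.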

anky answered Jan 30 '23 00:01