
Remove certain characters if on end of string in Pandas

Tags: python, pandas

I have a list of names that I have converted to uppercase and stripped of spaces and non-alphabetic characters so that it merges more easily with another list. Both are in pandas dataframes.

In one of the dataframes, some names have JR attached to the end, while their counterparts in the other dataframe do not contain this suffix. How can I strip the trailing JR from both?

I tried something like the following:

df['NAME'] = df['NAME'].str.replace('JR','')

but I think this would remove every instance of JR, not only those where it is the last two characters. Any help would be appreciated.

a.powell asked Feb 06 '19 15:02



3 Answers

You could use replace with a regex:

import pandas as pd

df = pd.DataFrame(data=['Name JR', 'Name JR Middle', 'JR Name'], columns=['name'])
df['name'] = df.name.str.replace(r'\bJR$', '', regex=True).str.strip()

print(df)

Output

             name
0            Name
1  Name JR Middle
2         JR Name

The pattern '\bJR$' matches the word JR only at the end of the string.
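One caveat, given that the question says spaces were already removed from the names: \b requires a word boundary before JR, so it does not match a JR that is glued to the preceding letters. The sketch below uses made-up sample names (an assumption, not the asker's data) to compare the pattern with and without \b.

```python
import pandas as pd

# Hypothetical sample data: three names with spaces removed, one with spaces kept
names = pd.Series(['JOHNSMITHJR', 'JRSMITH', 'MARYJRJONES', 'JOHN SMITH JR'])

# \bJR$ needs a word boundary before JR, so squashed names are left unchanged
with_boundary = names.str.replace(r'\bJR$', '', regex=True).str.strip()

# JR$ only requires JR to be the last two characters of the string
anchored = names.str.replace(r'JR$', '', regex=True).str.strip()

print(with_boundary.tolist())
print(anchored.tolist())
```

If the names really have no spaces left, the plain 'JR$' pattern is probably the one that fits, at the cost of also trimming any name that legitimately ends in the letters JR.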

Dani Mesejo answered Oct 20 '22 00:10


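You can anchor the pattern with $ and apply re.sub to each value:

import re

def jr_replace(x):
    # Remove 'JR' only when it is the last two characters of the string
    return re.sub(r'JR$', '', x)

df['NAME'] = df['NAME'].apply(jr_replace)

print(df)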
Sociopath answered Oct 19 '22 22:10


One option is to find the rows that end in JR using str.endswith, and remove the suffix from those rows by slicing the str object:

m = s.str.endswith('JR')
s.loc[m] = s.loc[m].str[:-2]

Example

Using @danielmesejo's dataframe:

df = pd.DataFrame(data=['Name JR', 'Name JR Middle', 'JR Name'], columns=['name'])
m = df.name.str.endswith('JR')
# Assign through a single .loc call to avoid chained assignment
df.loc[m, 'name'] = df.loc[m, 'name'].str[:-2]

            name
0           Name 
1  Name JR Middle
2         JR Name
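As a possible alternative to the mask-and-slice approach, and assuming pandas 1.4 or later is installed, Series.str.removesuffix drops a literal suffix only when the string ends with it. This is a minimal sketch on the same sample dataframe, with a trailing strip added to clean up the leftover space.

```python
import pandas as pd

df = pd.DataFrame(data=['Name JR', 'Name JR Middle', 'JR Name'], columns=['name'])

# removesuffix only touches strings that end with 'JR' (requires pandas >= 1.4);
# strip() then removes the space left behind in 'Name '
df['name'] = df['name'].str.removesuffix('JR').str.strip()

print(df)
```

Because the suffix is treated as a literal string and not a regex, no escaping or anchoring is needed.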
yatu answered Oct 19 '22 23:10