I have a list of names that I have converted to uppercase and stripped of spaces and non-alphabetic characters, to make it easier to merge with another list. Both lists are in pandas DataFrames.
Some names in one of the DataFrames have JR attached to the end, while their counterparts in the other DataFrame do not contain this suffix. How can I strip the trailing JR from both?
I tried something like the following:
df['NAME'] = df['NAME'].str.replace('JR','')
but I think this would remove every instance of JR, not only those where it is the last 2 characters. Any help would be appreciated.
Use the .rstrip() method to remove whitespace or other characters only from the end of a string. Note that its argument is treated as a set of characters to strip, not as an exact suffix.
Using str.replace(), we can replace a specific substring. To remove that substring, replace it with an empty string. Note that str.replace() replaces all occurrences of the substring, not just the last one.
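To make the difference between these string methods concrete, here is a small plain-Python sketch. The names are made-up sample data, and str.removesuffix assumes Python 3.9 or later.

```python
# Made-up sample names, normalized as in the question (uppercase, no spaces).
name = "JRJONESJR"

# str.replace removes every occurrence of "JR", including the leading one.
replaced = name.replace("JR", "")

# str.rstrip treats "JR" as a set of characters, so it keeps stripping
# trailing 'J' and 'R' characters and also eats the final R of TAYLOR.
stripped = "TAYLORJR".rstrip("JR")

# str.removesuffix (Python 3.9+) removes the exact suffix, at most once.
suffix_removed = "TAYLORJR".removesuffix("JR")

print(replaced)        # JONES
print(stripped)        # TAYLO
print(suffix_removed)  # TAYLOR
```

Neither replace nor rstrip is a safe way to drop a trailing JR on its own; an anchored regex or an exact suffix removal is closer to what the question asks for.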
You could use replace with a regex:
import pandas as pd
df = pd.DataFrame(data=['Name JR', 'Name JR Middle', 'JR Name'], columns=['name'])
df['name'] = df.name.str.replace(r'\bJR$', '', regex=True).str.strip()
print(df)
Output:
             name
0            Name
1  Name JR Middle
2         JR Name
The pattern '\bJR$' matches the word JR only at the end of the string.
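One caveat worth checking against the question: the names there have already had their spaces removed, so JR is glued to the preceding letters and the \b word boundary in '\bJR$' may never match. The sketch below compares the two patterns on hypothetical normalized names; the column name NAME and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample data, normalized as described in the question
# (uppercase, no spaces or punctuation).
df = pd.DataFrame({"NAME": ["JOHNSMITHJR", "JRJONES", "MARYJRADAMS"]})

# \bJR$ needs a word boundary before JR; there is none between H and J,
# so nothing is removed from these names.
with_boundary = df["NAME"].str.replace(r"\bJR$", "", regex=True)

# JR$ anchors only on the end of the string, so the trailing JR is removed
# while a leading or embedded JR is left alone.
without_boundary = df["NAME"].str.replace(r"JR$", "", regex=True)

print(with_boundary.tolist())     # ['JOHNSMITHJR', 'JRJONES', 'MARYJRADAMS']
print(without_boundary.tolist())  # ['JOHNSMITH', 'JRJONES', 'MARYJRADAMS']
```

If the names still contained spaces, the \b version would be the safer choice, since it avoids trimming a surname that merely happens to end in the letters JR.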
You need:
import re

def jr_replace(x):
    # Remove 'JR' only when it is the last two characters of the string
    return re.sub(r'JR$', '', x)

df['NAME'] = df['NAME'].apply(jr_replace)
print(df)
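Since the goal in the question is to clean both DataFrames before merging, the same idea can be wrapped in a small helper and applied to each. This is only a sketch: strip_jr is a hypothetical helper name, the two frames are made-up stand-ins, and it uses the vectorized .str.replace rather than apply.

```python
import pandas as pd

def strip_jr(names: pd.Series) -> pd.Series:
    """Remove a trailing 'JR' from each name (hypothetical helper)."""
    return names.str.replace(r"JR$", "", regex=True)

# Made-up frames standing in for the two lists from the question.
left = pd.DataFrame({"NAME": ["JOHNSMITHJR", "ANNLEE"], "AGE": [40, 35]})
right = pd.DataFrame({"NAME": ["JOHNSMITH", "ANNLEE"], "CITY": ["X", "Y"]})

# Apply the same cleanup to both sides so the merge keys line up.
left["NAME"] = strip_jr(left["NAME"])
right["NAME"] = strip_jr(right["NAME"])

merged = left.merge(right, on="NAME", how="inner")
print(merged)
```

Without the cleanup, JOHNSMITHJR would have no counterpart in the second frame and would drop out of the inner merge.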
One option is to find the rows that end in JR using str.endswith, and remove it from those rows by slicing the str object:
m = s.str.endswith('JR')
s.loc[m] = s.loc[m].str[:-2]
Example
Using @danielmesejo's dataframe:
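import pandas as pd
df = pd.DataFrame(data=['Name JR', 'Name JR Middle', 'JR Name'], columns=['name'])
m = df['name'].str.endswith('JR')
# Assign through df.loc to avoid chained assignment; rstrip drops the space left before JR
df.loc[m, 'name'] = df.loc[m, 'name'].str[:-2].str.rstrip()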
             name
0            Name
1  Name JR Middle
2         JR Name
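If the column can contain missing values, the boolean mask from str.endswith may itself contain missing entries, which cannot be used for indexing. A minimal sketch of the same approach on normalized names, assuming a column called NAME and made-up data, passes na=False so that missing names are simply treated as not matching:

```python
import pandas as pd

# Made-up normalized names, including one missing value.
df = pd.DataFrame({"NAME": ["JOHNSMITHJR", "JRJONES", None]})

# na=False makes missing names count as "does not end with JR",
# so the mask is purely boolean.
m = df["NAME"].str.endswith("JR", na=False)

# Slice off the last two characters only in the matching rows.
df.loc[m, "NAME"] = df.loc[m, "NAME"].str[:-2]

print(df)
```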