I have a dataframe with multiple string columns. I want to use a string method that is valid for a series on multiple columns of the dataframe. Something like this is what I would wish for:
df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
df
Out[15]:
A B
0 123f 789f
1 456f 901f
df = df.str.rstrip('f')
df
Out[16]:
A B
0 123 789
1 456 901
Obviously, this doesn't work because str operations are only valid on pandas Series objects. What is the appropriate/most pandas-y method to do this?
The rstrip function works on a Series, so you can use apply to run it on each column:
df = df.apply(lambda x: x.str.rstrip('f'))
Or create a Series with stack, strip it, and then unstack at the end:
df = df.stack().str.rstrip('f').unstack()
Or use applymap, which works element by element (deprecated since pandas 2.1 in favor of DataFrame.map):
df = df.applymap(lambda x: x.rstrip('f'))
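Because the name of the element-wise method depends on the pandas version, a version-tolerant variant can be sketched as follows. The names `elementwise` and `stripped` are only illustrative, and the sketch assumes every cell is a string.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

# DataFrame.map is available from pandas 2.1 onward; older versions only
# provide applymap. Pick whichever one this installation has.
elementwise = df.map if hasattr(df, 'map') else df.applymap

# The lambda receives each cell as a plain Python str.
stripped = elementwise(lambda x: x.rstrip('f'))
print(stripped)
```

If some cells may be missing or non-string, the lambda would need a guard such as an isinstance check before calling rstrip.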
Finally, if you need to apply the function to only some columns:
# list the columns to process
cols = ['A']
df[cols] = df[cols].apply(lambda x: x.str.rstrip('f'))
df[cols] = df[cols].stack().str.rstrip('f').unstack()
df[cols] = df[cols].applymap(lambda x: x.rstrip('f'))
You can mimic the behavior of rstrip using replace with regex=True, which can be applied to the entire DataFrame:
df.replace(r'f$', '', regex=True)
A B
0 123 789
1 456 901
Since rstrip takes a sequence of characters to strip, you can easily extend this:
df.replace(r'[abc]+$', '', regex=True)
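To generalize this to an arbitrary set of characters, the pattern can be built from the same string you would pass to rstrip. The helper below is a hypothetical sketch, not part of pandas: `rstrip_frame` is an invented name, and it assumes `chars` is non-empty. It uses `\Z` rather than `$` so that the match is anchored at the very end of the string, which is closer to what rstrip does when a value ends in a newline.

```python
import re
import pandas as pd

def rstrip_frame(df, chars):
    """Hypothetical helper: remove any trailing run of `chars` from string cells."""
    # re.escape keeps characters such as '.', '-' or ']' literal inside the class.
    # Assumes chars is non-empty; an empty class '[]' is not a valid pattern.
    pattern = '[' + re.escape(chars) + r']+\Z'
    return df.replace(pattern, '', regex=True)

df = pd.DataFrame({'A': ['123f', '456ff'], 'B': ['789.f', '901']})
out = rstrip_frame(df, 'f.')
print(out)
```

As with the one-line version, replace returns a new DataFrame and leaves the original unchanged.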
You can use a dictionary comprehension and feed it to the pd.DataFrame constructor:
res = pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})
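One thing to watch with the constructor approach is that building from plain lists produces a fresh 0..n-1 index. If the original index matters, it can be passed back in explicitly. A minimal sketch, assuming a small frame with a custom index made up for illustration:

```python
import pandas as pd

# The 'x'/'y' index labels are illustrative, not from the question.
df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']},
                  index=['x', 'y'])

# Passing index=df.index keeps the original row labels on the result.
res = pd.DataFrame(
    {col: [value.rstrip('f') for value in df[col]] for col in df},
    index=df.index,
)
print(res)
```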
Currently, the Pandas str methods are inefficient. Regex is even less efficient, but easier to extend. As always, you should test with your own data.
# Benchmarking on Python 3.6.0, Pandas 0.19.2
def jez1(df):
    return df.apply(lambda x: x.str.rstrip('f'))
def jez2(df):
    return df.applymap(lambda x: x.rstrip('f'))
def jpp(df):
    return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})
def user3483203(df):
    return df.replace(r'f$', '', regex=True)
df = pd.concat([df]*10000)
%timeit jez1(df) # 33.1 ms per loop
%timeit jez2(df) # 29.9 ms per loop
%timeit jpp(df) # 13.2 ms per loop
%timeit user3483203(df) # 42.9 ms per loop
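The %timeit lines above need IPython. To rerun the comparison from a plain script, something like the following sketch with the standard timeit module should work. The labels and the repeat counts are arbitrary choices, the applymap variant is left out because it is deprecated in newer pandas, and the numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
big = pd.concat([base] * 10000, ignore_index=True)

candidates = {
    'apply + str.rstrip': lambda d: d.apply(lambda x: x.str.rstrip('f')),
    'dict comprehension': lambda d: pd.DataFrame(
        {col: [x.rstrip('f') for x in d[col]] for col in d}),
    'replace with regex': lambda d: d.replace(r'f$', '', regex=True),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls per repeat, converted to milliseconds per call.
    best = min(timeit.repeat(lambda: func(big), number=5, repeat=3)) / 5
    timings[name] = best * 1000
    print(f'{name}: {timings[name]:.1f} ms per loop')
```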