I recently found out about the .str accessor for Pandas Series and it's great! However, if I want to chain operations (say, a couple of replace calls and a strip), I need to keep calling .str after every operation, which doesn't make for the most elegant code.
For example, let's say my column names contain spaces and periods and I want to replace them with underscores. I might also want to strip any leftover underscores. If I wanted to do this using .str methods, is there any way of avoiding having to run:
df.columns.str.replace(' ', '_').str.replace('.', '_', regex=False).str.strip('_')
Thanks!
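One pitfall with a chain like this: whether .str.replace treats its pattern as a regular expression depends on the regex argument, whose default has changed across pandas versions (it is regex=False from pandas 2.0 on). Passing regex explicitly keeps the behaviour the same everywhere. The sketch below uses sample column names (an assumption, chosen to match the answer further down) to show why it matters for '.': as a regex, '.' matches any character.

```python
import pandas as pd

# Sample column names (assumed for illustration)
cols = pd.Index(['aa dd', 'dd.d_', 'd._'])

# Literal replacement: '.' means a real period
literal = (cols.str.replace(' ', '_', regex=False)
               .str.replace('.', '_', regex=False)
               .str.strip('_'))
print(list(literal))  # ['aa_dd', 'dd_d', 'd']

# Regex replacement: '.' matches ANY character, so every character becomes '_'
as_regex = cols.str.replace('.', '_', regex=True)
print(list(as_regex))  # ['_____', '_____', '___']
```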
I think you need to repeat .str for each .str function; it is by design. But here it is possible to use only one replace, with a regular expression that matches both whitespace and periods:
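import pandas as pd
df = pd.DataFrame(columns=['aa dd', 'dd.d_', 'd._'])
print(df)
# Empty DataFrame
# Columns: [aa dd, dd.d_, d._]
# Index: []
# regex=True is required in pandas >= 2.0, where the default is regex=False
print(df.columns.str.replace(r'[\s.]', '_', regex=True).str.strip('_'))
# Index(['aa_dd', 'dd_d', 'd'], dtype='object')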
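The single-replace trick generalizes: if you have a list of characters that should all become underscores, you can build the character class from that list, using re.escape so that regex metacharacters such as '.' are matched literally. This is a minimal sketch; the helper name underscore_chars is hypothetical, not part of pandas.

```python
import re
import pandas as pd

def underscore_chars(index, chars):
    """Replace every character in `chars` with '_' in one pass, then strip '_'.

    Hypothetical helper for illustration; `chars` must be non-empty,
    otherwise the pattern '[]' is not a valid regex.
    """
    # re.escape makes '.', '+', ']' etc. match literally inside the class
    pattern = '[' + ''.join(re.escape(c) for c in chars) + ']'
    return index.str.replace(pattern, '_', regex=True).str.strip('_')

df = pd.DataFrame(columns=['aa dd', 'dd.d_', 'd._'])
df.columns = underscore_chars(df.columns, [' ', '.'])
print(list(df.columns))  # ['aa_dd', 'dd_d', 'd']
```

Adding another character to replace then means extending the list rather than appending another .str.replace call.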
Why not use a list comprehension?
import re
df.columns = [re.sub(r'[\s.]', '_', x).strip('_') for x in df.columns]
In a list comprehension, you're working with the string object directly, without the need to call .str each time.
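The same per-string logic can be given a name and passed to DataFrame.rename, which accepts a callable for columns and applies it to each column label. A sketch; clean_name is a hypothetical helper name.

```python
import re
import pandas as pd

def clean_name(name):
    """Turn whitespace and periods into '_' and strip leading/trailing '_'.

    Hypothetical helper for illustration.
    """
    return re.sub(r'[\s.]', '_', name).strip('_')

df = pd.DataFrame(columns=['aa dd', 'dd.d_', 'd._'])
# rename calls clean_name on every column label and returns a new DataFrame
df = df.rename(columns=clean_name)
print(list(df.columns))  # ['aa_dd', 'dd_d', 'd']
```

Unlike assigning to df.columns, rename returns a new DataFrame, so it can sit inside a longer method chain.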