I would like all columns to be named in a uniform manner, like:
Last Name -> LAST_NAME
e-mail -> E_MAIL
ZIP code 2 -> ZIP_CODE_2
For that purpose I wrote a function that uppercases all letters, keeps digits, and replaces the rest of the characters with an underscore ('_'). It then collapses multiple underscores into one and trims underscores at both ends.
How do I apply this function (lambda) to the column names in Pandas?
You can do this without using apply by calling the vectorised str methods:
In [62]:
df = pd.DataFrame(columns=['Last Name','e-mail','ZIP code 2'])
df.columns
Out[62]:
Index(['Last Name', 'e-mail', 'ZIP code 2'], dtype='object')
In [63]:
df.columns = df.columns.str.upper().str.replace(' ','_')
df.columns
Out[63]:
Index(['LAST_NAME', 'E-MAIL', 'ZIP_CODE_2'], dtype='object')
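Note that the replace call above only handles spaces, so 'e-mail' still comes out as 'E-MAIL' rather than the 'E_MAIL' asked for in the question. A minimal sketch of the full rule with the same vectorised str methods might look like the following. It assumes column names are plain ASCII strings (any other character, including accented letters, becomes an underscore) and a pandas version where str.replace accepts the regex keyword.

```python
import pandas as pd

df = pd.DataFrame(columns=['Last Name', 'e-mail', 'ZIP code 2'])

df.columns = (
    df.columns.str.upper()
    # Replace every run of characters that are not A-Z or 0-9 with a
    # single underscore; the "+" collapses consecutive separators.
    .str.replace(r'[^A-Z0-9]+', '_', regex=True)
    # Trim underscores left over at either end.
    .str.strip('_')
)

print(list(df.columns))
```

Because the pattern matches whole runs of unwanted characters, no separate "collapse multiple underscores" step is needed.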
Otherwise you can convert the Index object to a Series using to_series so you can use apply:
In [67]:
def func(x):
    return x.upper().replace(' ','_')
df.columns = df.columns.to_series().apply(func)
df
Out[67]:
Empty DataFrame
Columns: [LAST_NAME, E-MAIL, ZIP_CODE_2]
Index: []
Thanks to @PaulH for suggesting using rename with a lambda:
In [68]:
df.rename(columns=lambda c: c.upper().replace(' ','_'), inplace=True)
df.columns
Out[68]:
Index(['LAST_NAME', 'E-MAIL', 'ZIP_CODE_2'], dtype='object')
You can simply set the .columns property of the data frame. So in order to rename the columns, you can use:
df.columns = list(map(yourlambda, df.columns))
where you of course replace yourlambda with your lambda expression.
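As an illustration, here is one way the function described in the question could be written and passed to map. The name normalize is hypothetical (the question does not show the original function), and the sketch assumes ASCII column names.

```python
import re

import pandas as pd


def normalize(name):
    """Hypothetical helper: 'e-mail' -> 'E_MAIL'."""
    # Uppercase first, then turn each run of characters that are not
    # A-Z or 0-9 into a single underscore.
    name = re.sub(r'[^A-Z0-9]+', '_', name.upper())
    # Trim underscores at both ends.
    return name.strip('_')


df = pd.DataFrame(columns=['Last Name', 'e-mail', 'ZIP code 2'])
df.columns = list(map(normalize, df.columns))

print(list(df.columns))
```

The same function can also be passed to rename, as in df.rename(columns=normalize), which returns a new data frame instead of assigning to .columns.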