I have a dataframe in pyspark which has 15 columns. The column names are id, name, emp.dno, emp.sal, state, emp.city, zip, .....

Now I want to replace the column names which have '.' in them with '_', like 'emp.dno' to 'emp_dno'. I would like to do it dynamically. How can I achieve that in pyspark?
You can use something similar to this great solution from @zero323:
df.toDF(*(c.replace('.', '_') for c in df.columns))
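The string logic behind that one-liner can be sanity-checked without a Spark session; the column list below is a hypothetical stand-in for `df.columns`:

```python
# Hypothetical column names mirroring the question. df.toDF(*...) unpacks
# exactly this kind of generator of replacement names.
columns = ['id', 'name', 'emp.dno', 'emp.sal', 'state', 'emp.city', 'zip']
renamed = [c.replace('.', '_') for c in columns]
print(renamed)
# ['id', 'name', 'emp_dno', 'emp_sal', 'state', 'emp_city', 'zip']
```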
Alternatively:
from pyspark.sql.functions import col
replacements = {c: c.replace('.', '_') for c in df.columns if '.' in c}
df.select([col(c).alias(replacements.get(c, c)) for c in df.columns])
The replacement dictionary then would look like:
{'emp.city': 'emp_city', 'emp.dno': 'emp_dno', 'emp.sal': 'emp_sal'}
UPDATE:
If I have a dataframe with spaces in the column names as well, how do I replace both '.' and space with '_'?
import re
df.toDF(*(re.sub(r'[\.\s]+', '_', c) for c in df.columns))
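The regex can likewise be checked on plain strings: `[\.\s]+` matches any run of dots and/or whitespace and collapses it into a single underscore, so a dot followed by a space becomes one '_' rather than two. The names below are hypothetical:

```python
import re

# Hypothetical names containing dots, spaces, and a dot-plus-space run.
columns = ['emp.dno', 'emp sal', 'emp. city']
renamed = [re.sub(r'[\.\s]+', '_', c) for c in columns]
print(renamed)
# ['emp_dno', 'emp_sal', 'emp_city']
```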
Wrote an easy & fast function for you to use. Enjoy! :)
def rename_cols(rename_df):
    for column in rename_df.columns:
        new_column = column.replace('.', '_')
        rename_df = rename_df.withColumnRenamed(column, new_column)
    return rename_df
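Note that `withColumnRenamed` returns a new DataFrame each time, which is why the loop reassigns `rename_df`. The mapping the loop will apply can be previewed on plain strings before touching a DataFrame (the helper and column names here are hypothetical, not part of the answer above):

```python
def preview_renames(columns):
    # Mirror rename_cols: map each old name to its '.'-free replacement.
    # Names without a '.' map to themselves unchanged.
    return {c: c.replace('.', '_') for c in columns}

mapping = preview_renames(['id', 'emp.dno', 'emp.sal'])
print(mapping)
# {'id': 'id', 'emp.dno': 'emp_dno', 'emp.sal': 'emp_sal'}
```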