Dynamically rename multiple columns in PySpark DataFrame

Question

I have a DataFrame in PySpark with 15 columns.

The column names are id, name, emp.dno, emp.sal, state, emp.city, zip, ...

I want to replace the '.' in every column name that contains one with '_'.

For example, 'emp.dno' should become 'emp_dno'.

I would like to do this dynamically.

How can I achieve that in PySpark?

MaxU - stop WAR against UA · Accepted Answer

You can use something similar to this great solution from @zero323:

df.toDF(*(c.replace('.', '_') for c in df.columns))

Alternatively:

from pyspark.sql.functions import col

replacements = {c:c.replace('.','_') for c in df.columns if '.' in c}

# backticks make Spark read a dotted name as one column, not as struct field access
df.select([col('`{}`'.format(c)).alias(replacements.get(c, c)) for c in df.columns])

The replacement dictionary then would look like:

{'emp.city': 'emp_city', 'emp.dno': 'emp_dno', 'emp.sal': 'emp_sal'}

UPDATE:

Follow-up question: "If my DataFrame also has spaces in its column names, how do I replace both '.' and space with '_'?"

import re

df.toDF(*(re.sub(r'[\.\s]+', '_', c) for c in df.columns))
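One thing the regex approach does not guard against is two columns collapsing to the same name (for example 'emp.dno' next to an existing 'emp_dno'). Below is a minimal sketch of a helper that computes the new names from a plain list and raises on such a collision. The name sanitize_column_names and its parameters are hypothetical, not part of PySpark, and the pattern is the same idea as the one above.

```python
import re


def sanitize_column_names(columns, pattern=r"[.\s]+", replacement="_"):
    """Replace each run of dots/whitespace in every name with `replacement`.

    Raises ValueError if two columns would end up with the same new name.
    """
    new_names = [re.sub(pattern, replacement, c) for c in columns]
    seen = {}  # new name -> original name that produced it
    for old, new in zip(columns, new_names):
        if new in seen:
            raise ValueError(
                "columns {!r} and {!r} both map to {!r}".format(seen[new], old, new)
            )
        seen[new] = old
    return new_names


# With a real DataFrame `df`, the result could then be applied as:
# df = df.toDF(*sanitize_column_names(df.columns))
```

Because the helper only works on the list of names, it can be checked without a Spark session before the DataFrame itself is touched.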

Zilong Z · Answer

Wrote an easy & fast function for you to use. Enjoy! :)

def rename_cols(rename_df):
    for column in rename_df.columns:
        new_column = column.replace('.','_')
        rename_df = rename_df.withColumnRenamed(column, new_column)
    return rename_df
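A possible variant of the function above, sketched under the assumption that most columns need no change: it calls withColumnRenamed only for names that actually contain the character being replaced. The name rename_changed_cols and the old/new parameters are made up for illustration.

```python
def rename_changed_cols(df, old=".", new="_"):
    """Rename only the columns whose name contains `old`.

    `df` is expected to be a PySpark DataFrame; only `df.columns` and
    `df.withColumnRenamed` are used.
    """
    # df.columns is evaluated once, so reassigning df inside the loop is safe
    for column in df.columns:
        if old in column:
            df = df.withColumnRenamed(column, column.replace(old, new))
    return df


# Usage with a real DataFrame `df`:
# df = rename_changed_cols(df)
```

With 15 columns the difference is unlikely to matter much; on very wide DataFrames, fewer withColumnRenamed calls should keep the query plan smaller.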

Tags:

dataframe

special-characters

apache-spark

pyspark

User12345
