How to change case of whole pyspark dataframe to lower or upper

Question

I am trying to apply the hash function from pyspark.sql.functions to every row of two DataFrames to identify the differences between them. The hash is case-sensitive, i.e. 'APPLE' and 'Apple' in a column are treated as two different values, so I want to convert both DataFrames to either upper or lower case. So far I have managed this only for the column headers, not for the values. Please help.

# Code for DataFrame column headers
self.df_db1 = self.df_db1.toDF(*[c.lower() for c in self.df_db1.columns])

Steven · Accepted Answer

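Assuming df is your DataFrame, this should do the job:

from pyspark.sql import functions as F

# Replace each column with its lower-cased version, one column at a time.
# Use F.upper instead of F.lower to convert to upper case.
for col in df.columns:
    df = df.withColumn(col, F.lower(F.col(col)))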

Tags: python-3.x, case-sensitive, apache-spark, pyspark, spark-dataframe

Jack
