
How to rename a column for a dataframe in pyspark?

Below is part of my code:

df = None

F_DATE = ['202101', '202102', '202103']

for date in F_DATE:
    if df is None:
        df = spark.sql("select count(*) as Total_count from test_" + date)
    else:
        df2 = spark.sql("select count(*) as Total_count from test_" + date)
        df = df.union(df2)

df.write.csv('/csvs/test.csv')

I tried 'toDF()', 'withColumnRenamed()', and 'selectExpr()', but the column name was not changed.

Note: the tables are in Hive.

Update: I had not called "df.show()" in the code that writes the CSV, only in the code that reads it back. When I call "df.show()" in the writing code, the column name is displayed correctly; when I call "df.show()" in the reading code, the column name is not displayed correctly.

SecY asked Oct 19 '25 01:10


1 Answer

You can use:

df = df.withColumnRenamed('old_name', 'new_name')
lucaspompeun answered Oct 21 '25 14:10


