I want to change the names of two columns using Spark's withColumnRenamed function. Of course, I can write:
data = sqlContext.createDataFrame([(1, 2), (3, 4)], ['x1', 'x2'])
data = (data
    .withColumnRenamed('x1', 'x3')
    .withColumnRenamed('x2', 'x4'))
but I want to do this in one step (passing a list/tuple of new names). Unfortunately, neither this:
data = data.withColumnRenamed(['x1', 'x2'], ['x3', 'x4'])
nor this:
data = data.withColumnRenamed(('x1', 'x2'), ('x3', 'x4'))
works. Is it possible to do this that way?
It is not possible to use a single withColumnRenamed call.
You can use the DataFrame.toDF method*:
data.toDF('x3', 'x4')
or
new_names = ['x3', 'x4']
data.toDF(*new_names)
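Note that toDF expects one new name for every column, in order; it renames all columns at once. As a quick check (a minimal sketch, assuming a SparkSession named spark):

df = spark.createDataFrame([(1, 2), (3, 4)], ['x1', 'x2'])
renamed = df.toDF('x3', 'x4')  # positional: one name per existing column
print(renamed.columns)         # ['x3', 'x4']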
It is also possible to rename with a simple select:
from pyspark.sql.functions import col

mapping = dict(zip(['x1', 'x2'], ['x3', 'x4']))
data.select([col(c).alias(mapping.get(c, c)) for c in data.columns])
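If you'd rather keep withColumnRenamed but drive it from a mapping in one call, you can fold it over the entries, the Python counterpart of the Scala foldLeft shown below. A minimal sketch; rename_columns is a hypothetical helper, not part of the PySpark API:

from functools import reduce

def rename_columns(df, mapping):
    # Chain one withColumnRenamed call per mapping entry.
    # Columns absent from the mapping are left untouched.
    return reduce(
        lambda acc, kv: acc.withColumnRenamed(kv[0], kv[1]),
        mapping.items(),
        df,
    )

data = rename_columns(data, {'x1': 'x3', 'x2': 'x4'})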
Similarly in Scala you can:
Rename all columns:
val newNames = Seq("x3", "x4")
data.toDF(newNames: _*)
Rename from a mapping with select:
val mapping = Map("x1" -> "x3", "x2" -> "x4")
df.select(
  df.columns.map(c => df(c).alias(mapping.get(c).getOrElse(c))): _*
)
or foldLeft + withColumnRenamed:

mapping.foldLeft(data) {
  case (data, (oldName, newName)) => data.withColumnRenamed(oldName, newName)
}
* Not to be confused with RDD.toDF, which is not a variadic function and takes column names as a list.
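A minimal sketch of the difference, assuming a SparkContext named sc and a SparkSession named spark:

rdd = sc.parallelize([(1, 2), (3, 4)])
rdd.toDF(['x3', 'x4'])                                   # RDD.toDF takes a list of names

df = spark.createDataFrame([(1, 2), (3, 4)], ['x1', 'x2'])
df.toDF('x3', 'x4')                                      # DataFrame.toDF is variadic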