I have a DataFrame with configurable column names, e.g.:
Journey channelA channelB channelC
j1 1 0 0
j1 0 1 0
j1 1 0 0
j2 0 0 1
j2 0 1 0
By configurable I mean there could be 'n' channels in the dataframe.
Now I need to do a transformation in which I find the sum of all channels, something like:
df.groupBy("Journey").agg(sum("channelA"), sum("channelB"), sum("channelC"))
The output of which would be:
Journey sum(channelA) sum(channelB) sum(channelC)
j1 2 1 0
j2 0 1 1
Now I want to rename the columns back to the original names. I could do it with
.withColumnRenamed("sum(channelA)", "channelA")
but as I mentioned, the channel list is configurable, and I want a generic rename statement that renames all the summed columns back to the original column names, to get the expected DataFrame:
Journey channelA channelB channelC
j1 2 1 0
j2 0 1 1
Any suggestions on how to approach this?
To rename your DataFrame's columns dynamically, you can use the method toDF(colNames: String*), populating colNames with the original column names.
So you can dynamically build a sequence like this:
val columnsRenamed = Seq("Journey", "channelA", "channelB","channelC")
and then call the method toDF:
val renamedDf = df.toDF(columnsRenamed: _*)
The : _* operator is needed to expand the Seq[String] into varargs (String*).
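Putting this together for the question above, here is a minimal, self-contained sketch; the local SparkSession setup and the sample data are just illustrative assumptions matching the question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("rename-summed-columns")  // illustrative app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data matching the question.
val df = Seq(
  ("j1", 1, 0, 0),
  ("j1", 0, 1, 0),
  ("j1", 1, 0, 0),
  ("j2", 0, 0, 1),
  ("j2", 0, 1, 0)
).toDF("Journey", "channelA", "channelB", "channelC")

// Every column except the grouping key is a channel, so this stays
// correct no matter how many channels the DataFrame has.
val channelCols = df.columns.filter(_ != "Journey")

// Sum all channel columns; the results come back as sum(channelA), etc.
val summed = df.groupBy("Journey").sum(channelCols: _*)

// Rename everything back to the original names in one toDF call.
val result = summed.toDF(("Journey" +: channelCols): _*)
result.show()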
The columns could also be renamed in the following way. Say the input DataFrame is inputDf, with columns _1 and _2:
val newDf = inputDf.selectExpr("_1 as x1", "_2 as X2")
* as maps to alias
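The same alias trick can be made generic for the summed columns from the question. A sketch, assuming the summed DataFrame from above; the backticks are needed so sum(channelA) is read as a literal column name rather than re-applied as a function call:

// Derive the channel list dynamically, as before.
val channelCols = Seq("channelA", "channelB", "channelC")  // or df.columns.filter(_ != "Journey")

// Build one "`sum(col)` as col" expression per channel.
val renameExprs = "Journey" +: channelCols.map(c => s"`sum($c)` as $c")

val result = summed.selectExpr(renameExprs: _*)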
Other detailed answers can be found here: Renaming Column names of a Data frame in spark scala