How to rename column names in spark SQL

I have a DataFrame with configurable column names, e.g.:

Journey channelA channelB channelC
j1      1        0        0
j1      0        1        0
j1      1        0        0
j2      0        0        1 
j2      0        1        0

By configurable I mean there could be 'n' channels in the dataframe.

Now I need a transformation that computes the sum of every channel, something like:

df.groupBy("Journey").agg(sum("channelA"), sum("channelB"), sum("channelC"))

The output of which would be :

Journey sum(channelA) sum(channelB) sum(channelC)
j1      2             1             0
j2      0             1             1

Now I want to rename the columns back to the original names. I could do it one column at a time with

.withColumnRenamed("sum(channelA)", "channelA")

but, as I mentioned, the channel list is configurable, so I want a generic rename statement that maps all of my summed columns back to the original column names and gives this expected DataFrame:

Journey channelA channelB channelC
j1      2        1        0
j2      0        1        1

Any suggestions on how to approach this?

hbabbar asked Sep 29 '16 02:09


2 Answers

To rename a DataFrame's columns dynamically, you can use the method toDF(colNames: String*) and pass it the original column names, which you can build at runtime.

So you can populate a sequence like this:

val columnsRenamed = Seq("Journey", "channelA", "channelB","channelC") 

and then call the method toDF:

val renamedDf = df.toDF(columnsRenamed: _*)

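The : _* annotation is needed because toDF takes a variable number of String arguments (String*); it tells the compiler to pass the elements of the Seq[String] as those individual arguments. Note that toDF renames by position, so the sequence must list a name for every column, in order.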
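To make this generic for 'n' channels, the name sequence can be derived from the input DataFrame instead of being written out by hand. The following is only a sketch, written so it can be pasted into spark-shell or run as a script. It assumes that every column other than Journey is a channel, that there is at least one channel, and it builds a small local SparkSession and sample data purely for illustration; the names channels, sums, summed and renamed are mine, not from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rename-with-toDF-sketch")
  .getOrCreate()
import spark.implicits._

// Sample data shaped like the DataFrame in the question.
val df = Seq(
  ("j1", 1, 0, 0),
  ("j1", 0, 1, 0),
  ("j1", 1, 0, 0),
  ("j2", 0, 0, 1),
  ("j2", 0, 1, 0)
).toDF("Journey", "channelA", "channelB", "channelC")

// Assumption: every column except the grouping key is a channel.
val channels = df.columns.filter(_ != "Journey").toSeq

// One sum(...) per channel; agg takes a first Column plus varargs,
// so this fails on head if there are no channels at all.
val sums = channels.map(c => sum(c))
val summed = df.groupBy("Journey").agg(sums.head, sums.tail: _*)

// toDF renames by position: grouping key first, then the channels
// in the same order the aggregates were listed.
val originalNames = "Journey" +: channels
val renamed = summed.toDF(originalNames: _*)
```

Because toDF is positional, this relies on the aggregates being listed in the same order as channels, which is why both are built from the same sequence.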

Umberto Griffo answered Oct 16 '22 19:10


The columns can also be renamed with selectExpr. Say the input is inputDf: DataFrame with columns _1 and _2:

val newDf = inputDf.selectExpr("_1 as x1", "_2 as X2")
Here, as assigns an alias to each selected column.
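Applied to the question's summed DataFrame, the selectExpr strings can be generated from the channel list. This is a rough sketch rather than a definitive solution: it assumes the aggregate columns are named sum(<channel>) as shown in the question, that channel names are plain identifiers, and that there is at least one channel. The local SparkSession, the sample data and the names channels, summed and renameExprs are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rename-with-selectExpr-sketch")
  .getOrCreate()
import spark.implicits._

// Small sample with two channels.
val df = Seq(
  ("j1", 1, 0),
  ("j1", 0, 1),
  ("j2", 0, 1)
).toDF("Journey", "channelA", "channelB")

// Assumption: every column except the grouping key is a channel.
val channels = df.columns.filter(_ != "Journey").toSeq
val sums = channels.map(c => sum(c))
val summed = df.groupBy("Journey").agg(sums.head, sums.tail: _*)

// Backticks quote the generated column name, which contains parentheses
// and would otherwise be parsed as a call to sum on the missing column.
val renameExprs = "Journey" +: channels.map(c => s"`sum($c)` as $c")
val renamed = summed.selectExpr(renameExprs: _*)
```

Unlike toDF, this selects columns by name, so it does not depend on column order, but it does depend on the generated sum(...) names matching exactly.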

More detailed answers can be found here: Renaming Column names of a Data frame in spark scala

Pramit answered Oct 16 '22 20:10