Renaming column names of a DataFrame in Spark Scala

I am trying to convert all the headers / column names of a DataFrame in Spark-Scala. as of now I come up with following code which only replaces a single column name.

for( i <- 0 to origCols.length - 1) {   df.withColumnRenamed(     df.columns(i),      df.columns(i).toLowerCase   ); }

How would you rename the columns of a DataFrame?

Method 1: Using rename() function One way of renaming the columns in a Pandas Dataframe is by using the rename() function. This method is quite useful when we need to rename some selected columns because we need to specify information only for the columns which are to be renamed.

Can we rename the column values of data frame?

Renaming Columns of an Existing Dataframe To rename the columns of this DataFrame , we can use the rename() method which takes: A dictionary as the columns argument containing the mapping of original column names to the new column names as a key-value pairs.

If structure is flat:

val df = Seq((1L, "a", "foo", 3.0)).toDF df.printSchema // root //  |-- _1: long (nullable = false) //  |-- _2: string (nullable = true) //  |-- _3: string (nullable = true) //  |-- _4: double (nullable = false)

the simplest thing you can do is to use toDF method:

val newNames = Seq("id", "x1", "x2", "x3") val dfRenamed = df.toDF(newNames: _*)  dfRenamed.printSchema // root // |-- id: long (nullable = false) // |-- x1: string (nullable = true) // |-- x2: string (nullable = true) // |-- x3: double (nullable = false)

If you want to rename individual columns you can use either select with alias:

df.select($"_1".alias("x1"))

which can be easily generalized to multiple columns:

val lookup = Map("_1" -> "foo", "_3" -> "bar")  df.select(df.columns.map(c => col(c).as(lookup.getOrElse(c, c))): _*)

or withColumnRenamed:

df.withColumnRenamed("_1", "x1")

which use with foldLeft to rename multiple columns:

lookup.foldLeft(df)((acc, ca) => acc.withColumnRenamed(ca._1, ca._2))

With nested structures (structs) one possible option is renaming by selecting a whole structure:

val nested = spark.read.json(sc.parallelize(Seq(     """{"foobar": {"foo": {"bar": {"first": 1.0, "second": 2.0}}}, "id": 1}""" )))  nested.printSchema // root //  |-- foobar: struct (nullable = true) //  |    |-- foo: struct (nullable = true) //  |    |    |-- bar: struct (nullable = true) //  |    |    |    |-- first: double (nullable = true) //  |    |    |    |-- second: double (nullable = true) //  |-- id: long (nullable = true)  @transient val foobarRenamed = struct(   struct(     struct(       $"foobar.foo.bar.first".as("x"), $"foobar.foo.bar.first".as("y")     ).alias("point")   ).alias("location") ).alias("record")  nested.select(foobarRenamed, $"id").printSchema // root //  |-- record: struct (nullable = false) //  |    |-- location: struct (nullable = false) //  |    |    |-- point: struct (nullable = false) //  |    |    |    |-- x: double (nullable = true) //  |    |    |    |-- y: double (nullable = true) //  |-- id: long (nullable = true)

Note that it may affect nullability metadata. Another possibility is to rename by casting:

nested.select($"foobar".cast(   "struct<location:struct<point:struct<x:double,y:double>>>" ).alias("record")).printSchema  // root //  |-- record: struct (nullable = true) //  |    |-- location: struct (nullable = true) //  |    |    |-- point: struct (nullable = true) //  |    |    |    |-- x: double (nullable = true) //  |    |    |    |-- y: double (nullable = true)

or:

import org.apache.spark.sql.types._  nested.select($"foobar".cast(   StructType(Seq(     StructField("location", StructType(Seq(       StructField("point", StructType(Seq(         StructField("x", DoubleType), StructField("y", DoubleType))))))))) ).alias("record")).printSchema  // root //  |-- record: struct (nullable = true) //  |    |-- location: struct (nullable = true) //  |    |    |-- point: struct (nullable = true) //  |    |    |    |-- x: double (nullable = true) //  |    |    |    |-- y: double (nullable = true)

Renaming column names of a DataFrame in Spark Scala

Tags:

dataframe

scala

apache-spark

apache-spark-sql

Sam

People also ask

1 Answers

zero323

Recent Activity

Donate For Us

Renaming column names of a DataFrame in Spark Scala

Tags:

dataframe

scala

apache-spark

apache-spark-sql

Sam

People also ask

1 Answers

zero323

Related questions

Recent Activity

Donate For Us