
Spark Dataframes: How can I change the order of columns in Java/Scala?

After joining two dataframes, I find that the column order is not what I expected.

For example, joining two dataframes with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].

How can I change the order of the columns (e.g., [a,b,c,d,e])? I've found ways to do it in Python/R but not Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?

asked Dec 24 '22 04:12 by jest jest



2 Answers

In Scala you can use the "splat" (: _*) syntax to pass a variable-length list of columns to the DataFrame.select() method.

To address your example, you can get a list of the existing columns via DataFrame.columns, which returns an array of strings. Then just sort that array and convert the values to columns. You can then "splat" out to the select() method:

import org.apache.spark.sql.functions.col

// myDF.columns: Array[String] = (b, a, c, d, e)
// after sorting and mapping: Array[Column] = (a, b, c, d, e)
val mySortedCols = myDF.columns.sorted.map(str => col(str))

val myNewDF = myDF.select(mySortedCols: _*)
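
Sorting works for this example only because the wanted order happens to be alphabetical. For an arbitrary order, one option is to build the list of column names first. The following is a minimal sketch in plain Scala; the `reorder` helper and the names `joinedCols` and `wanted` are hypothetical and not part of Spark.

```scala
// Hypothetical helper (not a Spark API): move the named columns to the front
// and keep every other column in its existing relative order.
def reorder(existing: Seq[String], first: Seq[String]): Seq[String] = {
  val missing = first.filterNot(existing.contains)
  require(missing.isEmpty, s"Unknown columns: ${missing.mkString(", ")}")
  first ++ existing.filterNot(first.contains)
}

// Column order produced by the join in the question.
val joinedCols = Seq("b", "a", "c", "d", "e")

// Put "a" first; the remaining columns keep their order.
val wanted = reorder(joinedCols, Seq("a")) // Seq("a", "b", "c", "d", "e")
```

With Spark on the classpath, the result can be passed to select() in the same way as above, for example `myDF.select(wanted.map(col): _*)`, where `col` comes from `org.apache.spark.sql.functions`.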
answered Dec 26 '22 20:12 by chucknelson



One way to do it is to reorder the columns with select() after your join:

import spark.implicits._ // needed for .toDF outside spark-shell

case class Person(name: String, age: Int)
val persons = Seq(Person("test", 10)).toDF

persons.show
+----+---+
|name|age|
+----+---+
|test| 10|
+----+---+

persons.select("age", "name").show

+---+----+
|age|name|
+---+----+
| 10|test|
+---+----+
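
Applied to the join from the question, the same select() call might look like the sketch below. The sample values, the local SparkSession settings, and the names `ab`, `bcde`, `joined`, and `reordered` are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session; inside spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("reorder-columns")
  .getOrCreate()
import spark.implicits._

// Two small dataframes with the column sets from the question (made-up data).
val ab = Seq(("a1", 1)).toDF("a", "b")
val bcde = Seq((1, "c1", "d1", "e1")).toDF("b", "c", "d", "e")

// Joining on the column name puts the join column first, followed by the
// remaining columns of each side: b, a, c, d, e.
val joined = ab.join(bcde, Seq("b"))

// Name the columns explicitly in the order you want.
val reordered = joined.select("a", "b", "c", "d", "e")
reordered.printSchema()
```

Listing the names explicitly is the simplest choice when the schema is small and known in advance; for wide or changing schemas, building the list from `joined.columns` avoids hard-coding every name.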
answered Dec 26 '22 18:12 by Kestemont Max
