After joining two dataframes, I find that the column order is not what I expected.
Ex: Joining two data frames with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].
How can I change the order of the columns (e.g., to [a,b,c,d,e])?
I've found ways to do it in Python/R but not Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?
In Scala you can use the "splat" (:_*) syntax to pass a variable-length list of columns to the DataFrame.select() method.
To address your example, you can get a list of the existing columns via DataFrame.columns, which returns an array of strings. Since the order you want happens to be alphabetical, you can sort that array and convert the values to columns. You can then "splat" the result out to the select() method:
import org.apache.spark.sql.functions.col

// myDF.columns: Array[String]=(b,a,c,d,e) => sorted and mapped: Array[Column]=(a,b,c,d,e)
val mySortedCols = myDF.columns.sorted.map(str => col(str))
val myNewDF = myDF.select(mySortedCols:_*)
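Sorting only helps when the order you want is alphabetical. For an arbitrary order, a minimal sketch along the same lines is to name the columns that should come first and append the remaining ones in their existing order. The helper name moveToFront is hypothetical (it is not part of the Spark API), and the sketch assumes column names without dots, since col() treats dots as nested-field access:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper, not part of Spark: puts the columns named in `first`
// at the front and keeps every other column in its current relative order.
def moveToFront(df: DataFrame, first: Seq[String]): DataFrame = {
  // Columns not explicitly requested, in the order the DataFrame already has them
  val rest = df.columns.toSeq.filterNot(c => first.contains(c))
  // Same "splat" technique as above: Seq[Column] expanded into select's varargs
  df.select((first ++ rest).map(c => col(c)): _*)
}
```

With the column order from the question, moveToFront(myDF, Seq("a")) would select a first and then b, c, d, e.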
One way of doing it is reordering after your join:
import spark.implicits._ // needed for .toDF; already in scope in spark-shell
case class Person(name : String, age: Int)
val persons = Seq(Person("test", 10)).toDF
persons.show
+----+---+
|name|age|
+----+---+
|test| 10|
+----+---+
persons.select("age", "name").show
+---+----+
|age|name|
+---+----+
| 10|test|
+---+----+
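Applied to the join from the question, the same idea might look like the sketch below. The sample rows, the local SparkSession settings, and the names left, right, joined and reordered are all made up for illustration; only the column names come from the question.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimenting; in spark-shell `spark` already exists
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("reorder-after-join")
  .getOrCreate()
import spark.implicits._

// Two small DataFrames with the column layouts from the question (values are invented)
val left  = Seq((1, "c1", "d1", "e1")).toDF("b", "c", "d", "e")
val right = Seq(("a1", 1)).toDF("a", "b")

// Join on the shared column b; the resulting column order is whatever the join produces
val joined = right.join(left, Seq("b"))

// Explicitly select the columns in the order you want
val reordered = joined.select("a", "b", "c", "d", "e")
reordered.printSchema()
```

Because select() returns a new DataFrame, joined itself is left unchanged and you can keep both orderings around if needed.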