I was wondering if it is possible to change the position of a column in a dataframe, actually to change the schema?
Precisely if I have got a dataframe like [field1, field2, field3]
, and I would like to get [field1, field3, field2]
.
I can't put any piece of code. Let us imagine we're working with a dataframe with one hundred columns, after some joins and transformations, some of these columns are misplaced regarding the schema of the destination table.
How to move one or several columns, i.e: how to change the schema?
In order to Rearrange or reorder the column in pyspark we will be using select function. To reorder the column in ascending order we will be using Sorted function. To reorder the column in descending order we will be using Sorted function with an argument reverse =True. We also rearrange the column by position.
In Spark, you can use either sort() or orderBy() function of DataFrame/Dataset to sort by ascending or descending order based on single or multiple columns, you can also do sorting using Spark SQL sorting functions, In this article, I will explain all these different ways using Scala examples.
You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.
In PySpark, to add a new column to DataFrame use lit() function by importing from pyspark. sql. functions import lit , lit() function takes a constant value you wanted to add and returns a Column type, if you wanted to add a NULL / None use lit(None) .
You can get the column names, reorder them however you want, and then use select
on the original DataFrame to get a new one with this new order:
val columns: Array[String] = dataFrame.columns val reorderedColumnNames: Array[String] = ??? // do the reordering you want val result: DataFrame = dataFrame.select(reorderedColumnNames.head, reorderedColumnNames.tail: _*)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With