
How to change a column position in a Spark DataFrame?

Is it possible to change the position of a column in a DataFrame, that is, to change the order of the fields in its schema?

Precisely: if I have a DataFrame with columns [field1, field2, field3], I would like to get [field1, field3, field2].

I can't share any code. Imagine a DataFrame with one hundred columns: after some joins and transformations, some of these columns are out of position relative to the schema of the destination table.

How can I move one or several columns, i.e. how can I change the schema?

asked Jun 29 '16 15:06 by obiwan kenobi



1 Answer

You can get the column names, reorder them however you want, and then use select on the original DataFrame to get a new one with this new order:

import org.apache.spark.sql.DataFrame

val columns: Array[String] = dataFrame.columns
val reorderedColumnNames: Array[String] = ??? // do the reordering you want
val result: DataFrame = dataFrame.select(reorderedColumnNames.head, reorderedColumnNames.tail: _*)
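As one possible way to fill in the `???` step, here is a minimal sketch of the reordering itself, written against plain column names so it does not depend on Spark. The object `ColumnOrder` and its methods `moveAfter` and `alignTo` are hypothetical helpers, not part of any Spark API, and the sketch assumes column names are unique (which may not hold right after a join).

```scala
// Hypothetical helpers (not part of Spark): they only rearrange column names.
// Assumption: column names are unique.
object ColumnOrder {

  /** Move `column` so that it sits immediately after `anchor`. */
  def moveAfter(columns: Seq[String], column: String, anchor: String): Seq[String] = {
    require(columns.contains(column), s"unknown column: $column")
    require(columns.contains(anchor), s"unknown column: $anchor")
    require(column != anchor, "column and anchor must differ")
    // Take the column out, then re-insert it right after the anchor.
    val rest = columns.filterNot(_ == column)
    val (before, after) = rest.splitAt(rest.indexOf(anchor) + 1)
    (before :+ column) ++ after
  }

  /** Order `columns` like `target` (e.g. the destination table's field names).
    * Columns that do not appear in `target` keep their relative order at the end.
    */
  def alignTo(columns: Seq[String], target: Seq[String]): Seq[String] = {
    val present = target.filter(c => columns.contains(c))
    present ++ columns.filterNot(c => present.contains(c))
  }
}
```

With the question's example, `ColumnOrder.moveAfter(dataFrame.columns, "field2", "field3")` gives the names in the order field1, field3, field2. The result can then be passed to `select` exactly as in the answer: `dataFrame.select(reordered.head, reordered.tail: _*)`.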
answered Sep 28 '22 11:09 by Tzach Zohar