Dropping multiple columns from a Spark DataFrame by iterating through a Scala List of column names

I have a DataFrame with around 400 columns, and I want to drop 100 of them. I have created a Scala List of the 100 column names, and I want to iterate over it in a for loop, dropping one column per iteration.

Below is the code.

final val dropList: List[String] = List("Col1","Col2",...."Col100")

def drpColsfunc(inputDF: DataFrame): DataFrame = { 
    for (i <- 0 to dropList.length - 1) {
        val returnDF = inputDF.drop(dropList(i))
    }
    return returnDF
}

val test_df = drpColsfunc(input_dataframe) 

test_df.show(5)
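
As written, the function above does not compile: `returnDF` is declared inside the loop body, so it is out of scope at the `return`, and each iteration drops from the original `inputDF` rather than from the previous result. A minimal sketch of an iterative version that carries the DataFrame from one step to the next with `foldLeft`, assuming a local SparkSession and illustrative column names standing in for the elided list:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropColumnsFold {
  // Illustrative names only; the real 100-name list in the question is elided.
  val dropList: List[String] = List("Col1", "Col2", "Col3")

  // Start from inputDF and drop one column per step, handing each
  // intermediate DataFrame on to the next step.
  def drpColsfunc(inputDF: DataFrame): DataFrame =
    dropList.foldLeft(inputDF)((df, colName) => df.drop(colName))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration purposes.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-columns-fold")
      .getOrCreate()
    import spark.implicits._

    val input = Seq((1, 2, 3, 4)).toDF("Col1", "Col2", "Col3", "Keep")
    val testDF = drpColsfunc(input)
    testDF.show(5)

    spark.stop()
  }
}
```

Because DataFrames are immutable, every `drop` returns a new DataFrame; the fold simply keeps the latest one instead of discarding it.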
Ramesh asked Sep 30 '16 08:09


2 Answers

If you want nothing more complex than dropping several named columns, as opposed to selecting them by some condition, you can simply do the following:

df.drop("colA", "colB", "colC")
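
For a list like the one in the question, the names do not have to be typed out one by one: a Scala collection can be expanded into the varargs parameter with `: _*`. A minimal sketch, assuming Spark 2.0 or later (where the `drop(colNames: String*)` overload is available), a local SparkSession, and illustrative column names:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropColumnsVarargs {
  // Illustrative names only; they stand in for the question's elided list.
  val dropList: List[String] = List("Col1", "Col2", "Col3")

  // Expand the whole list into the String* varargs overload of drop.
  def dropCols(inputDF: DataFrame): DataFrame = inputDF.drop(dropList: _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration purposes.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-columns-varargs")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2, 3, 4)).toDF("Col1", "Col2", "Col3", "Keep")
    println(dropCols(df).columns.mkString(", ")) // prints "Keep"

    spark.stop()
  }
}
```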
Ricky McMaster answered Sep 28 '22 04:09


Answer:

import org.apache.spark.sql.Column

val colsToRemove = Seq("colA", "colB", "colC") // etc.

// Keep every column whose name is not in colsToRemove.
val filteredDF = df.select(
  df.columns
    .filter(colName => !colsToRemove.contains(colName))
    .map(colName => new Column(colName)): _*
)
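
The select-based approach can also be wrapped in a small reusable function. The sketch below is illustrative only: `dropBySelect` is a hypothetical helper name, the sample columns are made up, and a local SparkSession is assumed. It uses `functions.col` to build the columns and a `Set` for the membership check.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DropColumnsBySelect {
  // Hypothetical helper: select every column whose name is not in colsToRemove.
  // Names in colsToRemove that are absent from df are simply never matched.
  def dropBySelect(df: DataFrame, colsToRemove: Set[String]): DataFrame =
    df.select(
      df.columns
        .filterNot(name => colsToRemove.contains(name))
        .map(name => col(name)): _*
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration purposes.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-columns-by-select")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2, 3, 4, 5)).toDF("colA", "keep1", "colB", "keep2", "colC")
    val filteredDF = dropBySelect(df, Set("colA", "colB", "colC", "missing"))
    println(filteredDF.columns.mkString(", ")) // prints "keep1, keep2"

    spark.stop()
  }
}
```

The remaining columns keep their original relative order, since the filter walks `df.columns` from left to right.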
Ramesh answered Sep 28 '22 02:09