
Remove a column from a Spark DataFrame

I have a Spark dataframe with a very large number of columns. I want to remove two columns from it to get a new dataframe.

Had there been fewer columns, I could have used the select method in the API like this:

    pcomments = pcomments.select(
        pcomments.col("post_id"),
        pcomments.col("comment_id"),
        pcomments.col("comment_message"),
        pcomments.col("user_name"),
        pcomments.col("comment_createdtime"));

But since picking columns from a long list is a tedious task, is there a workaround?

Count asked Jan 20 '17 12:01


People also ask

How do I remove a column from a DataFrame in spark?

Spark's DataFrame API provides a drop() method that removes a column (field) from a DataFrame/Dataset. The same method can also remove several columns at once.

How do you remove columns in PySpark?

In PySpark, DataFrame.drop() removes columns from a DataFrame. Dropping rows that contain nulls is a separate method, dropna(): its thresh parameter takes an integer and drops rows that have fewer than that many non-null values.

How do you delete a column in Scala?

In Scala, call the drop method on a DataFrame to remove one or more columns.

How do I drop multiple columns in spark DataFrame?

Pass several column names to drop() in a single call; each named column is removed from the resulting DataFrame or Dataset.


2 Answers

Use the drop method to remove the columns. If you also need to rename a column, use withColumnRenamed.

Example:

    val initialDf = ....
    // Chain drop calls, one column per call
    val dfAfterDrop = initialDf.drop("column1").drop("column2")
    // Optionally rename one of the remaining columns
    val dfAfterColRename = dfAfterDrop.withColumnRenamed("oldColumnName", "newColumnName")
SanthoshPrasad answered Sep 18 '22 09:09


Try this:

    val initialDf = ...
    // drop accepts several column names in one call
    val dfAfterDropCols = initialDf.drop("column1", "column2")
Manoj Kumar Dhakad answered Sep 20 '22 09:09
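
Both answers rely on drop. For anyone who wants to try it end to end, here is a minimal self-contained sketch, assuming Spark 2.x or later (SparkSession and the multi-name drop overload) and a local master. The dropColumns helper, the toy row, and the extra column names "likes" and "raw_json" are made up for illustration; only the first five column names come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropColumnsExample {
  // Hypothetical helper: removes every column named in `unwanted`.
  // drop(colNames: String*) ignores names that are not in the schema.
  def dropColumns(df: DataFrame, unwanted: Seq[String]): DataFrame =
    df.drop(unwanted: _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for the demo.
    val spark = SparkSession.builder()
      .appName("drop-columns-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for the wide DataFrame in the question.
    val pcomments = Seq(
      (1L, 10L, "first!", "alice", "2017-01-20", 3, "{}")
    ).toDF("post_id", "comment_id", "comment_message",
           "user_name", "comment_createdtime", "likes", "raw_json")

    // Name only the columns to remove instead of listing the ones to keep.
    val trimmed = dropColumns(pcomments, Seq("likes", "raw_json"))
    trimmed.printSchema()

    spark.stop()
  }
}
```

Keeping the unwanted names in a Seq and expanding it with `: _*` is handy when the list is built at runtime. Since drop returns a new DataFrame, the original pcomments is left unchanged.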