
Drop a DataFrame's Column in SparkR

I'm wondering whether there is a concise way to drop a DataFrame's column in SparkR, similar to df.drop("column_name") in PySpark.

This is the closest I can get:

df <- new("DataFrame",
          sdf=SparkR:::callJMethod(df@sdf, "drop", "column_name"),
          isCached=FALSE)
zoltanctoth asked Jun 10 '26 17:06



2 Answers

This can be achieved by assigning NULL to the Spark dataframe column:

df$column_name <- NULL

See the original discussion at the related Spark JIRA ticket.

zoltanctoth answered Jun 12 '26 09:06



Spark >= 2.0.0

You can use the drop function:

drop(df, "column_name")

Spark < 2.0.0

There is no drop function, so use select to keep the columns you need instead. It accepts a set of column names or Column expressions.

Usage :

## S4 method for signature 'DataFrame'
x$name
## S4 replacement method for signature 'DataFrame'
x$name <- value
## S4 method for signature 'DataFrame,character'
select(x, col, ...)
## S4 method for signature 'DataFrame,Column'
select(x, col, ...)
## S4 method for signature 'DataFrame,list'
select(x, col)
select(x, col, ...)
selectExpr(x, expr, ...)

Examples :

select(df, "*")
select(df, "col1", "col2")
select(df, df$name, df$age + 1)
select(df, c("col1", "col2"))
select(df, list(df$name, df$age + 1))

# Similar to R data frames columns can also be selected using `$`
df$age
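
Building on these examples, here is a minimal sketch of dropping a column on Spark < 2.0.0 by selecting every column except the unwanted one. The helper name drop_columns is hypothetical (it is not part of SparkR), and the sketch assumes that columns(df) returns the column names as a character vector and that select accepts a list of names, as in the 'DataFrame,list' signature above.

```r
# Hypothetical helper, not part of SparkR: "drop" columns by selecting the rest.
# Assumes the SparkR package is installed and df is a SparkR DataFrame.
drop_columns <- function(df, drop_cols) {
  # Names of all columns that are not in drop_cols
  keep <- setdiff(SparkR::columns(df), drop_cols)
  # Pass the names as a list to use the 'DataFrame,list' method of select
  SparkR::select(df, as.list(keep))
}
```

With a DataFrame df in scope, this would be called as df <- drop_columns(df, "column_name"). Since drop_cols is handed to setdiff, a character vector of several names should work too.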

You may also be interested in the subset function that returns subsets of DataFrame according to given conditions.

I invite you to read the official documentation for more information and examples.

eliasah answered Jun 12 '26 07:06



