Drop a DataFrame's Column in SparkR

Question

I'm wondering if there is a concise method for dropping a DataFrame's column in SparkR, such as df.drop("column_name") in pyspark.

This is the closest I can get:

df <- new("DataFrame",
          sdf=SparkR:::callJMethod(df@sdf, "drop", "column_name"),
          isCached=FALSE)

zoltanctoth · Accepted Answer

This can be achieved by assigning NULL to the Spark dataframe column:

df$column_name <- NULL

See the original discussion at the related Spark JIRA ticket.

eliasah · Answer

Spark >= 2.0.0

You can use the drop function:

drop(df, "column_name")

Spark < 2.0.0

You can use the select function to keep only the columns you need, passing it a set of column names or Column expressions.

Usage:

## S4 method for signature 'DataFrame'
x$name
## S4 replacement method for signature 'DataFrame'
x$name <- value
## S4 method for signature 'DataFrame,character'
select(x, col, ...)
## S4 method for signature 'DataFrame,Column'
select(x, col, ...)
## S4 method for signature 'DataFrame,list'
select(x, col)
select(x, col, ...)
selectExpr(x, expr, ...)

Examples:

select(df, "*")
select(df, "col1", "col2")
select(df, df$name, df$age + 1)
select(df, c("col1", "col2"))
select(df, list(df$name, df$age + 1))

# Similar to R data frames columns can also be selected using `$`
df$age
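To turn select into a "drop", one option is to compute the names to keep and select only those. The following is a minimal sketch, not part of SparkR: columns_to_keep and drop_columns are hypothetical helper names, and it assumes SparkR is installed, that df is a SparkR DataFrame, and that your version's select accepts a list of column names (as in the list signature shown above).

```r
# Base-R helper: the column names that remain after removing `to_drop`.
# setdiff() keeps the order of `all_columns` and silently ignores
# names in `to_drop` that are not present.
columns_to_keep <- function(all_columns, to_drop) {
  setdiff(all_columns, to_drop)
}

# Hypothetical wrapper (not a SparkR function): drop columns by
# selecting everything else. Assumes `df` is a SparkR DataFrame.
drop_columns <- function(df, to_drop) {
  keep <- columns_to_keep(SparkR::columns(df), to_drop)
  SparkR::select(df, as.list(keep))
}
```

With these definitions, df <- drop_columns(df, "column_name") would be the select-based counterpart of the drop call shown for newer Spark versions.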

You may also be interested in the subset function, which returns a subset of a DataFrame according to the given conditions.

I invite you to read the official documentation for more information and examples.
