I'm wondering if there is a concise way to drop a column from a DataFrame in SparkR, something like df.drop("column_name") in PySpark.
This is the closest I can get:
# Spark 1.x: call the JVM-side DataFrame.drop() through SparkR's internal callJMethod
df <- new("DataFrame",
          sdf = SparkR:::callJMethod(df@sdf, "drop", "column_name"),
          isCached = FALSE)
This can be achieved by assigning NULL to the column of the SparkR DataFrame:
df$column_name <- NULL
See the original discussion at the related Spark JIRA ticket.
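A minimal end-to-end sketch of the NULL-assignment approach, assuming a local Spark 2.x installation with SparkR on the library path (the session setup uses the 2.x `sparkR.session()` API). The toy data and the column names `id`, `name` and `tmp` are hypothetical; `columns()` is used only to show which columns remain.

```r
library(SparkR)

# Assumed setup: a local Spark 2.x session
sparkR.session(master = "local[1]")

# Hypothetical toy data containing a column we want to remove
local_df <- data.frame(id = 1:3,
                       name = c("a", "b", "c"),
                       tmp = c(0.1, 0.2, 0.3),
                       stringsAsFactors = FALSE)
df <- createDataFrame(local_df)

# Assigning NULL removes the column, mirroring base R data.frame syntax
df$tmp <- NULL

# Inspect the remaining column names
print(columns(df))
```

Because `$<-` reassigns `df` itself, this reads like base R, whereas `drop()` and `select()` return a new DataFrame that you assign yourself.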
Spark >= 2.0.0
You can use the drop function:
drop(df, "column_name")
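A short sketch of the ways `drop()` can be called, assuming Spark >= 2.0 and a local session. In SparkR 2.x, `drop()` accepts either a character vector of column names or a `Column` object and returns a new DataFrame, leaving the original untouched. The data frame and its columns `a` to `d` are hypothetical.

```r
library(SparkR)

# Assumed setup: Spark >= 2.0 running locally; data is hypothetical
sparkR.session(master = "local[1]")
df <- createDataFrame(data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8))

# drop() returns a new SparkDataFrame; df itself keeps all four columns
df1 <- drop(df, "a")           # one column by name
df2 <- drop(df, c("a", "b"))   # several columns in one call
df3 <- drop(df, df$d)          # a Column object instead of a name

print(columns(df1))
print(columns(df2))
print(columns(df3))
```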
Spark < 2.0.0
You can use the select function to keep only the columns you need, passing it column names or Column expressions.
Usage:
## S4 method for signature 'DataFrame'
x$name
## S4 replacement method for signature 'DataFrame'
x$name <- value
## S4 method for signature 'DataFrame,character'
select(x, col, ...)
## S4 method for signature 'DataFrame,Column'
select(x, col, ...)
## S4 method for signature 'DataFrame,list'
select(x, col)
select(x, col, ...)
selectExpr(x, expr, ...)
Examples:
select(df, "*")
select(df, "col1", "col2")
select(df, df$name, df$age + 1)
select(df, c("col1", "col2"))
select(df, list(df$name, df$age + 1))
# As with R data frames, columns can also be selected using `$`
df$age
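A sketch of using select to drop a column by keeping everything else. `columns()` returns the column names, `setdiff()` removes the unwanted one, and the resulting character vector is passed to `select()`, as in the `select(df, c("col1", "col2"))` example. The select call itself does not depend on the Spark version, but the setup below assumes the Spark 2.x `sparkR.session()` API; on 1.x you would create `df` through `sparkRSQL.init()` and `createDataFrame(sqlContext, ...)` instead. The data and the column names are hypothetical.

```r
library(SparkR)

# Assumed Spark 2.x setup; on Spark 1.x build df via a sqlContext instead
sparkR.session(master = "local[1]")

df <- createDataFrame(data.frame(name = c("x", "y"),
                                 age = c(30, 40),
                                 column_name = c(TRUE, FALSE),
                                 stringsAsFactors = FALSE))

# Keep every column except "column_name"
keep <- setdiff(columns(df), "column_name")
df_dropped <- select(df, keep)

print(columns(df_dropped))
```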
You may also be interested in the subset function, which returns a subset of a DataFrame according to given conditions.
I invite you to read the official SparkR documentation for more information and examples.