I have a data.frame with almost 200 variables (columns) of different types (num, int, logi, factor). I would like to remove all the variables of type "factor" so that I can run cor() on the rest.
With str() I can see which variables are factors, but I don't know how to select and remove them all at once, and removing them one by one is time-consuming. I have tried attr() and typeof() to select these variables, without success.
Any pointers?
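Assuming a generic data.frame df, this will remove the columns of class factor (it assumes df has at least one factor column; with none, the negative which() index selects zero columns):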
df[,-which(sapply(df, class) == "factor")]
EDIT
As per @Roland's suggestion, you can also just keep those which are not factor. Whichever you prefer.
df[, sapply(df, class) != "factor"]
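Both one-liners above compare the result of class() to "factor", which works when every column has a single class. As a minimal sketch of a slightly more defensive variant (the data frame and the names is_fct, no_factors and df_num below are made up for illustration), is.factor() with vapply() also catches ordered factors, whose class() has two elements, and logical indexing keeps every column when no factor is present:

```r
# Hypothetical example data: two numeric columns, a factor, and an ordered factor
df <- data.frame(
  x = c(1.5, 2.5, 3.5),
  n = 1:3,
  g = factor(c("a", "b", "a")),
  o = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)
)

# is.factor() is TRUE for plain and ordered factors;
# vapply() guarantees one logical value per column
is_fct <- vapply(df, is.factor, logical(1))

# drop = FALSE keeps the result a data.frame even if a single column remains
no_factors <- df[, !is_fct, drop = FALSE]
names(no_factors)

# Pitfall of the -which() form: with no factor columns the index is empty,
# so zero columns are selected instead of all of them
df_num <- df[, c("x", "n")]
ncol(df_num[, -which(sapply(df_num, class) == "factor")])
ncol(df_num[, !vapply(df_num, is.factor, logical(1)), drop = FALSE])
```

The last two lines show the difference: the -which() form returns a data frame with no columns, while the logical index returns both columns.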
EDIT 2
As you are concerned with the cor function, @Ista also points out that it would be safer in that particular instance to filter on is.numeric. The above only remove factor types.
df[,sapply(df, is.numeric)]
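To tie this back to the original goal, here is a small sketch of passing the numeric-only subset to cor(). The data frame and the names num_cols and cor_mat are hypothetical. Note that is.numeric() is FALSE for logical columns as well as for factors, so logi variables are dropped too unless you convert them first.

```r
# Hypothetical example data: two numeric columns and one factor
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, 6, 8, 10),
  g = factor(c("a", "b", "a", "b", "a"))
)

# One TRUE/FALSE per column; factors (and logicals) come out FALSE
num_cols <- vapply(df, is.numeric, logical(1))

# drop = FALSE keeps a data.frame even if only one numeric column exists
cor_mat <- cor(df[, num_cols, drop = FALSE])
cor_mat
```

If the real data contain missing values, the use argument of cor() (for example use = "pairwise.complete.obs") may also be worth a look.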
Here's a very useful tidyverse solution, adapted from here:
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
library(tidyverse)
# Create dummy dataset with multiple variable types
df <-
  tibble::tribble(
    ~var_num_1, ~var_num_2, ~var_char,   ~var_fct, ~var_date,
    1,          10,         "this",      "THIS",   "2019-12-18",
    2,          20,         "is",        "IS",     "2019-12-19",
    3,          30,         "dummy",     "DUMMY",  "2019-12-20",
    4,          40,         "character", "FACTOR", "2019-12-21",
    5,          50,         "text",      "TEXT",   "2019-12-22"
  ) %>%
  mutate(
    var_fct = as_factor(var_fct),
    var_date = as_date(var_date)
  )
# Select numeric variables
df %>% select_if(is.numeric)
#> # A tibble: 5 x 2
#>   var_num_1 var_num_2
#>       <dbl>     <dbl>
#> 1         1        10
#> 2         2        20
#> 3         3        30
#> 4         4        40
#> 5         5        50
# Select character variables
df %>% select_if(is.character)
#> # A tibble: 5 x 1
#>   var_char
#>   <chr>
#> 1 this
#> 2 is
#> 3 dummy
#> 4 character
#> 5 text
# Select factor variables
df %>% select_if(is.factor)
#> # A tibble: 5 x 1
#>   var_fct
#>   <fct>
#> 1 THIS
#> 2 IS
#> 3 DUMMY
#> 4 FACTOR
#> 5 TEXT
# Select date variables
df %>% select_if(is.Date)
#> # A tibble: 5 x 1
#>   var_date
#>   <date>
#> 1 2019-12-18
#> 2 2019-12-19
#> 3 2019-12-20
#> 4 2019-12-21
#> 5 2019-12-22
# Select variables using negation (note the use of `~`)
df %>% select_if(~!is.numeric(.))
#> # A tibble: 5 x 3
#>   var_char  var_fct var_date
#>   <chr>     <fct>   <date>
#> 1 this      THIS    2019-12-18
#> 2 is        IS      2019-12-19
#> 3 dummy     DUMMY   2019-12-20
#> 4 character FACTOR  2019-12-21
#> 5 text      TEXT    2019-12-22
Created on 2019-12-18 by the reprex package (v0.3.0)
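In dplyr 1.0.0 and later, select_if() is superseded by select() combined with where(). A minimal sketch of the equivalent calls, assuming a recent dplyr is installed (the small tibble and the names numeric_only and no_factors are made up for illustration):

```r
library(dplyr)

# Hypothetical example data with one column of each relevant type
df <- tibble(
  var_num = c(1, 2, 3),
  var_chr = c("a", "b", "c"),
  var_fct = factor(c("x", "y", "x"))
)

# Keep only the numeric columns (what cor() needs)
numeric_only <- df %>% select(where(is.numeric))

# Drop only the factor columns, keeping everything else
no_factors <- df %>% select(!where(is.factor))

names(numeric_only)
names(no_factors)
```

The negated form answers the original question directly, while the is.numeric form is the safer input for cor().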