
How to remove columns from a data.frame by data type?

Tags:

r

I have a data.frame with almost 200 variables (columns) of different data types (num, int, logi, factor). I would like to remove all the variables of type "factor" so that I can run the function cor().

When I use str(), I can see which variables are of type "factor", but I don't know how to select and remove all of them at once; removing them one by one is time-consuming. I have tried attr() and typeof() to select these variables, without success.

Some direction?

Darwin PC asked Feb 16 '15 18:02


2 Answers

Assuming a generic data.frame, this will remove the columns of type factor:

df[,-which(sapply(df, class) == "factor")]
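The typeof() attempt in the question likely gave nothing usable because a factor is stored as integer codes, so its storage type does not distinguish it from an ordinary integer column; the class is what marks it as a factor. A minimal sketch, where `f` is a hypothetical factor used only for illustration:

```r
# Hypothetical factor, used only for illustration
f <- factor(c("low", "high", "low"))

typeof(f)      # storage type of the underlying integer codes
#> [1] "integer"
class(f)       # the class attribute that marks it as a factor
#> [1] "factor"
is.factor(f)   # predicate that can be applied column by column
#> [1] TRUE
```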

EDIT

As per @Roland's suggestion, you can also just keep the columns that are not factors, whichever you prefer:

df[, sapply(df, class) != "factor"]
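One difference between the two forms is worth knowing. If the data frame happens to contain no factor columns, which() returns an empty vector, and a negative empty index selects zero columns instead of all of them. The logical form does not have this problem. A small sketch, assuming a hypothetical data frame `d`; the name `is_fct`, the use of vapply() with is.factor (which also recognises ordered factors), and `drop = FALSE` are choices made here, not part of the original answer:

```r
# Hypothetical data frame with no factor columns
d <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5))

# vapply() guarantees one logical value per column
is_fct <- vapply(d, is.factor, logical(1))

# Negative-index form: which() finds nothing, so every column is dropped
ncol(d[, -which(is_fct), drop = FALSE])
#> [1] 0

# Logical form: all columns are kept, as intended
ncol(d[, !is_fct, drop = FALSE])
#> [1] 2
```

`drop = FALSE` keeps the result a data.frame even when only one column survives the filter.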

EDIT 2

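As you are concerned with the cor function, @Ista also points out that it would be safer in that particular instance to filter on is.numeric. The snippets above only remove factor columns, so other non-numeric columns (character, for example) would remain and cor() would still fail on them.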

df[,sapply(df, is.numeric)]
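As a rough end-to-end sketch of that last option, assuming a hypothetical data frame `d` with the column types mentioned in the question (the column names and values are made up for illustration). Note that is.numeric() is FALSE for logical columns too, so those are dropped along with the factors:

```r
# Hypothetical data frame mixing the column types from the question
d <- data.frame(
  x   = c(1.2, 2.4, 3.1, 4.8),
  y   = c(10L, 20L, 35L, 40L),
  flg = c(TRUE, FALSE, TRUE, TRUE),
  grp = factor(c("a", "b", "a", "b"))
)

# Keep only numeric (double or integer) columns; drop = FALSE keeps a data.frame
num_only <- d[, vapply(d, is.numeric, logical(1)), drop = FALSE]

names(num_only)
#> [1] "x" "y"

# cor() now receives numeric input only and returns a 2 x 2 matrix
cor(num_only)
```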
cdeterman answered Oct 05 '22 05:10


Here's a very useful tidyverse solution, adapted from here:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(tidyverse)

# Create dummy dataset with multiple variable types
df <- 
  tibble::tribble(
  ~var_num_1, ~var_num_2,   ~var_char, ~var_fct, ~var_date,
           1,         10,      "this",   "THIS", "2019-12-18",
           2,         20,        "is",     "IS", "2019-12-19",
           3,         30,     "dummy",  "DUMMY", "2019-12-20",
           4,         40, "character", "FACTOR", "2019-12-21",
           5,         50,      "text",   "TEXT", "2019-12-22"
  ) %>% 
  mutate(
    var_fct = as_factor(var_fct),
    var_date = as_date(var_date)
  )


# Select numeric variables
df %>% select_if(is.numeric)
#> # A tibble: 5 x 2
#>   var_num_1 var_num_2
#>       <dbl>     <dbl>
#> 1         1        10
#> 2         2        20
#> 3         3        30
#> 4         4        40
#> 5         5        50

# Select character variables
df %>% select_if(is.character)
#> # A tibble: 5 x 1
#>   var_char 
#>   <chr>    
#> 1 this     
#> 2 is       
#> 3 dummy    
#> 4 character
#> 5 text

# Select factor variables
df %>% select_if(is.factor)
#> # A tibble: 5 x 1
#>   var_fct
#>   <fct>  
#> 1 THIS   
#> 2 IS     
#> 3 DUMMY  
#> 4 FACTOR 
#> 5 TEXT

# Select date variables
df %>% select_if(is.Date)
#> # A tibble: 5 x 1
#>   var_date  
#>   <date>    
#> 1 2019-12-18
#> 2 2019-12-19
#> 3 2019-12-20
#> 4 2019-12-21
#> 5 2019-12-22

# Select variables using negation (note the use of `~`)
df %>% select_if(~!is.numeric(.))
#> # A tibble: 5 x 3
#>   var_char  var_fct var_date  
#>   <chr>     <fct>   <date>    
#> 1 this      THIS    2019-12-18
#> 2 is        IS      2019-12-19
#> 3 dummy     DUMMY   2019-12-20
#> 4 character FACTOR  2019-12-21
#> 5 text      TEXT    2019-12-22

Created on 2019-12-18 by the reprex package (v0.3.0)
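Since dplyr 1.0.0, select_if() is superseded; the same selections can be written with select() and the where() helper. A minimal sketch, assuming dplyr 1.0.0 or later is installed; the tibble `d` and its column names are hypothetical:

```r
library(dplyr)

# Hypothetical tibble, smaller than the one above
d <- tibble(
  n = c(1, 2, 3),
  s = c("a", "b", "c"),
  f = factor(c("x", "y", "x"))
)

# Keep numeric columns
d %>% select(where(is.numeric)) %>% names()
#> [1] "n"

# Drop factor columns by negating the predicate
d %>% select(!where(is.factor)) %>% names()
#> [1] "n" "s"
```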

Jonny answered Oct 05 '22 07:10