
Subset character columns from a data frame of characters and numbers

Tags: dataframe, r

I have a data frame composed of numeric and non-numeric columns.

I would like to extract (subset) only the non-numeric columns, that is, the character ones. I was able to subset the numeric columns with the expression sub_num = x[sapply(x, is.numeric)], but I cannot do the opposite using is.character. Can anyone help me?

Elb asked May 12 '12 14:05


4 Answers

If you are trying to select only character columns, this can be done with dplyr::select_if() and is.character(). Using the dplyr::starwars sample data as an example:

library(dplyr)
starwars %>% 
  select_if(is.character) %>% 
  head(2)
# A tibble: 2 x 7
  name           hair_color skin_color eye_color gender homeworld species
  <chr>          <chr>      <chr>      <chr>     <chr>  <chr>     <chr>  
1 Luke Skywalker blond      fair       blue      male   Tatooine  Human  
2 C-3PO          NA         gold       yellow    NA     Tatooine  Droid 

Or, if you want to exclude a certain column type, negate the predicate inside a formula; note that the syntax is slightly different:

starwars %>%  
  select_if(~!is.numeric(.)) %>% 
  head(2)

# A tibble: 2 x 10
  name           hair_color skin_color eye_color gender homeworld species films     vehicles  starships
  <chr>          <chr>      <chr>      <chr>     <chr>  <chr>     <chr>   <list>    <list>    <list>   
1 Luke Skywalker blond      fair       blue      male   Tatooine  Human   <chr [5]> <chr [2]> <chr [2]>
2 C-3PO          NA         gold       yellow    NA     Tatooine  Droid   <chr [6]> <chr [0]> <chr [0]>

sbha answered Nov 09 '22 21:11


Try:

x[sapply(x, function(x) !is.numeric(x))]

This pulls every column that is not numeric, so factor and character columns are both kept (as is any other non-numeric class, such as Date).

EDIT:

x <- data.frame(a=runif(10), b=1:10, c=letters[1:10], 
    d=as.factor(rep(c("A", "B"), each=5)), 
    e=as.Date(seq(as.Date("2000/1/1"), by="month", length.out=10)),
    stringsAsFactors = FALSE)

# > str(x)
# 'data.frame':   10 obs. of  5 variables:
#  $ a: num  0.814 0.372 0.732 0.522 0.626 ...
#  $ b: int  1 2 3 4 5 6 7 8 9 10
#  $ c: chr  "a" "b" "c" "d" ...
#  $ d: Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
#  $ e: Date, format: "2000-01-01" "2000-02-01" ...

x[sapply(x, function(x) !is.numeric(x))]
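
To make the difference between the two tests concrete, here is a minimal base R sketch. The data frame below is a hypothetical stand-in for the one above (fixed values instead of runif), and the variable names not_num, kept, chr_cols and chr_filter are made up for illustration.

```r
# Hypothetical example data: numeric, integer, character, factor and Date columns
x <- data.frame(
  a = c(0.5, 1.5, 2.5),
  b = 1:3,
  c = c("u", "v", "w"),
  d = factor(c("A", "B", "A")),
  e = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01")),
  stringsAsFactors = FALSE
)

# TRUE for every column that is not numeric (vapply always returns a logical vector)
not_num <- !vapply(x, is.numeric, logical(1))

# Negating is.numeric keeps the character, factor and Date columns
kept <- names(x[not_num])

# Testing is.character directly keeps only the character column
chr_cols <- names(x[vapply(x, is.character, logical(1))])

# Filter() is a compact base R alternative for the same selection
chr_filter <- names(Filter(is.character, x))

print(kept)
print(chr_cols)
```

If the character-only selection comes back empty on your own data, it may be worth checking str(x): the text columns might be stored as factors rather than characters.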

Tyler Rinker answered Nov 09 '22 19:11


I tested my idea and can confirm that the following snippet works:

str(d)
 'data.frame':  5 obs. of  3 variables:
  $ a: int  1 2 3 4 5
  $ b: chr  "a" "a" "a" "a" ...
  $ c: Factor w/ 1 level "b": 1 1 1 1 1


# Get all character columns
d[, sapply(d, class) == 'character']

# Or, for factors, which might be likely:
d[, sapply(d, class) == 'factor']

# If you want to get both factors and characters use
d[, sapply(d, class) %in% c('character', 'factor')]

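Using the correct class, your sapply approach should work as well. Note that x[sapply(x, is.character)] selects columns with or without a comma before the sapply call; if it returns no columns, the text columns are probably stored as factors rather than characters.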

The approach using !is.numeric does not scale well if your data contains classes other than numeric, factor and character, because those columns are selected too (one I use very often is POSIXct, for example).
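
As a hedged variation on the class comparison above: sapply(d, class) returns a list when some column has more than one class (a POSIXct column has both "POSIXct" and "POSIXt"), and d[, cond] collapses to a plain vector when only one column matches. The sketch below sidesteps both by using predicate functions with vapply and drop = FALSE. The data frame and the names is_text, text_cols and chr_only are hypothetical.

```r
# Hypothetical data frame including a POSIXct column, which carries two classes
d <- data.frame(
  a = 1:3,
  b = c("x", "y", "z"),
  c = factor(c("p", "q", "p")),
  t = as.POSIXct(c("2020-01-01", "2020-01-02", "2020-01-03"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# One logical per column, regardless of how many classes the column has
is_text <- vapply(d, function(col) is.character(col) || is.factor(col), logical(1))

# drop = FALSE keeps the result a data frame even if a single column matches
text_cols <- d[, is_text, drop = FALSE]
chr_only <- d[, vapply(d, is.character, logical(1)), drop = FALSE]

print(names(text_cols))
print(names(chr_only))
```

Here the POSIXct column is left out without having to list it explicitly, which is the main advantage over negating is.numeric.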

Thilo answered Nov 09 '22 21:11


As of dplyr 1.0.0, the recommended form is select() combined with where():

starwars %>% 
  select(where(is.character))

You can swap is.character for is.numeric, is.factor and so on.

Another way is to use the keep() or discard() functions from the purrr package:

starwars %>% 
  purrr::keep(~is.character(.)) 

starwars %>% 
  purrr::discard(~!is.character(.))

AlexB answered Nov 09 '22 20:11