I have a data frame composed of numeric and non-numeric columns.
I would like to extract (subset) only the non-numeric columns, i.e. the character ones. I was able to subset the numeric columns using sub_num = x[sapply(x, is.numeric)], but I'm not able to do the opposite using is.character. Can anyone help me?
If you are trying to select only character columns, this can be done with dplyr::select_if() and is.character(). Using the dplyr::starwars sample data as an example:
library(dplyr)
starwars %>%
select_if(is.character) %>%
head(2)
# A tibble: 2 x 7
name hair_color skin_color eye_color gender homeworld species
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Luke Skywalker blond fair blue male Tatooine Human
2 C-3PO NA gold yellow NA Tatooine Droid
Or if you are trying to negate a certain column type, note that the syntax is slightly different:
starwars %>%
select_if(~!is.numeric(.)) %>%
head(2)
# A tibble: 2 x 10
name hair_color skin_color eye_color gender homeworld species films vehicles starships
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <list> <list> <list>
1 Luke Skywalker blond fair blue male Tatooine Human <chr [5]> <chr [2]> <chr [2]>
2 C-3PO NA gold yellow NA Tatooine Droid <chr [6]> <chr [0]> <chr [0]>
Try:
x[sapply(x, function(x) !is.numeric(x))]
This pulls every column that is not numeric, so factor and character columns (and any other non-numeric class) are kept.
EDIT:
x <- data.frame(a=runif(10), b=1:10, c=letters[1:10],
d=as.factor(rep(c("A", "B"), each=5)),
e=as.Date(seq(as.Date("2000/1/1"), by="month", length.out=10)),
stringsAsFactors = FALSE)
# > str(x)
# 'data.frame': 10 obs. of 5 variables:
# $ a: num 0.814 0.372 0.732 0.522 0.626 ...
# $ b: int 1 2 3 4 5 6 7 8 9 10
# $ c: chr "a" "b" "c" "d" ...
# $ d: Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
# $ e: Date, format: "2000-01-01" "2000-02-01" ...
x[sapply(x, function(x) !is.numeric(x))]
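To make the difference between "not numeric" and "character" concrete, here is a minimal sketch on the same example data. It swaps sapply() for vapply(), which is my substitution rather than part of the answer above, and the names non_numeric and only_chr are made up for illustration.

```r
# Rebuild the example data frame from above (the runif() values differ on each run).
x <- data.frame(a = runif(10), b = 1:10, c = letters[1:10],
                d = as.factor(rep(c("A", "B"), each = 5)),
                e = seq(as.Date("2000-01-01"), by = "month", length.out = 10),
                stringsAsFactors = FALSE)

# vapply() guarantees exactly one logical value per column.
non_numeric <- x[vapply(x, function(col) !is.numeric(col), logical(1))]
only_chr    <- x[vapply(x, is.character, logical(1))]

names(non_numeric)  # "c" "d" "e"  (character, factor and Date columns)
names(only_chr)     # "c"          (character column only)
```

So negating is.numeric keeps the factor and Date columns as well, while is.character keeps only column c.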
OK, I gave my idea a quick try.
I can confirm that the following code snippet works:
str(d)
'data.frame': 5 obs. of 3 variables:
$ a: int 1 2 3 4 5
$ b: chr "a" "a" "a" "a" ...
$ c: Factor w/ 1 level "b": 1 1 1 1 1
# Get all character columns
d[, sapply(d, class) == 'character']
# Or, for factors, which might be likely:
d[, sapply(d, class) == 'factor']
# If you want to get both factors and characters use
d[, sapply(d, class) %in% c('character', 'factor')]
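Using the correct class test, your sapply approach should work as well. Note that the comma before sapply is optional: x[sapply(x, is.character)] also selects columns and always returns a data frame, whereas the form with the comma drops to a plain vector when only one column matches.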
The approach using !is.numeric does not scale very well if you have columns whose classes fall outside the group numeric, factor, character (one I use very often is POSIXct, for example), because those columns are selected too.
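As a rough sketch of that point, the example below uses a hypothetical data frame d2 with a POSIXct column and selects text-like columns with an explicit positive test instead of a negated one. The names d2, is_text and text_cols are illustrative only.

```r
# Hypothetical data frame with a date-time column.
d2 <- data.frame(a  = 1:3,
                 b  = c("x", "y", "z"),
                 c  = factor(c("u", "v", "u")),
                 ts = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 0:2,
                 stringsAsFactors = FALSE)

# A POSIXct column has a two-element class vector, so comparing
# sapply(d2, class) against a single string is fragile here.
class(d2$ts)  # "POSIXct" "POSIXt"

# Test each column directly; drop = FALSE keeps a data frame
# even if only one column matches.
is_text   <- vapply(d2, function(col) is.character(col) || is.factor(col), logical(1))
text_cols <- d2[, is_text, drop = FALSE]

names(text_cols)  # "b" "c"
```

With !is.numeric the ts column would have been selected too, which is usually not what you want when looking for character data.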
With recent versions of dplyr (1.0.0 and later), where() replaces select_if():
starwars %>%
select(where(is.character))
You can switch is.character to is.numeric, is.factor and so on.
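The where() helper can also be negated, which covers the "everything except numeric" case from the question. This is a small sketch assuming dplyr 1.0.0 or later is installed; the data frame df and its column names are invented for the example.

```r
library(dplyr)

# Hypothetical data frame with mixed column types.
df <- data.frame(id    = 1:3,
                 score = c(0.5, 1.5, 2.5),
                 label = c("a", "b", "c"),
                 grp   = factor(c("u", "v", "u")),
                 stringsAsFactors = FALSE)

# Keep only character columns.
chr_cols <- df %>% select(where(is.character))

# Keep everything that is not numeric (character and factor here).
non_num_cols <- df %>% select(!where(is.numeric))

names(chr_cols)      # "label"
names(non_num_cols)  # "label" "grp"
```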
Another way would be to use the keep or discard functions from the purrr package:
starwars %>%
purrr::keep(~is.character(.))
starwars %>%
purrr::discard(~!is.character(.))