I have a data frame named "insurance" with both numerical and factor variables. How can I select all factor variables so that I can check the levels of the categorical variables?
I tried sapply(insurance, class) to get the classes of all variables. But then I can't build a logical test on class(var) == "factor", because the variable names are also included in the result of sapply().
Thanks,
First select the factor columns and then use purrr::map to show the factor levels for each column.
A factor in R is a categorical variable: it can take only a limited set of distinct values, called levels. Internally it is stored as a vector of integer codes, with the levels kept as a character vector, so even a factor built from numbers has character levels.
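To make the storage model concrete, here is a minimal sketch using a made-up factor (the values "low", "medium" and "high" are illustrative only and not from the insurance data):

```r
# A small factor built from repeated character values (illustrative data).
f <- factor(c("low", "high", "low", "medium"))

levels(f)      # the distinct categories, sorted alphabetically by default
as.integer(f)  # the integer codes, each one an index into levels(f)
typeof(f)      # the underlying storage type of the factor

# Levels are always character, even when the input was numeric.
levels(factor(1:3))
```

The integer codes are what R actually stores; levels() is the lookup table that turns them back into labels.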
Some data:
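insurance <- data.frame(
  int = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  # Needed in R >= 4.0, where character columns are no longer
  # converted to factors by default.
  stringsAsFactors = TRUE
)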
I would use sapply like you did, but combined with is.factor to return a logical vector:
is.fact <- sapply(insurance, is.factor)
# int fact1 fact2 fact3
# FALSE TRUE TRUE TRUE
Then use [ to extract these columns (drop = FALSE keeps the result a data frame even when only one column is a factor):
factors.df <- insurance[, is.fact, drop = FALSE]
# fact1 fact2 fact3
# 1 a 1 C
# 2 b 2 D
# 3 c 3 E
# 4 d 4 F
# 5 e 5 G
Finally, to get the levels, use lapply:
lapply(factors.df, levels)
# $fact1
# [1] "a" "b" "c" "d" "e"
#
# $fact2
# [1] "1" "2" "3" "4" "5"
#
# $fact3
# [1] "C" "D" "E" "F" "G"
You might also find str(insurance) interesting as a short summary.
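On the original attempt from the question: the names attached to the result of sapply(insurance, class) do not get in the way of a comparison, because == compares the values and simply carries the names along. A minimal sketch, assuming every column has a single class (the data frame below is a trimmed copy of the example data):

```r
# Trimmed copy of the example data, with character columns stored as factors.
insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  stringsAsFactors = TRUE
)

classes <- sapply(insurance, class)  # named character vector of classes
is.fact <- classes == "factor"       # named logical vector; names are kept, not compared
names(insurance)[is.fact]            # names of the factor columns
```

This breaks down once a column has more than one class (an ordered factor has both "ordered" and "factor"), since sapply() then returns a list rather than a character vector. That is why is.factor is the more robust test.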
This (almost) appears to be the perfect time to use the seldom-used function rapply:
rapply(insurance, classes = "factor", f = levels, how = "list")
Or
Filter(Negate(is.null),rapply(insurance, class = "factor", f = levels, how = "list"))
To remove the NULL elements (that weren't factors)
Or simply
lapply(Filter(is.factor, insurance), levels)
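If you need this more than once, the one-liner can be wrapped in a small function. The sketch below is only an illustration: factor_levels is a hypothetical helper name, not part of base R, and the data frame is a trimmed copy of the example data.

```r
# Hypothetical helper (not part of base R): returns a named list holding
# the levels of every factor column in a data frame.
factor_levels <- function(df) {
  lapply(Filter(is.factor, df), levels)
}

# Trimmed copy of the example data, with character columns stored as factors.
insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  stringsAsFactors = TRUE
)

res <- factor_levels(insurance)
names(res)   # the factor columns that were found
res$fact2    # levels of a single column
```

For a data frame with no factor columns, Filter() leaves nothing to iterate over and the helper returns an empty list.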