How to select all factor variables in R

Tags:

r

I have a data frame named "insurance" with both numerical and factor variables. How can I select all factor variables so that I can check the levels of the categorical variables?

I tried sapply(insurance, class) to get the classes of all variables. But then I can't build a logical test like class(var) == "factor", because the variable names are also included in the result of sapply().

Thanks,

asked Jul 28 '13 11:07

Gold Waterson

2 Answers

Some data:

insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  stringsAsFactors = TRUE  # default became FALSE in R 4.0.0; without it fact1/fact3 stay character
)

I would use sapply like you did, but combined with is.factor to return a logical vector:

is.fact <- sapply(insurance, is.factor)
#   int fact1 fact2 fact3 
# FALSE  TRUE  TRUE  TRUE
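For comparison, the class-based test from the question can also work. sapply() returns a named character vector, and == keeps those names, so the names do not get in the way of building a logical index. The sketch below rebuilds the example data with stringsAsFactors = TRUE (needed on R 4.0.0 and later). The ordered column grade is a hypothetical addition that shows where the equality test falls short.

```r
# Example data; stringsAsFactors = TRUE turns the letter columns into factors
insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  stringsAsFactors = TRUE
)

# The approach from the question: compare each column's class with "factor".
# The result is a named logical vector, usable directly as a column index.
is.fact.cls <- sapply(insurance, class) == "factor"
is.fact.cls
#   int fact1 fact2 fact3
# FALSE  TRUE  TRUE  TRUE

# Caveat: an ordered factor has class c("ordered", "factor"), so comparing
# class() with a single string no longer yields one TRUE/FALSE per column.
# is.factor() (or inherits(x, "factor")) still does.
insurance$grade <- factor(c("lo", "hi", "lo", "mid", "hi"), ordered = TRUE)
sapply(insurance, is.factor)
#   int fact1 fact2 fact3 grade
# FALSE  TRUE  TRUE  TRUE  TRUE
```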

Then use [ to extract these columns:

factors.df <- insurance[, is.fact]
#   fact1 fact2 fact3
# 1     a     1     C
# 2     b     2     D
# 3     c     3     E
# 4     d     4     F
# 5     e     5     G
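One thing to watch with insurance[, is.fact] is the single-factor case. When exactly one column is a factor, [ with a row-and-column index simplifies the result to a bare vector, so lapply() would then walk over its elements instead of over columns. The sketch below uses a hypothetical two-column data frame df. Either drop = FALSE or list-style indexing without the comma keeps a data frame.

```r
# Hypothetical data with a single factor column
df <- data.frame(x = 1:3, f = factor(c("a", "b", "a")))
is.fact <- sapply(df, is.factor)

dropped <- df[, is.fact]                # simplified to the factor itself
kept    <- df[, is.fact, drop = FALSE]  # still a one-column data frame
kept2   <- df[is.fact]                  # list-style indexing never drops

class(dropped)
# [1] "factor"
class(kept)
# [1] "data.frame"
lapply(kept, levels)
# $f
# [1] "a" "b"
```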

Finally, to get the levels, use lapply:

lapply(factors.df, levels)
# $fact1
# [1] "a" "b" "c" "d" "e"
# 
# $fact2
# [1] "1" "2" "3" "4" "5"
# 
# $fact3
# [1] "C" "D" "E" "F" "G"
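If you only need a quick count of levels per factor rather than the full labels, nlevels() gives a compact named vector. The sketch below rebuilds the example data with stringsAsFactors = TRUE so that it runs on its own.

```r
insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  stringsAsFactors = TRUE
)
factors.df <- insurance[, sapply(insurance, is.factor), drop = FALSE]

# vapply() guarantees exactly one integer per column
n.levels <- vapply(factors.df, nlevels, integer(1))
n.levels
# fact1 fact2 fact3
#     5     5     5
```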

You might also find str(insurance) useful as a short summary: it lists every column's class and, for factors, the number of levels and the first few level labels.

answered Oct 05 '22 00:10

flodel

This (almost) seems the perfect time to use the seldom-used function rapply:

rapply(insurance, classes = "factor", f = levels, how = "list")
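Here is what how = "list" does. rapply() applies f only to elements whose class matches classes and replaces every other element with deflt, which defaults to NULL. The result therefore keeps one entry per column, with NULL for the integer column. The sketch runs on the example data rebuilt with stringsAsFactors = TRUE.

```r
insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  stringsAsFactors = TRUE
)

res <- rapply(insurance, f = levels, classes = "factor", how = "list")

names(res)       # one entry per column, including the non-factor "int"
is.null(res$int) # the non-factor column was replaced by deflt (NULL)
res$fact3        # levels of a factor column
```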

Or, to remove the NULL elements (the columns that weren't factors):

Filter(Negate(is.null), rapply(insurance, classes = "factor", f = levels, how = "list"))

Or simply

lapply(Filter(is.factor, insurance), levels)
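On the example data, the sapply/[ approach, the rapply approach and the Filter approach should all return the same named list of levels. Here is a quick check, assuming the data is rebuilt with stringsAsFactors = TRUE. The via_* names are only for illustration.

```r
insurance <- data.frame(
  int   = 1:5,
  fact1 = letters[1:5],
  fact2 = factor(1:5),
  fact3 = LETTERS[3:7],
  stringsAsFactors = TRUE
)

via_sapply <- lapply(insurance[, sapply(insurance, is.factor), drop = FALSE], levels)
via_rapply <- Filter(Negate(is.null),
                     rapply(insurance, f = levels, classes = "factor", how = "list"))
via_filter <- lapply(Filter(is.factor, insurance), levels)

identical(via_sapply, via_rapply)  # TRUE
identical(via_sapply, via_filter)  # TRUE

# Filter() on a data frame returns a data frame of just the factor columns
factor.cols <- Filter(is.factor, insurance)
```

Because Filter(is.factor, insurance) already returns a data frame, it can stand in for insurance[, is.fact] when you want the factor columns themselves rather than their levels.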

answered Oct 05 '22 00:10

mnel
