
How to find out whether a variable is a factor or continuous in R

Tags: variables, r

I have a table with a bunch of variables. What statement can I use to find out whether these variables are considered factors or continuous?

asked Jan 06 '14 05:01 by PMa


People also ask

How do you find out if a variable is a factor in R?

You can check whether a variable is a factor with the class() function (its result includes "factor" for a factor) or, more directly, with is.factor(). The levels of a factor can be inspected with the levels() function.
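Here is a short sketch of these functions applied to two small, made-up vectors. The names x and y are only illustrative:

```r
# A small example factor built from a character vector
x <- factor(c("low", "high", "medium", "low"))

class(x)      # "factor"
is.factor(x)  # TRUE
levels(x)     # levels are sorted alphabetically by default: "high" "low" "medium"

# A numeric vector is not a factor and has no levels
y <- c(1.5, 2.3, 4.0)
class(y)      # "numeric"
is.factor(y)  # FALSE
levels(y)     # NULL
```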

How do you know if a variable is continuous or categorical in R?

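A categorical variable takes its values from a limited, usually fixed set of groups. Examples include country, gender and occupation (year can also be treated as categorical when it is used to group observations). A continuous variable, by contrast, can take any numeric value within a range, including non-integer values. In R, categorical variables are usually stored as factors (or character vectors) and continuous ones as numeric vectors.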

How do you know if a variable is categorical or continuous?

There are three main types of variables: continuous variables can take any numerical value and are measured; discrete variables can only take certain numerical values and are counted; and categorical variables involve non-numeric groups or categories.
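R only knows how a column is stored, not how it was measured, so any automatic labelling is a heuristic. The sketch below uses a hypothetical helper, guess_kind(), that maps storage type to one of the three kinds above. It assumes factor, character and logical columns are categorical, integer columns are discrete and double columns are continuous. That assumption can be wrong: an integer column might hold category codes, and a double column might hold counts. The data frame is invented for illustration.

```r
# Hypothetical helper: a rough guess at each column's kind, based only on
# how R stores it. Storage type is a hint, not a statement about measurement.
guess_kind <- function(x) {
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    "categorical"
  } else if (is.integer(x)) {
    "discrete (integer storage)"
  } else if (is.numeric(x)) {
    "continuous (double storage)"
  } else {
    class(x)[1]  # fall back to the first class name, e.g. "Date"
  }
}

# Made-up example data
df <- data.frame(
  country = factor(c("NZ", "UK", "NZ")),
  planets = c(8L, 1L, 3L),
  height  = c(1.72, 1.65, 1.80)
)

# vapply() applies the helper to every column and guarantees a character result
kinds <- vapply(df, guess_kind, character(1))
kinds
```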

How do you know if a variable is continuous?

If there is always another possible value between any two values (as with height or time), so that you could never finish listing them, you have a continuous variable. If the possible values can be counted one by one, as with “Number of Planets around a star,” the variable is discrete rather than continuous.


2 Answers

Assuming foo is the name of your object and it is a data frame,

f <- sapply(foo, is.factor)

will apply the is.factor() function to each component (column) of the data frame. is.factor() checks if the supplied vector is a factor as far as R is concerned.

Then

which(f)

will tell you the indices of the factor columns. Because f is a named logical vector, you can also use it directly to select the factor columns via

foo[, f]

or to select all the other columns via

foo[, !f]

Here is an example:

> ## some dummy data
> foo <- data.frame(a = factor(1:10), b = 1:10, c = factor(letters[1:10]))
> foo
    a  b c
1   1  1 a
2   2  2 b
3   3  3 c
4   4  4 d
5   5  5 e
6   6  6 f
7   7  7 g
8   8  8 h
9   9  9 i
10 10 10 j
> ## apply is.factor
> f <- sapply(foo, is.factor)
> f
   a     b     c 
TRUE FALSE  TRUE
> ## which are factors
> which(f)
a c 
1 3
> ## select those
> foo[, f]
    a c
1   1 a
2   2 b
3   3 c
4   4 d
5   5 e
6   6 f
7   7 g
8   8 h
9   9 i
10 10 j
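One caveat with foo[, f]: if exactly one column is selected, R drops the result to a plain vector instead of a one-column data frame. Passing drop = FALSE avoids that. vapply() is a stricter alternative to sapply(). It always returns a logical vector, even for a data frame with no columns, where sapply() would return an empty list. The data frame foo2 below is a made-up example:

```r
# Dummy data with only ONE factor column
foo2 <- data.frame(a = factor(1:3), b = 1:3)
f2 <- sapply(foo2, is.factor)

class(foo2[, f2])                # "factor": the single column is dropped to a vector
class(foo2[, f2, drop = FALSE])  # "data.frame": drop = FALSE keeps the data frame

# vapply() declares the expected result type (one logical per column)
f3 <- vapply(foo2, is.factor, logical(1))
identical(f2, f3)                # TRUE for this data frame
```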

There are equivalent checks for other types too, such as is.numeric() and is.integer(). If you don't care whether the numbers are stored as integers or doubles, is.numeric() is all you need, because it returns TRUE for both:

> is.numeric(1L)
[1] TRUE

(Also is.character(), is.logical(), ...)
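The difference between these checks is easiest to see side by side. Note that a bare literal such as 1 is stored as a double in R, and that factors are never numeric, even when their labels look like numbers:

```r
is.numeric(1L)    # TRUE  (integer)
is.numeric(1.5)   # TRUE  (double)
is.integer(1L)    # TRUE
is.integer(1)     # FALSE: the literal 1 is a double; 1L is an integer
is.double(1)      # TRUE

# Factors are not numeric, even when their labels are digits
is.numeric(factor(1:3))   # FALSE

# The other is.*() checks work the same way
is.character("a")         # TRUE
is.logical(NA)            # TRUE
```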

answered Oct 21 '22 05:10 by Gavin Simpson


Use is.factor() to test for factors and is.numeric() to test for numeric variables.
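A minimal sketch that combines the two checks to label every column of a data frame. The data frame and the labels "factor"/"numeric" are made up for illustration. Any column that is neither falls back to its class name. str() gives a similar overview at a glance.

```r
# Dummy data: one factor, one integer and one double column
foo <- data.frame(a = factor(1:10), b = 1:10, d = seq(0.5, 5, by = 0.5))

# Label each column using is.factor() first, then is.numeric()
type <- vapply(foo, function(x) {
  if (is.factor(x)) "factor"
  else if (is.numeric(x)) "numeric"
  else class(x)[1]
}, character(1))
type  # a = "factor", b = "numeric", d = "numeric"

# str() prints each column's type alongside its first values
str(foo)
```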

answered Oct 21 '22 06:10 by Ricardo Oliveros-Ramos