I have a table with a bunch of variables. What statement can I use to find out whether each variable is treated as a factor or as continuous?
We can check whether a variable is a factor using the class() function. Similarly, the levels of a factor can be listed using the levels() function.
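As a quick illustration of class() and levels(), here is a minimal sketch. The vectors x and y are hypothetical and are not taken from the question.

```r
# Hypothetical example vectors (not from the original question)
x <- factor(c("low", "high", "low", "medium"))
y <- c(1.5, 2.0, 3.7)

class(x)   # "factor"
class(y)   # "numeric"

levels(x)  # "high" "low" "medium" (sorted alphabetically by default)
levels(y)  # NULL, because a plain numeric vector has no levels
```

Note that levels() returning NULL does not by itself prove a variable is continuous; it only says the object is not a factor.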
In descriptive statistics, a categorical variable takes a limited number of values, usually drawn from a fixed, finite set of groups. For example, a categorical variable in R can be country, year, gender, or occupation. A continuous variable, however, can take any numeric value, whole or decimal.
There are three main types of variables: continuous variables can take any numerical value and are measured; discrete variables can only take certain numerical values and are counted; and categorical variables involve non-numeric groups or categories.
If you start counting the possible values now and never, ever, ever finish (i.e. the numbers go on and on until infinity), you have what's called a continuous variable. If your variable is “Number of Planets around a star,” then you can count all of the values out (there can't be an infinite number of planets), so it is a discrete variable.
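In R, these three kinds of variables are often (though not necessarily) stored as doubles, integers, and factors. The sketch below uses hypothetical vectors to show that convention; R does not enforce it, so a counted variable may well arrive as a double.

```r
# Hypothetical data illustrating a common storage convention
height   <- c(1.62, 1.75, 1.80)            # continuous: measured, stored as double
children <- c(0L, 2L, 1L)                  # discrete: counted, stored as integer
country  <- factor(c("FR", "JP", "FR"))    # categorical: stored as factor

typeof(height)      # "double"
typeof(children)    # "integer"
is.factor(country)  # TRUE
```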
Assuming foo is the name of your object and it is a data frame,
f <- sapply(foo, is.factor)
will apply the is.factor() function to each component (column) of the data frame. is.factor() checks if the supplied vector is a factor as far as R is concerned.
Then
which(f)
will tell you the index of the factor columns. f is itself a logical vector, so you could select the factor columns via
foo[, f]
or select all but them
foo[, !f]
Here is an example:
> ## some dummy data
> foo <- data.frame(a = factor(1:10), b = 1:10, c = factor(letters[1:10]))
> foo
    a  b c
1   1  1 a
2   2  2 b
3   3  3 c
4   4  4 d
5   5  5 e
6   6  6 f
7   7  7 g
8   8  8 h
9   9  9 i
10 10 10 j
> ## apply is.factor
> f <- sapply(foo, is.factor)
> f
    a     b     c 
 TRUE FALSE  TRUE 
> ## which are factors
> which(f)
a c
1 3
> ## select those
> foo[, f]
    a c
1   1 a
2   2 b
3   3 c
4   4 d
5   5 e
6   6 f
7   7 g
8   8 h
9   9 i
10 10 j
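One caveat with foo[, f]: when only a single column is selected, R simplifies the result to a plain vector instead of a data frame. A minimal sketch, using a hypothetical data frame bar with just one factor column:

```r
# Hypothetical data frame with exactly one factor column
bar <- data.frame(a = factor(c("x", "y", "z")), b = 1:3)
f <- sapply(bar, is.factor)

v  <- bar[, f]                # single column: simplified to a factor vector
d  <- bar[, f, drop = FALSE]  # drop = FALSE keeps a one-column data frame
d2 <- bar[f]                  # list-style indexing also keeps a data frame

is.data.frame(v)   # FALSE
is.data.frame(d)   # TRUE
is.data.frame(d2)  # TRUE
```

If later code expects a data frame regardless of how many columns match, drop = FALSE or the bar[f] form is the safer choice.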
There are equivalent checks for numeric and integer too, amongst others: is.numeric() and is.integer(), but you only need is.numeric() if you don't care about the type of numbers:
> is.numeric(1L)
[1] TRUE
(Also is.character(), is.logical(), ...)
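To make the difference between these predicates concrete, here is a short sketch on literal values. One point worth noting is that is.numeric() returns FALSE for a factor, even though factors are stored internally as integer codes.

```r
int_on_int    <- is.integer(1L)           # TRUE
int_on_double <- is.integer(1)            # FALSE: a bare 1 is stored as a double
num_on_double <- is.numeric(1)            # TRUE
num_on_factor <- is.numeric(factor(1:3))  # FALSE: factors are not numeric
chr_check     <- is.character("a")        # TRUE
lgl_check     <- is.logical(NA)           # TRUE: a bare NA is logical
```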
You have to use is.factor and is.numeric.
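Putting the two predicates together, one possible way to label every column at once is sketched below. The name col_kind and the "other" fallback are illustrative choices, not part of the answers above; the data frame is the same dummy foo used earlier.

```r
foo <- data.frame(a = factor(1:10), b = 1:10, c = factor(letters[1:10]))

# Label each column by the first predicate that matches.
col_kind <- sapply(foo, function(col) {
  if (is.factor(col)) {
    "factor"
  } else if (is.numeric(col)) {
    "numeric"
  } else {
    "other"   # e.g. character, logical, or date columns
  }
})

col_kind  # named character vector: a = "factor", b = "numeric", c = "factor"
```

Checking is.factor() first matters little here, since is.numeric() is FALSE for factors anyway, but it keeps the intent explicit.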