I have an R data frame in which some of the variables are categorical. For example, sex is "male" or "female" and "do you smoke" is 0 or 1. Other variables are continuous. I would like to know if there is any way to decide whether a variable is categorical and, if so, compute its frequencies.
I think in my case a good test would be to check whether the variable takes fewer than k = 4 distinct values.
In descriptive statistics, a categorical variable takes a limited set of values, usually drawn from a particular finite group. For example, a categorical variable in R can be country, year, gender, or occupation. A continuous variable, however, can take any value within a range, whether integer or decimal.
Answer. A categorical variable is a variable with a set number of groups (gender, colors of the rainbow, brands of cereal), while a numeric variable is generally something that can be measured (height, weight, miles per hour).
To check the data type of a variable in R, use the typeof() function. typeof() is a built-in R function that returns the (internal) type or storage mode of any R object.
If the data can only be grouped into categories, then it is considered a categorical variable. If, however, you can perform arithmetic operations on it, then it is considered a numerical or quantitative variable. For example, a random group of people could be surveyed to determine their grade point average, which is a quantitative variable.
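As a small illustration with made-up vectors (the names sex, smoke and height are assumptions, not taken from any real data set), note that typeof() reports how a value is stored rather than what it means. A factor is stored as integer codes, so class() or is.factor() is usually the more useful check for categorical data:

```r
# Toy vectors, invented for illustration only
sex    <- factor(c("male", "female", "female"))
smoke  <- c(0, 1, 1)
height <- c(1.82, 1.65, 1.70)

typeof(sex)      # "integer": factors are stored as integer codes
class(sex)       # "factor"
is.factor(sex)   # TRUE

typeof(smoke)    # "double": a 0/1 variable is not flagged as categorical
typeof(height)   # "double"
```

Because smoke and height have the same storage type, typeof() alone cannot separate a 0/1 indicator from a continuous measurement.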
While you should use factors for categorical variables, you can find the unique values in a vector x with unique, and count them:
length(unique(x))
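Building on the k = 4 heuristic from the question, the following is a minimal sketch of how one might flag categorical columns in a data frame and tabulate their frequencies. The helper is_categorical and the example data frame df are hypothetical names introduced here; treating factor, character and logical columns as categorical regardless of k is an added assumption, not part of the original question.

```r
# Hypothetical helper: a column counts as categorical if it is a factor,
# character or logical vector, or has fewer than k distinct non-missing values.
is_categorical <- function(x, k = 4) {
  is.factor(x) || is.character(x) || is.logical(x) ||
    length(unique(x[!is.na(x)])) < k
}

# Made-up example data
df <- data.frame(
  sex    = c("male", "female", "female", "male", "female"),
  smoke  = c(0, 1, 1, 0, 1),
  height = c(1.82, 1.65, 1.70, 1.78, 1.60),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE flag per column
categorical <- vapply(df, is_categorical, logical(1))

# Frequency table for each column flagged as categorical
frequencies <- lapply(df[categorical], table)
frequencies
```

The threshold k is a judgment call: a small continuous sample with few distinct values would also be flagged, so it may be worth inspecting the flagged columns before converting them with factor().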