I have a data set (call it DATA) with a variable, COLOR. The mode of COLOR is numeric and its class is factor. First, I'm a bit confused by the "numeric": when printed, the values of COLOR are not numbers; they are all character values such as White, Blue, or Black. Any clarification on this is appreciated.
Further, I need to write R code to return the levels of the COLOR variable, then determine the current reference level of this variable, and finally set the reference level of this variable to White. I tried using factor, but was entirely unsuccessful.
Thank you for taking the time to help.
mode(DATA$COLOR) is "numeric" because R internally stores factors as integer codes (to save space), plus an associated vector of labels (the levels) corresponding to the code values. When you print the factor, R automatically substitutes the corresponding label for each code.
f <- factor(c("orange","banana","apple"))
f
## [1] orange banana apple
## Levels: apple banana orange
str(f)
## Factor w/ 3 levels "apple","banana",..: 3 2 1
as.integer(f) ## drop the attributes to get the underlying integer codes (since R 4.1.0, c(f) returns a factor)
## [1] 3 2 1
attributes(f)
## $levels
## [1] "apple" "banana" "orange"
## $class
## [1] "factor"
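To make the link between codes and labels concrete, here is a minimal sketch that reuses the f from above. The names codes and labels are introduced only for illustration; the idea is that indexing the levels by the integer codes reproduces what is printed.

```r
f <- factor(c("orange", "banana", "apple"))

typeof(f)  # storage type of the underlying codes
mode(f)    # mode reported for that storage type

# Pull out the integer codes, then use them to index the vector of levels.
codes  <- as.integer(f)
labels <- levels(f)[codes]

# The reconstructed labels match the character values R prints for f.
identical(labels, as.character(f))
```

This is why mode() reports "numeric" even though the printed values look like text.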
... I need to write R code to return the levels of the COLOR variable ...
levels(DATA$COLOR)
... then determine the current reference level of this variable,
levels(DATA$COLOR)[1]
... and finally set the reference level of this variable to White.
DATA$COLOR <- relevel(DATA$COLOR, ref = "White")
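The three steps can be tried end to end on a small stand-in data frame. The real DATA is not shown in the question, so the values below are invented purely for illustration.

```r
# Hypothetical stand-in for the asker's data; the real DATA is not shown.
DATA <- data.frame(
  COLOR = factor(c("Blue", "White", "Black", "White", "Blue"))
)

# Step 1: all levels (factor() sorts them alphabetically by default).
levels(DATA$COLOR)

# Step 2: the reference level is the first level.
ref_before <- levels(DATA$COLOR)[1]

# Step 3: move "White" to the front; the other levels keep their order.
values_before <- as.character(DATA$COLOR)
DATA$COLOR <- relevel(DATA$COLOR, ref = "White")
ref_after <- levels(DATA$COLOR)[1]

# Releveling changes the level order only, not the observed values.
identical(values_before, as.character(DATA$COLOR))
```

Model-fitting functions such as lm() treat the first level as the baseline under the default treatment contrasts, which is why the first level is called the reference level.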
This is a consequence of how R stores factors. The values you see in the console look like character strings but are stored internally as integer codes (the reasons are probably beyond the scope of this answer).
If you want to recover the levels, you can type levels(DATA$COLOR). Take a look at ?factor and ?levels in the console to see more. If you want to re-level a factor, then try and add a reproducible example so I can walk through the code.
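Since the question mentions trying factor() without success, here is a hedged sketch of one way it can work, on made-up values (old and new are illustrative names, not from the question). The key is to pass the desired order through the levels argument. Assigning to levels() directly would rename the labels rather than reorder them.

```r
# Made-up example values; the asker's real data is not available.
old <- factor(c("Blue", "White", "Black"))

# Rebuild the factor with "White" first, followed by the remaining levels.
new_order <- c("White", setdiff(levels(old), "White"))
new <- factor(old, levels = new_order)

levels(new)[1]  # the new reference level

# The observed values are unchanged; only the level order differs.
identical(as.character(new), as.character(old))
```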