I have a data frame with numeric and ordered factor columns. It contains a lot of NA values, which have no level assigned to them. I changed NA to "No Answer", but the levels of the factor columns don't include that level. Here is how I started, but I don't know how to finish it in an elegant way:
addNoAnswer = function(df) {
  factorOrNot = sapply(df, is.factor)
  levelsList = lapply(df[, factorOrNot], levels)
  levelsList = lapply(levelsList, function(x) c(x, "No Answer"))
  ...
Is there a way to directly apply new levels to factor columns, for example, something like this:
df[, factorOrNot] = lapply(df[, factorOrNot], factor, levelsList)
Of course, this doesn't work correctly.
I want the order of the levels preserved and the "No Answer" level added in last place.
How do I rename factor levels in R? The simplest way to rename several factor levels at once is to assign to levels(). For example, to recode the factor levels “A”, “B”, and “C”, you can use the following code:
levels(your_df$Category1) <- c("Factor 1", "Factor 2", "Factor 3")
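A factor can have any number of levels. Only when it is used as a predictor in a model (for example with lm()) does it need at least two levels.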
A factor in R is a variable for categorical data, that is, data taking a limited number of distinct values. The data are stored as a vector of integer codes, and the labels shown to the user are character strings kept in the levels attribute.
Internal storage and extra levels: factor variables are stored internally as integer codes together with their levels. The codes are 1, 2, and so on, each one indexing into the levels.
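To make the storage description concrete, here is a small sketch. The factor `f` and its labels are made up for illustration and do not come from the question.

```r
# Hypothetical example data: an ordered factor with one missing value.
f <- factor(c("low", "high", NA, "medium"),
            levels = c("low", "medium", "high"), ordered = TRUE)

codes    <- as.integer(f)  # underlying integer codes, one per element
labs     <- levels(f)      # the labels the codes index into
n_levels <- nlevels(f)     # NA is not a level, so it is not counted

print(codes)
print(labs)
```

Because NA has no code, it cannot be turned into "No Answer" until that label has been added to the levels.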
The levels function accepts the levels(x) <- value replacement form, so it is very easy to add new levels:
f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
str(f1)
#  Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
levels(f1) <- c(levels(f1), "No Answer")
f1[is.na(f1)] <- "No Answer"
str(f1)
#  Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
You can then loop over all variables in a data.frame:
f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
f2 <- factor(c("c", NA, "b", NA, "b", NA, "c", "a", "d", "a", "b"))
f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d", "e", "a"))
df1 <- data.frame(f1, n1 = 1:11, f2, f3)
str(df1)
# 'data.frame': 11 obs. of 4 variables:
#  $ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
#  $ n1: int 1 2 3 4 5 6 7 8 9 10 ...
#  $ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
#  $ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...
for(i in 1:ncol(df1))
  if(is.factor(df1[,i])) levels(df1[,i]) <- c(levels(df1[,i]), "No Answer")
df1[is.na(df1)] <- "No Answer"
str(df1)
# 'data.frame': 11 obs. of 4 variables:
#  $ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
#  $ n1: int 1 2 3 4 5 6 7 8 9 10 ...
#  $ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
#  $ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
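One caveat with the loop above: n1 has no missing values, so df1[is.na(df1)] <- "No Answer" leaves it alone. If a numeric column did contain NA, that assignment would coerce the whole column to character. A possible variant, sketched here on made-up data, restricts both steps to the factor columns (the name isFactor is only illustrative):

```r
# Hypothetical example data: a numeric column that also contains NA.
df1 <- data.frame(
  n1 = c(1, NA, 3),
  f1 = factor(c("a", NA, "b"))
)

# Logical vector marking which columns are factors.
isFactor <- vapply(df1, is.factor, logical(1))

for (col in names(df1)[isFactor]) {
  # Append the new level, then recode missing values in this column only.
  levels(df1[[col]]) <- c(levels(df1[[col]]), "No Answer")
  df1[[col]][is.na(df1[[col]])] <- "No Answer"
}

str(df1)
```

The numeric column keeps its type and its NA, while the factor column gains the extra level in last place.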
You could define a function that adds the level to a factor but returns anything else unchanged:
addNoAnswer <- function(x){
  if(is.factor(x)) return(factor(x, levels=c(levels(x), "No Answer")))
  return(x)
}
Then you just lapply this function to your columns:
df <- as.data.frame(lapply(df, addNoAnswer))
That should return what you want.
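As a sketch of how the function approach could also cover the NA recoding and the ordered factors from the question, the variant below uses a hypothetical helper, addNoAnswerLevel, and made-up data. It assigns with df[] <- lapply(...), which keeps the data frame and its column names as they are instead of rebuilding it with as.data.frame.

```r
# Hypothetical helper: append `label` as the last level of a factor and
# recode its missing values to that level. Other types pass through.
addNoAnswerLevel <- function(x, label = "No Answer") {
  if (!is.factor(x)) return(x)
  # factor() keeps the ordered/unordered status of x by default;
  # union() avoids a duplicated level if `label` is already present.
  x <- factor(x, levels = union(levels(x), label))
  x[is.na(x)] <- label
  x
}

# Hypothetical example data: one numeric and one ordered factor column.
df <- data.frame(
  score = c(1.5, NA, 3),
  grade = factor(c("low", NA, "high"),
                 levels = c("low", "high"), ordered = TRUE)
)

# Replace the columns in place; df stays a data frame.
df[] <- lapply(df, addNoAnswerLevel)
str(df)
```

Since the new level is appended to the existing ones, the original order is preserved and "No Answer" sorts last in an ordered factor.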