
Add extra level to factors in dataframe

I have a data frame with numeric and ordered factor columns. I have a lot of NA values, so no level is assigned to them. I changed NA to "No Answer", but the levels of the factor columns don't contain that level. Here is how I started, but I don't know how to finish it in an elegant way:

    addNoAnswer = function(df) {
        factorOrNot = sapply(df, is.factor)
        levelsList = lapply(df[, factorOrNot], levels)
        levelsList = lapply(levelsList, function(x) c(x, "No Answer"))
        ...

Is there a way to directly apply new levels to factor columns, for example, something like this:

df[, factorOrNot] = lapply(df[, factorOrNot], factor, levelsList) 

Of course, this doesn't work correctly.

I want the order of the levels preserved and the "No Answer" level added in last place.

enedene asked Apr 26 '14 21:04



2 Answers

The levels function supports replacement assignment of the form levels(x) <- value, so it is easy to add a new level:

    f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
    str(f1)
    #  Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...

    levels(f1) <- c(levels(f1), "No Answer")
    f1[is.na(f1)] <- "No Answer"
    str(f1)
    #  Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
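Since the question mentions ordered factors, it may be worth checking that the same levels(x) <- value assignment keeps the ordering and puts the new level last. The following is only a minimal sketch; the vector `sizes` and its level names are made up for illustration and do not come from the question.

```r
# Hypothetical ordered factor with missing values (not from the question).
sizes <- factor(c("low", NA, "high", "mid", NA),
                levels = c("low", "mid", "high"), ordered = TRUE)

# Append the new level at the end, then recode the NAs to it.
levels(sizes) <- c(levels(sizes), "No Answer")
sizes[is.na(sizes)] <- "No Answer"

is.ordered(sizes)  # the factor should still be ordered
levels(sizes)      # "No Answer" should be the last level
```

Note that for an ordered factor this makes "No Answer" the highest level, which matters if the column is later compared with < or >.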

You can then loop over all the columns of a data.frame:

    f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
    f2 <- factor(c("c", NA, "b", NA, "b", NA, "c" ,"a", "d", "a", "b"))
    f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d" , "e", "a"))
    df1 <- data.frame(f1, n1 = 1:11, f2, f3)

    str(df1)
    # 'data.frame':   11 obs. of  4 variables:
    #  $ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
    #  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
    #  $ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
    #  $ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...

    for(i in 1:ncol(df1)) if(is.factor(df1[,i])) levels(df1[,i]) <- c(levels(df1[,i]),"No Answer")
    df1[is.na(df1)] <- "No Answer"

    str(df1)
    # 'data.frame':   11 obs. of  4 variables:
    #  $ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
    #  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
    #  $ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
    #  $ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
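One thing to watch for: df1[is.na(df1)] <- "No Answer" is applied to every column. In the example above n1 has no missing values, but a numeric column that does contain NA would most likely be turned into a character column by that assignment. A possible way around this is to recode the NAs inside the same loop, only for factor columns. This is a sketch under that assumption; the data frame `d` below is hypothetical.

```r
# Hypothetical data frame with NAs in both a factor and a numeric column.
d <- data.frame(
  f = factor(c("a", NA, "b")),
  n = c(1.5, NA, 3)
)

for (col in names(d)) {
  if (is.factor(d[[col]])) {
    # Add the level last, then recode NAs in this factor column only.
    levels(d[[col]]) <- c(levels(d[[col]]), "No Answer")
    d[[col]][is.na(d[[col]])] <- "No Answer"
  }
}

str(d)  # f gains the "No Answer" level; n stays numeric and keeps its NA
```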
Bastien answered Oct 01 '22 07:10


You could define a function that adds the level to a factor but returns anything else unchanged:

    addNoAnswer <- function(x){
      if(is.factor(x)) return(factor(x, levels=c(levels(x), "No Answer")))
      return(x)
    }

Then you just lapply this function over your columns:

    df <- as.data.frame(lapply(df, addNoAnswer))

That should return what you want.
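Two details may be worth adding. The function above only adds the level; the NA entries themselves stay NA until they are recoded. Also, as.data.frame(lapply(...)) rebuilds the data frame, whereas assigning into df[] keeps the existing column names and class. A sketch of a variant that covers both, where `addNoAnswerFilled`, its `label` argument, and the data frame `d2` are hypothetical names introduced here:

```r
# Hypothetical variant of the helper: add the level AND recode the NAs.
addNoAnswerFilled <- function(x, label = "No Answer") {
  if (!is.factor(x)) return(x)                   # leave other columns untouched
  x <- factor(x, levels = c(levels(x), label))   # factor() keeps ordered-ness by default
  x[is.na(x)] <- label
  x
}

# Made-up example data, not from the question.
d2 <- data.frame(f = factor(c("x", NA, "y")), n = c(1L, NA, 3L))

# d2[] <- ... replaces the columns in place, keeping names and class.
d2[] <- lapply(d2, addNoAnswerFilled)
str(d2)
```

This assumes the label is not already one of the factor's levels; if it were, factor() would complain about duplicated levels.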

ilir answered Oct 01 '22 09:10