Add extra level to factors in dataframe

Q: How many levels can a factor have?

In R, a factor can have any number of levels: one, several, or even zero. R does not require two or more. The count is returned by nlevels(x).
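A quick check in base R, using made-up vectors, shows factors with one, zero, and unused levels:

```r
# A factor with a single level is valid in R
one <- factor(c("a", "a", "a"))
nlevels(one)      # 1

# A factor can even have zero levels
none <- factor(character(0))
nlevels(none)     # 0

# Levels can be declared even if no element uses them
unused <- factor("a", levels = c("a", "b", "c"))
nlevels(unused)   # 3
```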

Q: What does factor() do in R?

factor() encodes a vector as a factor, R's type for categorical variables: data that take a limited number of distinct values, called levels. The data are stored as a vector of integer codes and the levels as a character vector, so a factor can be built from either string or numeric input.
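To illustrate with made-up data: by default factor() uses the sorted unique values as levels, while the levels argument fixes the levels (and their order) explicitly, even if some of them never occur.

```r
# Default: levels are the sorted unique values of the input
colours <- factor(c("red", "green", "red", "blue"))
levels(colours)       # "blue" "green" "red"
as.integer(colours)   # 3 2 3 1

# Explicit levels: order is taken as given, unused levels are kept
sizes <- factor(c("small", "large", "small"),
                levels = c("small", "medium", "large"))
levels(sizes)         # "small" "medium" "large"
table(sizes)          # "medium" is counted as 0
```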

Q: What type is internally used to store elements of a factor?

Factors are stored internally as integer vectors together with a "levels" attribute. The integer values are 1, 2, and so on, each one indexing into the levels.
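A quick way to see this storage in R (illustrative values only): typeof() reports the integer storage, and unclass() strips the factor class to expose the codes and the levels attribute. The same fact explains a common pitfall when a factor's labels look like numbers.

```r
f <- factor(c("yes", "no", "yes"))
typeof(f)    # "integer": the storage type
class(f)     # "factor"
unclass(f)   # codes 2 1 2, plus attr(,"levels") "no" "yes"

# Numeric-looking labels: the codes are not the values
g <- factor(c("10", "20", "10"))
as.integer(g)                 # 1 2 1   (the codes)
as.numeric(as.character(g))   # 10 20 10 (the labels as numbers)
```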

Tags: dataframe, r, categorical-data

I have a data frame with numeric and ordered factor columns. I have a lot of NA values, so no level is assigned to them. I changed NA to "No Answer", but the levels of the factor columns don't contain that level. Here is how I started, but I don't know how to finish it in an elegant way:

addNoAnswer = function(df) {
    factorOrNot = sapply(df, is.factor)
    levelsList = lapply(df[, factorOrNot], levels)
    levelsList = lapply(levelsList, function(x) c(x, "No Answer"))
    ...

Is there a way to directly apply new levels to factor columns, for example, something like this:

df[, factorOrNot] = lapply(df[, factorOrNot], factor, levelsList)

Of course, this doesn't work correctly.

I want the order of levels preserved and "No Answer" level added to last place.
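For reference, the attempt above fails because lapply() passes the whole levelsList as the levels argument of factor() for every column. One possible way to pair each column with its own level vector is base R's Map(). This is a sketch on a made-up data frame (q1, n and q2 are hypothetical column names), not the asker's final code:

```r
df <- data.frame(
  q1 = factor(c("yes", NA, "no")),
  n  = c(1, 2, NA),
  q2 = factor(c(NA, "b", "a"), levels = c("b", "a"), ordered = TRUE)
)

factorOrNot <- sapply(df, is.factor)
# df[factorOrNot] (no comma) stays a data frame even if only one column is a factor
levelsList <- lapply(df[factorOrNot], levels)
levelsList <- lapply(levelsList, function(x) c(x, "No Answer"))

# Map() calls factor(column_i, levels = levelsList[[i]]) for each pair;
# factor() defaults to ordered = is.ordered(x), so q2 stays ordered
df[factorOrNot] <- Map(factor, df[factorOrNot], levels = levelsList)

levels(df$q2)   # "b" "a" "No Answer"
```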


asked Apr 26 '14 21:04

enedene

2 Answers

The levels function has a replacement form, levels(x) <- value, so adding a new level is easy:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
str(f1)
#  Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
levels(f1) <- c(levels(f1), "No Answer")
f1[is.na(f1)] <- "No Answer"
str(f1)
#  Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...

You can then loop over all the columns of a data.frame:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
f2 <- factor(c("c", NA, "b", NA, "b", NA, "c", "a", "d", "a", "b"))
f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d", "e", "a"))
df1 <- data.frame(f1, n1 = 1:11, f2, f3)

str(df1)
# 'data.frame':   11 obs. of  4 variables:
#  $ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
#  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
#  $ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
#  $ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...

for (i in 1:ncol(df1)) {
  if (is.factor(df1[, i])) levels(df1[, i]) <- c(levels(df1[, i]), "No Answer")
}
df1[is.na(df1)] <- "No Answer"

str(df1)
# 'data.frame':   11 obs. of  4 variables:
#  $ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
#  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
#  $ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
#  $ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
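The question mentions ordered factors. As a quick check on made-up data, levels<- keeps the object's class, so an ordered factor stays ordered, and because the new level is appended last, "No Answer" becomes its highest value:

```r
x <- factor(c("low", NA, "high", "mid"),
            levels = c("low", "mid", "high"), ordered = TRUE)
levels(x) <- c(levels(x), "No Answer")
x[is.na(x)] <- "No Answer"

is.ordered(x)   # TRUE
levels(x)       # "low" "mid" "high" "No Answer"
x > "high"      # FALSE TRUE FALSE FALSE
```

This matters if you later compare or sort the ordered factor: "No Answer" will rank above every real answer.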


answered Oct 01 '22 07:10

Bastien

You could define a function that adds the level to a factor and returns any other input unchanged:

addNoAnswer <- function(x) {
  if (is.factor(x)) return(factor(x, levels = c(levels(x), "No Answer")))
  return(x)
}

Then lapply this function over your columns:

df <- as.data.frame(lapply(df, addNoAnswer))

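This appends "No Answer" as the last level of every factor column while keeping the existing levels in their original order; since factor() defaults to ordered = is.ordered(x), ordered factors stay ordered. The NA values themselves are not changed and still have to be replaced with "No Answer" in the factor columns.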
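A variant of the same approach, as a sketch: assigning the result into df[] instead of wrapping it in as.data.frame() keeps the existing data frame object, so a non-syntactic column name (such as "first answer" in this made-up example) is not rewritten by name checking. The second step replaces NA only in factor columns, because assigning a string into a numeric column that contains NAs would coerce the whole column to character.

```r
addNoAnswer <- function(x) {
  if (is.factor(x)) return(factor(x, levels = c(levels(x), "No Answer")))
  x
}

# Hypothetical example data; the space in the column name is deliberate
df <- data.frame(
  `first answer` = factor(c("yes", NA, "no")),
  score = c(1.5, NA, 3),
  check.names = FALSE
)

# Assigning into df[] keeps the data frame's names and row names
df[] <- lapply(df, addNoAnswer)

# Replace NA only in factor columns; numeric columns keep their NAs and type
isFactor <- vapply(df, is.factor, logical(1))
df[isFactor] <- lapply(df[isFactor], function(x) {
  x[is.na(x)] <- "No Answer"
  x
})
```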

answered Oct 01 '22 09:10

ilir
