When doing data science in the r programming language you will get the “Invalid factor level, NA generated” warning message when you try to add a value to categorical data that is not part of a defined level. The way to fix it is to change the data class to another class and then back to a factor.
We can check if a variable is a factor or not using class() function. Similarly, levels of a factor can be checked using the levels() function.
Factors in R are stored as a vector of integer values with a corresponding set of character values to use when the factor is displayed. The factor function is used to create a factor. The only required argument to factor is a vector of values which will be returned as a vector of factor values.
To convert a single factor vector to a character vector we use the as. character() function of the R Language and pass the required factor vector as an argument.
The warning message is because your "Type" variable was made a factor and "lunch" was not a defined level. Use the stringsAsFactors = FALSE
flag when making your data frame to force "Type" to be a character.
> fixed <- data.frame("Type" = character(3), "Amount" = numeric(3))
> str(fixed)
'data.frame': 3 obs. of 2 variables:
$ Type : Factor w/ 1 level "": NA 1 1
$ Amount: chr "100" "0" "0"
>
> fixed <- data.frame("Type" = character(3), "Amount" = numeric(3),stringsAsFactors=FALSE)
> fixed[1, ] <- c("lunch", 100)
> str(fixed)
'data.frame': 3 obs. of 2 variables:
$ Type : chr "lunch" "" ""
$ Amount: chr "100" "0" "0"
If you are reading directly from CSV file then do like this.
myDataFrame <- read.csv("path/to/file.csv", header = TRUE, stringsAsFactors = FALSE)
Here is a flexible approach, it can be used in all cases, in particular:
dataframe
has been obtained from applying previous operations (e.g. not immediately opening a file, or creating a new data frame).First, un-factorize a string using the as.character
function, and, then, re-factorize with the as.factor
(or simply factor
) function:
fixed <- data.frame("Type" = character(3), "Amount" = numeric(3))
# Un-factorize (as.numeric can be use for numeric values)
# (as.vector can be use for objects - not tested)
fixed$Type <- as.character(fixed$Type)
fixed[1, ] <- c("lunch", 100)
# Re-factorize with the as.factor function or simple factor(fixed$Type)
fixed$Type <- as.factor(fixed$Type)
The easiest way to fix this is to add a new factor to your column. Use the levels function to determine how many factors you have and then add a new factor.
> levels(data$Fireplace.Qu)
[1] "Ex" "Fa" "Gd" "Po" "TA"
> levels(data$Fireplace.Qu) = c("Ex", "Fa", "Gd", "Po", "TA", "None")
[1] "Ex" "Fa" "Gd" "Po" " TA" "None"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With