Appending new records with unseen string values to a dataframe causes an "invalid factor level" warning and results in NA

Q: What is a factor variable used for?

In experimental design, factors are the variables that experimenters control in order to determine their effect on the response variable. A factor can take on only a small number of values, which are known as factor levels.

Q: What is a factor variable in R?

A factor in R is a variable used to categorize and store data that has a limited number of distinct values. It stores the data as a vector of integer codes, each of which maps to a label called a level. A factor is also known as a categorical variable; its levels can be built from either string or integer values and are stored as character labels.

Q: What is an example of a factor variable?

A "factor" is a vector whose elements can take on one of a specific set of values. For example, "Sex" will usually take on only the values "M" or "F", whereas "Name" will generally have lots of possibilities. The set of values that the elements of a factor can take is called its levels.
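As a rough illustration of the points above, the following sketch builds a small factor and inspects its levels and integer codes. The variable name `sex` and its values are made up for the example.

```r
# A small factor with two levels; the values are stored as integer codes.
sex <- factor(c("M", "F", "F", "M"))

levels(sex)       # the allowed values, sorted alphabetically by default: "F" "M"
as.integer(sex)   # the underlying integer codes: 2 1 1 2

# Assigning a value outside the level set yields NA plus an
# "invalid factor level" warning, which is the symptom discussed below.
sex[2] <- "X"
is.na(sex[2])     # TRUE
```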

Tags: dataframe, append, r, r-factor

I have a dataframe (14.5K rows by 15 columns) containing billing data from 2001 to 2007.

I append new 2008 data to it with:

    alltime <- rbind(alltime, all2008)

Unfortunately that generates a warning:

    Warning message:
    In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA,  :
      invalid factor level, NAs generated

My guess is that some new patients have names that were not in the previous dataframe, so R does not know what level to give them. The same goes for new, unseen names in the 'referring doctor' column.

What's the solution?


asked Oct 27 '09 18:10

Farrel

2 Answers

It could be caused by a mismatch of column types between the two data.frames.

First of all, check the types (classes). For diagnostic purposes, run this:

    new2old <- rbind(alltime, all2008)  # this gives you a warning
    old2new <- rbind(all2008, alltime)  # this should be without warning

    cbind(
        alltime = sapply(alltime, class),
        all2008 = sapply(all2008, class),
        new2old = sapply(new2old, class),
        old2new = sapply(old2new, class)
    )

I expect there to be a row that looks like this:

                alltime  all2008   new2old  old2new
    ...         ...      ...       ...      ...
    some_column "factor" "numeric" "factor" "character"
    ...         ...      ...       ...      ...

If so, here is the explanation: rbind doesn't check that the types match. If you read the rbind.data.frame code, you can see that the first argument determines the output types. If a column is a factor in the first data.frame, then the output column is a factor with levels unique(c(levels(x1), levels(x2))). But when that column isn't a factor in the second data.frame, levels(x2) is NULL, so the levels are not extended.

This means your output data are wrong! There are NAs in place of the true values.
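The behaviour described above can be reproduced with a small sketch. The data frames `old_df` and `new_df` and their values are hypothetical; the only assumption is that the same column is a factor in the first frame and numeric in the second.

```r
# Hypothetical toy data: `code` is a factor in one frame, numeric in the other.
old_df <- data.frame(code = factor(c("a", "b")))
new_df <- data.frame(code = c(1, 2))

# Factor first: the result column stays a factor, the numeric values match
# no existing level, so rbind() warns and stores NA for the appended rows.
new2old <- withCallingHandlers(
  rbind(old_df, new_df),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

class(new2old$code)        # "factor"
sum(is.na(new2old$code))   # 2: both appended values were lost
```

The handler only prints the warning and lets the call finish, so the damaged result can be inspected.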

I suppose that either:

you created your old data with a different R/RODBC version, so the types were derived differently (different settings, perhaps the decimal separator), or
there are NULLs or some unusual values in the problematic column, e.g. someone changed the column in the underlying database.

Solution:

Find the wrong column, find out why it is wrong, and fix it. Eliminate the cause, not the symptoms.
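One way to track down the wrong column is to compare column classes programmatically. This is only a sketch: `class_mismatches` is a hypothetical helper, and the two demo data frames are made-up stand-ins for alltime and all2008.

```r
# Hypothetical helper: list the shared columns whose class differs
# between two data frames.
class_mismatches <- function(a, b) {
  shared <- intersect(names(a), names(b))
  cls_a <- vapply(a[shared], function(col) class(col)[1], character(1))
  cls_b <- vapply(b[shared], function(col) class(col)[1], character(1))
  differ <- cls_a != cls_b
  data.frame(column = shared[differ],
             first  = unname(cls_a[differ]),
             second = unname(cls_b[differ]),
             stringsAsFactors = FALSE)
}

# Toy stand-ins with made-up values: `amount` was read as text in the new data.
alltime_demo <- data.frame(patient = factor(c("Smith", "Jones")),
                           amount  = c(10, 20))
all2008_demo <- data.frame(patient = factor("Brown"),
                           amount  = "30",
                           stringsAsFactors = FALSE)

class_mismatches(alltime_demo, all2008_demo)
# one row: column "amount", first "numeric", second "character"
```

Any column it reports is a candidate for the root cause and should be fixed at the import step rather than patched after rbind.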


answered Oct 10 '22 18:10

Marek

An "easy" way is to simply not have your strings set as factors when importing text data.

Note that the read.{table,csv,...} functions take a stringsAsFactors parameter, which defaulted to TRUE before R 4.0.0 (from R 4.0.0 onward the default is FALSE). You can set this to FALSE while you're importing and rbind-ing your data.

If you'd like to set the column to be a factor at the end, you can do that too.

For example:

    alltime <- read.table("alltime.txt", stringsAsFactors = FALSE)
    all2008 <- read.table("all2008.txt", stringsAsFactors = FALSE)
    alltime <- rbind(alltime, all2008)

    # If you want the doctor column to be a factor, make it so:
    alltime$doctor <- as.factor(alltime$doctor)
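If the data frames are already loaded with factor columns and re-importing is not convenient, a similar effect can be had by converting the factors to character before binding. The helper name `factors_to_character` and the demo data are hypothetical; this is a sketch of the idea, not part of the original answer.

```r
# Hypothetical helper: turn every factor column into plain character.
factors_to_character <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}

# Toy stand-ins with made-up values; "Clark" is unseen in the old data.
alltime_demo <- data.frame(doctor = factor(c("Adams", "Baker")))
all2008_demo <- data.frame(doctor = factor(c("Baker", "Clark")))

combined <- rbind(factors_to_character(alltime_demo),
                  factors_to_character(all2008_demo))

# Re-create the factor once all rows are in place.
combined$doctor <- as.factor(combined$doctor)
levels(combined$doctor)   # "Adams" "Baker" "Clark"
```

Because the level set is computed after the rows are combined, no value can fall outside it.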

answered Oct 10 '22 19:10

Steve Lianoglou
