Why does R change the variable type when prepending NA values to a data frame with factors?

I have a problem with the way R coerces variable types when I use rbind on two data frames, one of which contains only NA values. Here is an example:

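x <- factor(sample(1:3, 10, replace = TRUE))
y <- rnorm(10)
dat <- data.frame(x, y)
# NAs: same shape and column names as dat, every entry a (logical) NA
NAs <- data.frame(matrix(NA, ncol = ncol(dat), nrow = nrow(dat)))
colnames(NAs) <- colnames(dat)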

The goal is to bind dat and NAs together while keeping x a factor and y numeric. When I run:

dat_forward <- rbind(dat, NAs)
is.factor(dat_forward$x)  # TRUE

This works: x is still a factor. However, binding in the other order fails:

dat_backward <- rbind(NAs, dat)
is.factor(dat_backward$x)     # FALSE
is.character(dat_backward$x)  # TRUE

Now x has been coerced to character. I am confused: can't it stay a factor when I bind in the other order? What would be a straightforward change to my code to achieve this?
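The cause can be checked directly: matrix(NA, ...) is a logical matrix, so every column of NAs is logical, and rbind for data frames takes the column classes from the first data frame it is given. A minimal sketch that regenerates the question's data (the seed is an arbitrary assumption, added only for reproducibility):

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- factor(sample(1:3, 10, replace = TRUE))
y <- rnorm(10)
dat <- data.frame(x, y)
NAs <- data.frame(matrix(NA, ncol = ncol(dat), nrow = nrow(dat)))
colnames(NAs) <- colnames(dat)

# Every column of NAs is logical, because matrix(NA, ...) is a logical matrix
sapply(NAs, class)

# rbind takes the column classes from its first data frame argument
sapply(rbind(dat, NAs), class)  # x stays a factor
sapply(rbind(NAs, dat), class)  # x ends up character, as in the question
```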

asked Feb 28 '14 15:02 by tomka



2 Answers

Here's a fairly simple way to get the column classes right:

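# Put one real row of dat first so rbind takes its column classes, then drop it
dat_backward <- rbind(dat[1, ], NAs, dat)[-1, ]
str(dat_backward)
# 'data.frame': 20 obs. of  2 variables:
#  $ x: Factor w/ 3 levels "1","2","3": NA NA NA NA NA NA NA NA NA NA ...
#  $ y: num  NA NA NA NA NA NA NA NA NA NA ...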

More generally, if you need this often, you could write an rbind-like function with an additional argument naming the data frame whose column classes all the other data frames' columns should be coerced to:

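myrbind <- function(x, ..., template = x) {
    # Prepend one row of the template so rbind takes its column classes, then
    # remove that row; drop = FALSE keeps one-column inputs as data frames
    do.call(rbind, c(list(template[1, , drop = FALSE]), list(x), list(...)))[-1, , drop = FALSE]
}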

str(myrbind(NAs, dat,  template=dat))
# 'data.frame': 20 obs. of  2 variables:
#  $ x: Factor w/ 3 levels "1","2","3": NA NA NA NA NA NA NA NA NA NA ...
#  $ y: num  NA NA NA NA NA NA NA NA NA NA ...

## If no 'template' argument is supplied, myrbind acts just like rbind    
str(myrbind(dat, NAs))
# 'data.frame': 20 obs. of  2 variables:
#  $ x: Factor w/ 3 levels "1","2","3": 3 3 3 3 2 3 1 1 3 2 ...
#  $ y: num  0.303 1.77 -1.38 1.731 0.033 ...
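Another way to avoid the throwaway row is to build the block of NAs from dat itself: indexing a data frame with NA row numbers returns rows that are entirely NA but keeps each column's class, so a factor column keeps its levels. A sketch under that approach, regenerating the question's data with an arbitrary seed; the name NAs_typed is hypothetical:

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- factor(sample(1:3, 10, replace = TRUE))
y <- rnorm(10)
dat <- data.frame(x, y)

# Index dat with NA row numbers: every value is NA, but the columns keep
# their classes (x is still a factor with dat's levels, y is still numeric)
NAs_typed <- dat[rep(NA_integer_, nrow(dat)), ]
rownames(NAs_typed) <- NULL  # reset the row names produced by the NA indices

dat_backward <- rbind(NAs_typed, dat)
str(dat_backward)
```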
answered Oct 23 '22 21:10 by Josh O'Brien



Alternatively, you could simply convert the x column of NAs to a factor before binding:

NAs$x <- factor(NAs$x)
dat_backward <- rbind(NAs, dat)
is.factor(dat_backward$x)     # TRUE
is.character(dat_backward$x)  # FALSE
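If the real data frame has several factor columns, the same conversion can be applied to all of them at once. A sketch assuming dat and NAs as built in the question (regenerated here with an arbitrary seed); passing levels = levels(col) is an addition that gives each NA column the same level set as the matching column of dat:

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- factor(sample(1:3, 10, replace = TRUE))
y <- rnorm(10)
dat <- data.frame(x, y)
NAs <- data.frame(matrix(NA, ncol = ncol(dat), nrow = nrow(dat)))
colnames(NAs) <- colnames(dat)

# Find the factor columns of dat and turn the matching all-NA columns of NAs
# into factors with the same levels
fac <- vapply(dat, is.factor, logical(1))
NAs[fac] <- lapply(dat[fac], function(col) factor(rep(NA, nrow(NAs)), levels = levels(col)))

dat_backward <- rbind(NAs, dat)
is.factor(dat_backward$x)  # TRUE
```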
answered Oct 23 '22 21:10 by nograpes
