model.matrix generates fewer rows than original data.frame

Why doesn't a model matrix necessarily have the same number of rows as the data frame?

mergem = model.matrix(as.formula(paste(response, '~ .')), data=mergef)
dim(mergef)
# [1] 115562     71
dim(mergem)
# [1] 66786   973

I tried looking for hints in the documentation but couldn't find anything. Thanks in advance.

Yang asked Jun 22 '11 23:06


1 Answer

Well, if a row has an NA in any variable used by the formula, that row is (by default) removed:

d <- data.frame(x=c(1,1,2), y=c(2,2,4), z=c(4,NA,8))
m <- model.matrix(x ~ ., data=d)

nrow(d) # 3
nrow(m) # 2
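To check whether missing values account for the whole difference in row counts, something like the following may help. It is only a sketch: it rebuilds the toy data frame `d` from the example above and assumes a formula of the form `x ~ .`, so that every column takes part in the NA check.

```r
# Toy data from the example above: row 2 has an NA in column z.
d <- data.frame(x = c(1, 1, 2), y = c(2, 2, 4), z = c(4, NA, 8))
m <- model.matrix(x ~ ., data = d)

# complete.cases() marks the rows that contain no NA in any column.
keep <- complete.cases(d)

sum(keep)        # number of complete rows; should equal nrow(m)
which(!keep)     # indices of the rows that get dropped (here: 2)

# The model matrix keeps the row names of the surviving rows,
# so it can be matched back to the original data frame.
rownames(m)
```

On the data frame from the question, comparing `sum(complete.cases(mergef))` with `nrow(mergem)` would show whether NAs explain all of the removed rows.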

This behavior is controlled by the option "na.action":

options(na.action="na.fail")
m <- model.matrix(x ~ ., data=d)
# Error: missing values in object
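If the goal is to keep every row, one possible approach (a sketch, not the only option) is to build the model frame explicitly with `na.action = na.pass` and hand that to `model.matrix`. The names `mf` and `m_all` are made up for illustration, and `d` is the toy data frame from the example above.

```r
# Toy data from the example above: row 2 has an NA in column z.
d <- data.frame(x = c(1, 1, 2), y = c(2, 2, 4), z = c(4, NA, 8))

# Build the model frame without dropping incomplete rows.
# An explicit na.action argument takes precedence over options("na.action").
mf <- model.frame(x ~ ., data = d, na.action = na.pass)

# model.matrix() reuses a model frame as-is, so the NA row is kept.
m_all <- model.matrix(x ~ ., data = mf)

nrow(m_all)      # same number of rows as d
m_all[2, "z"]    # the missing value is carried through as NA
```

The resulting matrix still contains the NAs, so whatever consumes it has to be able to handle missing values.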
Tommy answered Sep 19 '22 18:09