I'm running through a large dataset chunk by chunk, updating a list of linear models as I go using the biglm function. The issue occurs when a particular chunk does not contain all the factor levels that appear in my linear model, and I get this error:
Error in update.biglm(model, new) : model matrices incompatible
The description of update.biglm mentions that factor levels must be the same across all chunks. I could probably come up with a workaround to avoid this, but there must be a better way. This pdf, on the 'biglm' page, mentions that "Factors must have their full set of levels specified (not necessarily present in the data chunk)". So I think there is some way to specify all the possible levels so that I can update a model with not all the factors present, but I can't figure out how to do it.
Here's an example piece of code to illustrate my problem:
library(biglm)
df = data.frame(a = rnorm(12),b = as.factor(rep(1:4,each = 3)),c = rep(0:1,6))
model = biglm(a~b+c,data = df)
df.new = data.frame(a = rnorm(6),b = as.factor(rep(1:2,each = 3)),c =rep(0:1, 3))
model.new = update(model,df.new) # fails: df.new$b only has levels 1 and 2
Thanks for any advice you have.
I came across this problem also. Are the variables in your large data frame specified as factors before breaking them into chunks? Also, is the data set formatted as a data frame?
large_df <- as.data.frame(large_data_set) # just to make sure it's a df.
large_df$factor.vars <- as.factor(large_df$factor.vars)
If this is the case, then all of the factor levels should be preserved in the factor variables even after breaking the data frame into chunks. This will ensure that biglm creates the proper design matrix from the first call, and that all subsequent updates will be compatible.
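To see why this works, here is a minimal base R sketch. The data frame and the chunk boundaries are made up for illustration. It shows that row-subsetting a data frame keeps the declared levels of a factor, so the design matrix for a chunk still has a column for every level, even the ones that do not occur in that chunk.

```r
# Hypothetical data: a factor with four levels, split into chunks by row.
large_df <- data.frame(a = rnorm(12), b = as.factor(rep(1:4, each = 3)))
chunk <- large_df[1:6, ]  # only contains b = 1 and b = 2

# All four levels are still declared on the chunk.
chunk_levels <- levels(chunk$b)

# The design matrix has an intercept plus one dummy column for each
# non-reference level, including the levels absent from this chunk.
n_cols <- ncol(model.matrix(a ~ b, data = chunk))

# droplevels() discards unused levels, so avoid calling it on chunks.
dropped_levels <- levels(droplevels(chunk)$b)
```

Here `chunk_levels` holds all four levels and `n_cols` is 4, while `dropped_levels` holds only the two levels present in the chunk, which is the situation that makes the update fail.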
If you have separate data frames from the start (as in your example), you could merge them into one before breaking it down into chunks; rbind combines the factor levels of both data frames. Continuing from your example:
df.large <- rbind(df,df.new)
chunk1 <- df.large[1:12,]
chunk2 <- df.large[13:18,]
model <- biglm(a~b+c,data = chunk1)
model.new <- update(model,chunk2) # this is now compatible
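If the chunks cannot be combined first (for example, if each one is read from disk separately), another option is to declare the full set of levels on every chunk with `factor(x, levels = ...)`. The sketch below assumes the complete set of levels is known in advance; the name `all_levels` is only illustrative.

```r
library(biglm)

# Assumption: the full set of levels for b is known before any chunk is read.
all_levels <- as.character(1:4)

df <- data.frame(a = rnorm(12),
                 b = factor(rep(1:4, each = 3), levels = all_levels),
                 c = rep(0:1, 6))
model <- biglm(a ~ b + c, data = df)

# This chunk only contains levels 1 and 2, but declares all four,
# so its model matrix has the same columns as the first chunk's.
df.new <- data.frame(a = rnorm(6),
                     b = factor(rep(1:2, each = 3), levels = all_levels),
                     c = rep(0:1, 3))
model.new <- update(model, df.new)
```

The same `factor(..., levels = all_levels)` call can be applied to each chunk right after it is loaded and before it is passed to `update`.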