I am using the dplyr and broom packages to run linear regressions for multiple sensors. The glance() function from broom fails when I use lm() inside the do() statement, but works if I use biglm(). This wouldn't be an issue, except that I would like the R^2, F-statistic and p-value that glance() returns so neatly for a traditional lm() fit.
I've looked elsewhere and cannot find a similar case with this error:
Error in data.frame(r.squared = r.squared, adj.r.squared = adj.r.squared, :
object 'fstatistic' not found
Possible hunches:
?Anova
"The comparison between two or more models will only be valid if they are
fitted to the same dataset. This may be a problem if there are missing
values and R's default of na.action = na.omit is used."
Here is the code:
library(tidyr)
library(broom)
library(biglm) # if not install.packages("biglm")
library(dplyr)
regressionBig <- tidied_rm_outliers %>%
group_by(sensor_name, Lot.Tool, Lot.Module, Recipe, Step, Stage, MEAS_TYPE) %>%
do(fit = biglm(MEAS_AVG ~ value, data = .)) #note biglm is used
regressionBig
#extract the r^2 from the complex list type from the data frame we just stored
glances <- regressionBig %>% glance(fit)
glances %>%
ungroup() %>%
arrange(desc(r.squared))
# biglm works, but if I try the same thing with regular lm() it errors on glance()
ErrorDf <- tidied_rm_outliers %>%
group_by(sensor_name, Lot.Tool, Lot.Module, Recipe, Step, Stage, MEAS_TYPE) %>%
do(fit = lm(MEAS_AVG ~ value, data = .)) #note lm is normal
ErrorDf %>% glance(fit)
#Error in data.frame(r.squared = r.squared, adj.r.squared = adj.r.squared, :
#object 'fstatistic' not found
I hate to upload the entire data frame, as I know it's usually not acceptable on Stack Overflow, but I am not sure I can create a reproducible example without doing so. https://www.dropbox.com/s/pt6xe4jdxj743ka/testdf.Rda?dl=0
R session info on pastebin if you would like it here!
It looks like there is a bad model in ErrorDf. I diagnosed it by running a for loop:
for (i in 1:nrow(ErrorDf)){
print(i)
glance(ErrorDf$fit[[i]])
}
It looks like no coefficient for value could be estimated for model #94. I haven't done any further investigation, but it brings up the interesting question of how broom should handle that.
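One possible workaround, sketched here under assumptions rather than taken from the asker's data: check each fitted model for NA coefficients and only pass the clean ones to glance(). The toy data frame, its column names (grp, value, y) and the helper vector has_na_coef are all hypothetical; group "b" is given a constant predictor so that its slope cannot be estimated, which is one way to end up with the situation described above.

```r
library(dplyr)
library(broom)

# Hypothetical toy data: in group "b" the predictor is constant,
# so lm() cannot estimate a slope and reports it as NA.
toy <- data.frame(
  grp   = rep(c("a", "b"), each = 5),
  value = c(1, 2, 3, 4, 5, 2, 2, 2, 2, 2),
  y     = c(1.1, 2.3, 2.9, 4.2, 5.1, 1, 2, 3, 4, 5)
)

# Fit one model per group (base R split/lapply instead of do())
fits <- lapply(split(toy, toy$grp), function(d) lm(y ~ value, data = d))

# Flag models in which at least one coefficient could not be estimated
has_na_coef <- vapply(fits, function(m) anyNA(coef(m)), logical(1))

# Summarise only the models that were fully estimated
glances <- bind_rows(lapply(fits[!has_na_coef], glance), .id = "grp")
glances
```

The flagged models can then be inspected separately via names(fits)[has_na_coef] instead of stopping the whole pipeline.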
I came across this post after encountering the same issue. If lm() is failing because some groupings have too few cases, you can resolve the issue by pre-filtering the data to remove these groupings before running the do() loop. The generic code below shows how one might filter out groups with fewer than 30 data points.
library(dplyr)
library(broom)
# keep only groups with at least 30 rows, so every lm() call has enough data
data_grp <- data %>%
  group_by(factor_a, factor_b) %>%
  filter(n() >= 30)