Running a GLM with a Gamma distribution, but data includes zeros

Tags: r, zero, glm, gamma

I'm trying to run a GLM in R for biomass data (reproductive biomass, and the ratio of reproductive biomass to vegetative biomass) as a function of habitat type ("hab"), the year the data were collected ("year"), and the site of data collection ("site"). My data look like they would fit a Gamma distribution well, but 8 of my ~800 observations have zero biomass, so the model won't run. What's the best way to deal with this? Is there another error distribution I could use? Or would adding a very small value (such as .0000001) to my zero observations be viable?

My model is:

reproductive_biomass <- glm(repro.biomass ~ hab * year + site, data = biom, family = Gamma(link = "log"))

asked Apr 25 '17 15:04

Laura

1 Answer

Ah, zeroes - gotta love them.

Depending on the system you're studying, I'd be tempted to check out zero-inflated or hurdle models - the basic idea is that the model has two components: a binomial process that decides whether the response is zero or nonzero, and then a gamma that works on the nonzeroes. The slick part is that you can then do inference on the coefficients of both models, and even use different predictors in each.

Here's one write-up on gamma hurdle models: http://seananderson.ca/2014/05/18/gamma-hurdle.html. A search for "zero-inflated gamma" or "Tweedie models" might also yield something informative and/or scholarly.
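A rough sketch of the two-part idea in base R, in case it helps. This is not taken from the linked post: the data frame is simulated as a stand-in for the asker's `biom`, the habitat, year and site levels are made up, and the choice of `hab` as the only predictor in the zero/nonzero part is just an assumption for illustration (with only 8 zeros there is little information to support anything richer).

```r
# Simulated stand-in for the asker's data; every value here is made up.
set.seed(1)
n <- 800
biom <- data.frame(
  hab  = factor(sample(c("forest", "meadow"), n, replace = TRUE)),
  year = factor(sample(2015:2016, n, replace = TRUE)),
  site = factor(sample(paste0("s", 1:4), n, replace = TRUE))
)
biom$repro.biomass <- rgamma(n, shape = 2, rate = 0.5)
biom$repro.biomass[sample(n, 8)] <- 0  # 8 zeros, as in the question

# Part 1: binomial model for zero vs. nonzero.
# (Predictor choice is an assumption; it need not match part 2.)
biom$nonzero <- as.integer(biom$repro.biomass > 0)
m_zero <- glm(nonzero ~ hab, data = biom,
              family = binomial(link = "logit"))

# Part 2: Gamma model fitted to the positive observations only.
m_pos <- glm(repro.biomass ~ hab * year + site,
             data = subset(biom, nonzero == 1),
             family = Gamma(link = "log"))

# Combined expectation: P(nonzero) * E[biomass | biomass > 0].
p_nonzero <- predict(m_zero, newdata = biom, type = "response")
mu_pos    <- predict(m_pos,  newdata = biom, type = "response")
biom$expected <- p_nonzero * mu_pos
```

`summary(m_zero)` and `summary(m_pos)` then give separate coefficient tables for the two processes. With so few zeros, the binomial part may throw separation warnings on real data, which is worth keeping an eye on.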

In an ideal world, your analytic tool should fit your system and your intended inferences. The zero-inflated world is pretty sweet, but is conditional on the assumption of separate processes. Thus an important question to answer, of course, is what zeroes "mean" in the context of your study, and only you can answer that - whether they're numbers that just happened to be really really small, or true zeroes that are the result of some confounding process like your coworker spilling the bleach (or something otherwise uninteresting to your study), or else true zeroes that ARE interesting.
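One quick way to start on the "what do the zeroes mean" question is to look at where they fall and how they compare with the smallest positive values. A minimal sketch, again on simulated data standing in for `biom` (the level names and numbers are invented):

```r
# Simulated stand-in for the asker's data; every value here is made up.
set.seed(2)
n <- 800
biom <- data.frame(
  hab  = factor(sample(c("forest", "meadow"), n, replace = TRUE)),
  year = factor(sample(2015:2016, n, replace = TRUE))
)
biom$repro.biomass <- rgamma(n, shape = 2, rate = 0.5)
biom$repro.biomass[sample(n, 8)] <- 0

# Count zeros in each habitat-by-year cell: do they cluster anywhere?
biom$is_zero <- biom$repro.biomass == 0
zero_counts <- aggregate(is_zero ~ hab + year, data = biom, FUN = sum)
print(zero_counts)

# Smallest positive values: are the zeros plausibly "just very small",
# or is there a clear gap between zero and the rest of the data?
smallest_pos <- head(sort(biom$repro.biomass[!biom$is_zero]), 5)
print(smallest_pos)
```

If the zeros pile up in one habitat or year, or sit well below the smallest positive measurements, that points toward a separate process rather than rounding.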

Another thought: ask the same question over on Cross Validated, and you'll probably get an even more statistically informed answer. Good luck!

answered Sep 28 '22 07:09

Matt Tyers
