I'm trying to run a GLM in R for biomass data (reproductive biomass and ratio of reproductive biomass to vegetative biomass) as a function of habitat type ("hab"), year data was collected ("year"), and site of data collection ("site"). My data looks like it would fit a Gamma distribution well, but I have 8 observations with zero biomass (out of ~800 observations), so the model won't run. What's the best way to deal with this? What would be another error distribution to use? Or would adding a very small value (such as 0.0000001) to my zero observations be viable?
My model is:
reproductive_biomass <- glm(repro.biomass ~ hab * year + site, data = biom, family = Gamma(link = "log"))
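For anyone who wants to reproduce the failure without the original data, here is a minimal sketch on simulated values. The data frame `toy` and its columns are hypothetical stand-ins, not the poster's `biom` data; the point is only that a single zero in the response is enough to stop a Gamma GLM in base R.

```r
# Hypothetical toy data: strictly positive Gamma draws, then one value set to zero
set.seed(1)
toy <- data.frame(
  hab = factor(rep(c("A", "B"), each = 50)),
  y   = rgamma(100, shape = 2, rate = 1)
)
toy$y[1] <- 0  # a single zero observation

# Fit the Gamma GLM and capture the error instead of halting the script
result <- tryCatch(
  glm(y ~ hab, data = toy, family = Gamma(link = "log")),
  error = function(e) e
)

failed <- inherits(result, "error")
if (failed) message("Gamma GLM failed: ", conditionMessage(result))

# The same model fits once the zero row is excluded
fit_pos <- glm(y ~ hab, data = subset(toy, y > 0), family = Gamma(link = "log"))
```

Dropping the zero rows is shown here only to confirm that the zero is what triggers the error; whether discarding them is defensible depends on what the zeros mean in the study.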
The gamma function has no zeroes, so the reciprocal gamma function 1/Γ(z) is an entire function. In fact, the gamma function corresponds to the Mellin transform of the negative exponential function: Γ(z) = ∫_0^∞ t^(z-1) e^(-t) dt for Re(z) > 0. Other extensions of the factorial function do exist, but the gamma function is the most popular and useful.
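The Generalized Linear Model (GLM) for the Gamma distribution (glmGamma) is widely used in modeling continuous, strictly positive, positively skewed data, such as insurance claims and survival data. Because the Gamma density is defined only for values greater than zero, exact zeros in the response cannot be fitted directly.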
A Gamma error distribution with a log link is a common family to fit GLMs with in ecology. It works well for positive-only data with positively-skewed errors. The Gamma distribution is flexible and can mimic, among other shapes, a log-normal shape.
Some usual mean link functions in gamma regression are: the logarithm function, g(µ) = log(µ); the identity function, g(µ) = µ; and the inverse function, g(µ) = 1/µ. For the Gamma family, the canonical link is the inverse function, although the log link is often preferred because it keeps the fitted mean positive and gives multiplicative effects.
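As a rough illustration of the three links, the sketch below fits the same Gamma GLM with each of them in base R. The data are simulated and the names (`x`, `y`, `fit_log`, and so on) are made up for the example; nothing here comes from the original biomass data.

```r
# Simulated, strictly positive response with a log-linear mean (hypothetical data)
set.seed(1)
n  <- 200
x  <- runif(n)
mu <- exp(0.5 + 1.2 * x)                    # true mean
y  <- rgamma(n, shape = 2, rate = 2 / mu)   # Gamma draws with mean mu

# Same family, three different link functions
fit_log <- glm(y ~ x, family = Gamma(link = "log"))
fit_inv <- glm(y ~ x, family = Gamma(link = "inverse"))   # canonical link
fit_id  <- glm(y ~ x, family = Gamma(link = "identity"))

# Coefficients live on different scales, so compare the fits as a whole
aic_table <- AIC(fit_log, fit_inv, fit_id)
print(aic_table)

# Predictions on the response scale are directly comparable across links
newd <- data.frame(x = c(0.1, 0.5, 0.9))
preds <- sapply(list(log = fit_log, inverse = fit_inv, identity = fit_id),
                predict, newdata = newd, type = "response")
print(preds)
```

With the identity or inverse link, the linear predictor is not guaranteed to stay in the valid range for every data set, which is one practical reason the log link is a common default.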
Ah, zeroes - gotta love them.
Depending on the system you're studying, I'd be tempted to check out zero-inflated or hurdle models - the basic idea is that there are two components to the model: some binomial process deciding whether the response is zero or nonzero, and then a gamma that works on the nonzeroes. Slick part is you can then do inferences on the coefficients of both models and even use different predictors in each.
http://seananderson.ca/2014/05/18/gamma-hurdle.html ... but a search for "zero-inflated gamma" or "Tweedie models" might also yield something informative and/or scholarly.
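Here's a minimal sketch of the two-part (hurdle) idea using only base R `glm()`. Everything in it is an assumption for illustration: the data are simulated, and the names `dat`, `nonzero`, `m_zero` and `m_pos` are made up. It is not the code from the linked post.

```r
# Simulated data with a small share of exact zeros (hypothetical, not real biomass)
set.seed(42)
n   <- 800
hab <- factor(sample(c("A", "B"), n, replace = TRUE))

p_nonzero <- ifelse(hab == "A", 0.90, 0.97)        # chance the response is nonzero
mu        <- exp(1 + 0.5 * (hab == "B"))           # mean of the positive part
y <- rbinom(n, size = 1, prob = p_nonzero) *
     rgamma(n, shape = 2, rate = 2 / mu)

dat <- data.frame(y = y, hab = hab)
dat$nonzero <- as.integer(dat$y > 0)

# Part 1: binomial model for zero vs nonzero
m_zero <- glm(nonzero ~ hab, data = dat, family = binomial(link = "logit"))

# Part 2: Gamma model fitted to the positive observations only
m_pos <- glm(y ~ hab, data = subset(dat, y > 0), family = Gamma(link = "log"))

# Combine the parts: E[y] = P(y > 0) * E[y | y > 0]
newd     <- data.frame(hab = factor(c("A", "B"), levels = levels(dat$hab)))
p_hat    <- predict(m_zero, newdata = newd, type = "response")
mu_hat   <- predict(m_pos,  newdata = newd, type = "response")
expected <- p_hat * mu_hat
print(data.frame(hab = newd$hab, p_hat, mu_hat, expected))
```

For the model in the question, the right-hand side of each part would presumably be something like `hab * year + site`, though with only 8 zeros the binomial part may not support that many terms.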
In an ideal world, your analytic tool should fit your system and your intended inferences. The zero-inflated world is pretty sweet, but is conditional on the assumption of separate processes. Thus an important question to answer, of course, is what zeroes "mean" in the context of your study, and only you can answer that - whether they're numbers that just happened to be really really small, or true zeroes that are the result of some confounding process like your coworker spilling the bleach (or something otherwise uninteresting to your study), or else true zeroes that ARE interesting.
Another thought: ask the same question over on Cross Validated, and you'll probably get an even more statistically informed answer. Good luck!