Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difficulty fitting gamma distribution with R

Tags:

r

I am attempting to estimate parameters for a gamma distribution fit to ecological density (i.e. biomass per area) data. I have been using the fitdistr() command from the MASS package in R (version 3.0.0 : x86_64-w64-mingw32/x64 (64-bit)). This is a maximum likelihood estimation command for distributional parameters.

The vectors of data are quite large, but summary statistics are as follows:

Min. = 0; 1st Qu. = 87.67; Median = 199.5; Mean = 1255; Variance = 2.79E+07; 3rd Qu. = 385.6; Max. = 33880

The code I am using to run the MLE procedure is:

gdist <- fitdistr(data, dgamma, 
                  start=list(shape=1, scale=1/(mean(data))),lower=c(1,0.1))

R is giving me the following error:

Error in optim(x = c(6.46791148085828, 4060.54750836902, 99.6201565968665, : non-finite finite-difference value [1]

Others who have experienced this type of issue and have turned to stackoverflow for help seem to have found the solution in adding the "lower=" argument to their code, and/or removing zeros. I find that R will provide parameters for a fit if I remove the zero observations, but I was under the impression that gamma distributions covered the range 0 <= x > inf (Forbes et al. 2011. Statistical Distributions)?

Have I gotten the wrong impression regarding the range of the gamma distribution? Or is there some other issue I am missing regarding MLE (in which I am a novice).

like image 822
niafall Avatar asked Jul 12 '13 14:07

niafall


1 Answers

Getting a rough estimate by the method of moments (matching up the mean=shape*scale and variance=shape*scale^2) we have

mean <- 1255
var <- 2.79e7
shape = mean^2/var   ## 0.056
scale = var/mean     ## 22231

Now generate some data from this distribution:

set.seed(101)
x = rgamma(1e4,shape,scale=scale)
summary(x)
##     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     0.00      0.00      0.06   1258.00     98.26 110600.00 

MASS::fitdistr(x,"gamma")  ## error

I strongly suspect that the problem is that the underlying optim call assumes the parameters (shape and scale, or shape and rate) are of approximately the same magnitude, which they're not. You can get around this by scaling your data:

(m <- MASS::fitdistr(x/2e4,"gamma"))  ## works fine
##      shape           rate    
##  0.0570282411   0.9067274280 
## (0.0005855527) (0.0390939393)

fitdistr gives a rate parameter rather than a scale parameter: to get back to the shape parameter you want, invert and re-scale ...

1/coef(m)["rate"]*2e4  ## 22057

By the way, the fact that the quantiles of the simulated data don't match your data very well (e.g. median of x=0.06 vs a median of 199 in your data) suggest that your data might not fit a Gamma all that well -- e.g. there might be some outliers affecting the mean and variance more than the quantiles?

PS above I used the built-in 'gamma' estimator in fitdistr rather than using dgamma: with starting values based on the method of moments, and scaling the data by 2e4, this works (although it gives a warning about NaNs produced unless we specify lower)

 m2 <- MASS::fitdistr(x/2e4,dgamma,
        start=list(shape=0.05,scale=1), lower=c(0.01,0.01))
like image 119
Ben Bolker Avatar answered Oct 21 '22 05:10

Ben Bolker