Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fiting 2-parameters weibull distribution for tabulated data

Tags:

r

I'm trying to adjust a weibull distribution to one tabulated data. After process my point cloud, I get columns with the number of returns into each 1-meter slices of height. Example:

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23)
colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)

In my example above, the height class with center 13.5 meters has 7 points inside.

If I plot the matrix a is possible to visualize the data distribution:

barplot(a)

enter image description here

Does somebody have a suggestion about how to fit a weibull 2-paramters to that tabulated data?

Thanks in advance!

like image 588
Gorgens Avatar asked Feb 16 '23 15:02

Gorgens


1 Answers

You could do maximum likelihood on the censored data.

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23)
colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,
                 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)


centers <- as.numeric(colnames(a))
low <- centers - .5
up <- centers + .5

ll.weibullCensored <- function(par, dat){
    shape <- par[1]
    scale <- par[2]
    # Get the probability for each 'bin' and take the log
    log.ps <- log(pweibull(up, shape, scale) - pweibull(low, shape, scale))
    # Sum the logs of the bin probabilities as many times
    # as they should be as dictated by the data
    sum(rep(log.ps, dat))
}

# Use optim or any other function to find a set
# of parameters that maximizes the log likelihood
o.optim <- optim(c(9, 28), 
                 ll.weibullCensored, 
                 dat = as.numeric(a), 
                 # this tells it to find max instead of a min
                 control=list(fnscale=-1))  

This gives essentially the same estimates as AndresT but their approach was to just assume all of the data fell in the center of the bins and perform maximum likelihood on that imputed data set. It doesn't make much of a difference but with this method you don't necessarily need other packages.

Edit: The fact that AndresT's solution and mine give estimates that are very similar makes a lot of sense if we look at what we're maximizing for each of the methods. In mine we're looking at the probability of falling into each of the 'bins'. AndreT's solution uses the density of the distribution at the center of the bin as a replacement for this probability. We can look at the ratio of the probability of falling into the bin and the density value in the center of the bin for each of the bins (using the shape and scale obtained from my solution) which gives:

# Probability of each bin
> ps
 [1] 0.0005495886 0.0009989085 0.0017438767 0.0029375471 0.0047912909
 [6] 0.0075863200 0.0116800323 0.0174991532 0.0255061344 0.0361186335
[11] 0.0495572085 0.0656015797 0.0832660955 0.1004801353 0.1139855466
[16] 0.1197890284 0.1144657811 0.0971503491 0.0711370586 0.0433654456
[21] 0.0210758647 0.0077516837 0.0020274896
# Density evaluated at the center of the bin
> ps.cent
 [1] 0.0005418957 0.0009868040 0.0017254545 0.0029103746 0.0047524364
 [6] 0.0075325510 0.0116083397 0.0174078328 0.0253967142 0.0359988789
[11] 0.0494450583 0.0655288551 0.0832789134 0.1006305707 0.1143085230
[16] 0.1202647955 0.1149865305 0.0975322358 0.0712125315 0.0431169222
[21] 0.0206762531 0.0074246320 0.0018651941
# Ratio of the probability and the density
> ps/ps.cent
 [1] 1.0141963 1.0122663 1.0106767 1.0093364 1.0081757 1.0071382 1.0061760
 [8] 1.0052459 1.0043084 1.0033266 1.0022682 1.0011098 0.9998461 0.9985051
[15] 0.9971745 0.9960440 0.9954712 0.9960845 0.9989402 1.0057639 1.0193271
[22] 1.0440495 1.0870127

All of these ratios are close to 1 - so the two methods are essentially trying to maximize the same likelihood.

like image 157
Dason Avatar answered Mar 23 '23 21:03

Dason