I'm looking to fit a weighted distribution to a data set I have.
I'm currently using the fitdist command but don't know if there is a way to add weighting.
library(fitdistrplus)
df<-data.frame(value=rlnorm(100,1,0.5),weight=runif(100,0,2))
#This is what I'm doing but not really what I want
fit_df<-fitdist(df$value,"lnorm")
#How to do this
fit_df_weighted<-fitdist(df$value,"lnorm",weight=df$weight)
I'm sure this has been answered before somewhere but I've looked and can't find anything.
thanks in advance,
Gordon
Perhaps you could use the rep() function and a quick loop to approximate the distribution.
You could multiply each weight by, say, 10000, round the result, and then use it as the number of copies of the corresponding value to put in your vector. After running a quick loop, you could then run the vector through fitdist().
df$scaled_weight <- round(df$weight * 10000, 0)
my_vector <- vector()
## quick loop: repeat each value in proportion to its scaled weight
for (i in seq_len(nrow(df))) {
  values <- rep(df$value[i], df$scaled_weight[i])
  my_vector <- c(my_vector, values)
}
## find parameters
fit_df_weighted <- fitdist(my_vector, "lnorm")
The standard errors would be rubbish, because the replication artificially inflates the sample size, but the estimated parameters should be sufficient.
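For the lognormal case specifically, the replication trick can probably be skipped. As a rough sketch: maximising the weighted log-likelihood of a lognormal has a closed form, namely the weighted mean and the weighted (population) standard deviation of log(value). The helper name weighted_lnorm_fit below is made up for illustration and is not part of fitdistrplus, and the seed is only there to make the example reproducible. The second half cross-checks the result against the rep() approach, vectorised so no loop is needed.

```r
# Sketch only: weighted ML estimates for a lognormal, base R, no replication.
set.seed(1)  # assumption: seed added purely for reproducibility
df <- data.frame(value = rlnorm(100, 1, 0.5), weight = runif(100, 0, 2))

# Hypothetical helper (not a fitdistrplus function).
# Weighted MLE of a lognormal = weighted mean / weighted sd of log(x).
weighted_lnorm_fit <- function(x, w) {
  stopifnot(length(x) == length(w), all(x > 0), all(w >= 0), sum(w) > 0)
  lx <- log(x)
  meanlog <- sum(w * lx) / sum(w)
  sdlog <- sqrt(sum(w * (lx - meanlog)^2) / sum(w))
  c(meanlog = meanlog, sdlog = sdlog)
}

est <- weighted_lnorm_fit(df$value, df$weight)

# Cross-check against the replication idea, without a loop:
# rep() accepts a vector of counts via 'times'.
reps <- round(df$weight * 10000)
my_vector <- rep(df$value, times = reps)
lv <- log(my_vector)
est_rep <- c(meanlog = mean(lv), sdlog = sqrt(mean((lv - mean(lv))^2)))

print(est)
print(est_rep)
```

The two sets of estimates should agree up to the rounding of the weights. As far as I know, recent versions of fitdistrplus also pass a weights argument through fitdist() to mledist(), but it expects strictly positive integer weights, so fractional weights like the ones here would still need scaling and rounding first.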