 

Fitting a weighted distribution in R

Tags:

r

I'm looking to fit a weighted distribution to a data set I have.

I'm currently using the fitdist() function from fitdistrplus, but I don't know whether there is a way to add weights.

library(fitdistrplus)
df <- data.frame(value = rlnorm(100, 1, 0.5), weight = runif(100, 0, 2))

# This is what I'm doing, but it ignores the weights
fit_df <- fitdist(df$value, "lnorm")

# This is what I'd like to do (not working code)
fit_df_weighted <- fitdist(df$value, "lnorm", weight = df$weight)

I'm sure this has been answered before somewhere but I've looked and can't find anything.

thanks in advance,

Gordon

gtwebb asked Nov 12 '13 19:11



1 Answer

Perhaps you could use the rep() function and a quick loop to approximate the distribution.

You could multiply each weight by, say, 10000, round the result, and then use it as the number of copies of the corresponding value to put in your vector. After running a quick loop to build that vector, you could then pass it to fitdist().

## scale the weights to integer replication counts
df$scaled_weight <- round(df$weight * 10000, 0)
my_vector <- numeric(0)

## quick loop: append each value scaled_weight[i] times
for (i in seq_len(nrow(df))) {
  values <- rep(df$value[i], df$scaled_weight[i])
  my_vector <- c(my_vector, values)
}

## find parameters
fit_df_weighted <- fitdist(my_vector, "lnorm")

The standard errors would be rubbish, because the replicated vector pretends to contain far more observations than the original 100, but the estimated parameters should be sufficient.
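
For the lognormal specifically, the replication trick can be checked against a direct calculation. The sketch below is not part of the original answer: it assumes that "weighted fit" means maximising the weighted log-likelihood sum(w * log f(x)), which for a lognormal has a closed form in terms of log(x). The helper name weighted_lnorm_fit is hypothetical, and only base R is used.

```r
# Sketch: weighted maximum likelihood for a lognormal, base R only.
# Assumption: weights enter as multipliers on each observation's log-likelihood.
set.seed(1)
df <- data.frame(value = rlnorm(100, 1, 0.5), weight = runif(100, 0, 2))

# Hypothetical helper: closed-form weighted MLE of meanlog and sdlog.
weighted_lnorm_fit <- function(x, w) {
  lx <- log(x)
  meanlog <- sum(w * lx) / sum(w)                    # weighted mean of log(x)
  sdlog <- sqrt(sum(w * (lx - meanlog)^2) / sum(w))  # weighted (MLE) sd of log(x)
  c(meanlog = meanlog, sdlog = sdlog)
}

est_weighted <- weighted_lnorm_fit(df$value, df$weight)

# Cross-check: replicate each value by its rounded, scaled weight
# (vectorised equivalent of the loop) and take the unweighted MLE.
counts <- round(df$weight * 10000)
replicated <- rep(df$value, times = counts)
lr <- log(replicated)
est_replicated <- c(meanlog = mean(lr),
                    sdlog = sqrt(mean((lr - mean(lr))^2)))

print(est_weighted)
print(est_replicated)
```

The two sets of estimates should agree up to the rounding of the weights, which suggests that the replication approach is approximating this weighted likelihood. For distributions without a closed form, the same weighted log-likelihood could be maximised numerically, for example with optim().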

Hip Hop Physician answered Oct 25 '22 12:10