I wrote a function (weighted.sd) that computes some weighted statistics (mean, SD, standard error and a 95% confidence interval). I want to apply this function to each level of a factor variable (regions) and then use the weighted statistics for each region in a ggplot2 graph with error bars (hence the 95% confidence interval).
I also tried tapply and a for loop, but I didn't get it right. I'd also like to use dplyr as much as I can, because it is easy to read and understand.
Here is my best try:
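library(dplyr)

# example data
# note: cbind() converts the factor to its integer codes, so factor_var ends up numeric
data <- as.data.frame(cbind(rnorm(50), as.factor(rnorm(50)), rnorm(50)))
colnames(data) <- c("index_var", "factor_var", "weight_var")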
weighted.sd <- function(x, weight) {
  # drop observations where either the value or the weight is missing
  na <- is.na(x) | is.na(weight)
  x <- x[!na]
  weight <- weight[!na]
  sum.w  <- sum(weight)
  sum.w2 <- sum(weight^2)
  # weighted mean
  mean.w <- sum(x * weight) / sum.w
  # weighted variance with the sum.w / (sum.w^2 - sum.w2) correction, and its square root
  x.var.w <- (sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2)
  x.sd.w  <- sqrt(x.var.w)
  # standard error and half-width of the normal-approximation 95% CI
  SE    <- x.sd.w / sqrt(sum.w)
  error <- qnorm(0.975) * SE
  left  <- mean.w - error
  right <- mean.w + error
  return(cbind(mean.w, x.sd.w, SE, error, left, right))
}
test <- data %>%
  group_by(factor_var) %>%
  do(as.data.frame(weighted.sd(x = index_var, weight = weight_var)))
test
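For reference, here are the quantities weighted.sd computes, written out from the code above. V_1 and V_2 are shorthand introduced here for sum.w and sum.w2, and z_{0.975} is qnorm(0.975), roughly 1.96.

```latex
\bar{x}_w = \frac{\sum_i w_i x_i}{V_1}, \qquad V_1 = \sum_i w_i, \qquad V_2 = \sum_i w_i^2

s_w = \sqrt{\frac{V_1}{V_1^2 - V_2} \sum_i w_i \left(x_i - \bar{x}_w\right)^2}, \qquad
\mathrm{SE} = \frac{s_w}{\sqrt{V_1}}, \qquad
\mathrm{CI}_{95\%} = \bar{x}_w \pm z_{0.975}\,\mathrm{SE}
```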
This results in the following error message (translated from German; you can reproduce it with the code):
Error in as.data.frame(weighted.sd(x = index_var, weight = weight_var)) :
error in evaluating the argument 'x' in selecting a method
for function 'as.data.frame': Error in weighted.sd(x = index_var, weight = weight_var) :
object 'index_var' not found
When using do() in dplyr, you need to refer to the columns with .$ (inside do(), the . stands for the data of the current group) for the call to work:
test <- data %>%
  group_by(factor_var) %>%
  do(as.data.frame(weighted.sd(x = .$index_var, weight = .$weight_var)))
test
With that change, the code runs without an error:
> test
Source: local data frame [50 x 7]
Groups: factor_var [50]
factor_var mean.w x.sd.w SE error left right
(dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 1 1.79711934 NaN NaN NaN NaN NaN
2 2 -0.70698012 NaN NaN NaN NaN NaN
3 3 -0.85125760 NaN NaN NaN NaN NaN
4 4 -0.93903314 NaN NaN NaN NaN NaN
5 5 0.09629631 NaN NaN NaN NaN NaN
6 6 1.02720022 NaN NaN NaN NaN NaN
7 7 1.35090758 NaN NaN NaN NaN NaN
8 8 0.67814249 NaN NaN NaN NaN NaN
9 9 -0.28251464 NaN NaN NaN NaN NaN
10 10 0.38572499 NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ...
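However, your example data is not well suited to this, which is why the statistics come out as NaN. Because as.factor(rnorm(1:50)) creates 50 distinct levels, every group contains a single observation, so sum.w^2 - sum.w2 is zero and the variance formula divides by zero. Negative weights (data$weight_var) are a problem as well: they can make the variance or sum(weight) negative, and sqrt() of a negative number returns NaN.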
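A minimal sketch of the full workflow from the question, assuming well-behaved data. The region names, group sizes, seed and uniform weights below are made up for illustration; they are not the question's data. With strictly positive weights and several observations per region, the same do() call returns finite statistics, which can then be passed to ggplot2's geom_errorbar() via the left and right columns.

```r
library(dplyr)
library(ggplot2)

# The question's weighted.sd(), condensed (same formulas, same output columns)
weighted.sd <- function(x, weight) {
  na <- is.na(x) | is.na(weight)
  x <- x[!na]
  weight <- weight[!na]
  sum.w  <- sum(weight)
  sum.w2 <- sum(weight^2)
  mean.w <- sum(x * weight) / sum.w
  x.sd.w <- sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
  SE     <- x.sd.w / sqrt(sum.w)
  error  <- qnorm(0.975) * SE
  left   <- mean.w - error
  right  <- mean.w + error
  cbind(mean.w, x.sd.w, SE, error, left, right)
}

# Hypothetical example data: 4 regions with 20 rows each and strictly positive weights
set.seed(42)
data <- data.frame(
  index_var  = rnorm(80),
  factor_var = factor(rep(c("North", "East", "South", "West"), each = 20)),
  weight_var = runif(80, min = 0.5, max = 2)
)

# One row of weighted statistics per region (same do() call as in the answer)
stats <- data %>%
  group_by(factor_var) %>%
  do(as.data.frame(weighted.sd(x = .$index_var, weight = .$weight_var))) %>%
  ungroup()

# Weighted means with their 95% confidence intervals drawn as error bars
p <- ggplot(stats, aes(x = factor_var, y = mean.w)) +
  geom_point() +
  geom_errorbar(aes(ymin = left, ymax = right), width = 0.2) +
  labs(x = "Region", y = "Weighted mean (95% CI)")
print(p)
```

In dplyr 1.0.0 and later, do() is superseded; group_modify(~ as.data.frame(weighted.sd(x = .x$index_var, weight = .x$weight_var))) should produce the same per-region table.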