I have a data frame of multiple pairs of estimates and variances for several model parameters each within one of a number of sections. Here's a function that generates the illustrative sample:
samplerats <- function(){
set.seed(310366)
d = data.frame(section=c(rep("S1",10),rep("S2",10),rep("S3",5)))
nr = nrow(d)
for(i in 1:5){
d[[paste0("est_v",i)]] = rnorm(nr)
d[[paste0("var_v",i)]] = runif(nr)
}
d
}
and here's the start of what you get:
> d=samplerats()
> head(d)
section est_v1 var_v1 est_v2 var_v2 est_v3 var_v3
1 S1 0.3893008 0.1620882 -1.1915391 0.15439565 0.62022284 0.5487519
2 S1 0.8221099 0.3280630 0.7729817 0.14810283 -1.11337584 0.9947342
3 S1 0.8023230 0.1862810 -1.5285389 0.85648574 -1.74666907 0.4267944
4 S1 -0.2252865 0.5660111 -0.4348341 0.53013027 0.01823185 0.1379821
5 S1 -0.9475335 0.7904085 -1.0882961 0.40567780 1.69607397 0.3450983
6 S1 0.4415259 0.2969032 0.9200723 0.08754107 0.57010457 0.7579002
[with another two variables and 25 rows in total]
The task is to compute the ratio the variance of the estimates for each parameter with the mean of the variance for each parameter, grouped by section.
So for example, for variable v1, crudely just to get the numbers out:
> d %>% group_by(section) %>% summarise(var(est_v1)/mean(var_v1))
Source: local data frame [3 x 2]
section var(est_v1)/mean(var_v1)
1 S1 0.5874458
2 S2 2.4449153
3 S3 2.8621725
That gives us the answer for v1
, we just need to repeat for all the other variables. Note that the column names are est_
or var_
followed by a variable name which might be alpha
or g2
or some other alphanum.
Of course I have a horrendous solution:
ratit <- function(d){
isVAR <- function(s){stringr::str_sub(s,1,4)=="var_"}
spreads = reshape2::melt(d) %>% mutate(isVAR=isVAR(variable), Variable = str_replace(variable,"^.*_",""))
vout = spreads %>% group_by(Variable, section, isVAR) %>% summarise(Z=if(isVAR(variable[1])){mean(value)}else{var(value)})
ratios = vout %>% group_by(section, Variable) %>% summarise(Vratio = Z[1]/Z[2]) %>% dcast(section ~ Variable)
ratios
}
which gives:
> ratit(d)
Using section as id variables
Using Vratio as value column: use value.var to override.
section v1 v2 v3 v4 v5
1 S1 0.5874458 3.504169 3.676488 1.1716684 1.742021
2 S2 2.4449153 1.177326 1.106337 1.0700636 3.263149
3 S3 2.8621725 2.216099 3.846062 0.7777452 2.122726
where you can see the first column is the same as the v1
-only example earlier. But yuck.
If I can melt, cast, dplyr or otherwise tidyr it up into this format:
est var section variable
1 0.3893008 0.1620882 S1 v1
2 0.8221099 0.3280630 S1 v1
3 0.8023230 0.1862810 S1 v1
4 -0.2252865 0.5660111 S1 v1
5 -0.9475335 0.7904085 S1 v1
6 0.4415259 0.2969032 S1 v1
then its trivial - dd %>% group_by(section, variable) %>% summarise(rat=var(est)/mean(var)) %>% spread(variable, rat)
But that step eludes me...
Neat solutions welcome, using anything including base R, dplyr, tidyr, data.table etc.
dplyr is a package for making tabular data wrangling easier by using a limited set of functions that can be combined to extract and summarize insights from your data. It pairs nicely with tidyr which enables you to swiftly convert between different data formats (long vs. wide) for plotting and analysis.
Now we are going to learn about the two packages useful to work with data: dplyr is a package for making tabular data manipulation easier. tidyr enables you to swiftly convert between different data formats.
Data. table uses shorter syntax than dplyr, but is often more nuanced and complex. dplyr use a pipe operator, which is more intuitive for beginners to read and debug. Moreover, many other libraries use pipe operators, such as ggplot2 and tidyr.
By using group_by() function from dplyr package we can perform group by on multiple columns or variables (two or more columns) and summarise on multiple columns for aggregations.
This should do the trick:
dd <- reshape(d, varying = 2:11, direction = 'long', sep="_", timevar="variable")
head(dd)
# section variable est var id
# 1.v1 S1 v1 0.3893008 0.1620882 1
# 2.v1 S1 v1 0.8221099 0.3280630 2
# 3.v1 S1 v1 0.8023230 0.1862810 3
# 4.v1 S1 v1 -0.2252865 0.5660111 4
# 5.v1 S1 v1 -0.9475335 0.7904085 5
# 6.v1 S1 v1 0.4415259 0.2969032 6
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With