I often have data where I want to compare the value of one level of variable with all the other levels of variable. Each time I write code to do this I wish it were easier. Here's an example of the problem:
Suppose I want to compare the average cost of diamonds of any cut with the average cost of best cut diamonds. To make things fair I want to do this for each clarity, separately.
Let's check we have enough data:
> with(diamonds,table(cut,clarity))
clarity
cut I1 SI2 SI1 VS2 VS1 VVS2 VVS1 IF
Fair 210 466 408 261 170 69 17 9
Good 96 1081 1560 978 648 286 186 71
Very Good 84 2100 3240 2591 1775 1235 789 268
Premium 205 2949 3575 3357 1989 870 616 230
Ideal 146 2598 4282 5071 3589 2606 2047 1212
okay no zeroes in Idea, so let's calculate the mean.
> claritycut<-ddply(diamonds,.(clarity,cut),summarize,price=mean(price))
> claritycut
clarity cut price
1 I1 Fair 3703.533
2 I1 Good 3596.635
3 I1 Very Good 4078.226
4 I1 Premium 3947.332
5 I1 Ideal 4335.726
6 SI2 Fair 5173.916
7 SI2 Good 4580.261
8 SI2 Very Good 4988.688
9 SI2 Premium 5545.937
10 SI2 Ideal 4755.953
...
The end result I want is:
clarity variable ratio
1 I1 Fair 0.8541899
2 I1 Good 0.8295348
3 I1 Very Good 0.9406098
4 I1 Premium 0.9104200
5 I1 Ideal 1.0000000
6 SI2 Fair 1.0878822
7 SI2 Good 0.9630586
8 SI2 Very Good 1.0489356
9 SI2 Premium 1.1661043
10 SI2 Ideal 1.0000000
...
But I'm not sure how to do this neatly. Most of the rest of this question concerns an intermediate step in the calculation - the divide.
Now I want to calculate the relative price of all cuts vs Ideals. Here's the data frame I'd expect to see partway through the calculation - extracting only one level of cut:
> claritycutideal <- join(subset(claritycut,cut!="Ideal"),summarize(subset(claritycut,cut=="Ideal"),Ideal=price,clarity))
> print(claritycutideal)
Joining by: clarity
clarity cut price Ideal
1 I1 Fair 3703.533 4335.726
2 I1 Good 3596.635 4335.726
3 I1 Very Good 4078.226 4335.726
4 I1 Premium 3947.332 4335.726
5 SI2 Fair 5173.916 4755.953
6 SI2 Good 4580.261 4755.953
7 SI2 Very Good 4988.688 4755.953
8 SI2 Premium 5545.937 4755.953
...
Which works, but it's fiddly to write the above statement, and I still need to finish the calculation off, mentioning the Ideal name again.
> mutate(claritycutideal,ratio=price/Ideal)
It feels like I want something like
> cast(claritycut,clarity~cut)
Using clarity, cut as id variables
clarity Fair Good Very Good Premium Ideal
1 I1 3703.533 3596.635 4078.226 3947.332 4335.726
2 SI2 5173.916 4580.261 4988.688 5545.937 4755.953
3 SI1 4208.279 3689.533 3932.391 4455.269 3752.118
4 VS2 4174.724 4262.236 4215.760 4550.331 3284.550
...
This is totally unsuitable for the mean calculation, since I would need to know the names of all the recast levels in the calculation:
I'd like recast, but with a way to filter the levels extracted and leave the rest untouched, for instance:
> cast(claritycut,clarity~cut,subset=cut=="Ideal")
Which exists, but doesn't retain the unfiltered levels.
I would then need to melt it again, and while there's a recast, there's no remelt.
Does anyone have a neat trick to do this?
Or perhaps I'm looking at this completely the wrong way - do marginal calculations do this for me?
The following works exactly right but is fiddly:
> valuevars=function(x)x[!names(x)%in%attr(x,"idvars")]
> melt(ddply(cast(claritycut,clarity~cut),.(clarity),
function(x)valuevars(x)/x$Ideal))
I'm not sure this is neat enough, but there is a two liner:
# from your code
claritycut <- ddply(diamonds,.(clarity,cut),summarize,price=mean(price))
# 1 do that work
transform(merge(claritycut, subset(claritycut, cut=="Ideal"), by="clarity"),
ratio = price.x / price.y)
# 2 another way
ddply(claritycut, .(clarity),
function(x) data.frame(cut=x$cut,
rate=x$price / subset(x, cut == "Ideal")$price))
# 3 another way
ddply(claritycut, .(clarity),
summarize, cut=cut, rate=price / price[cut == "Ideal"])
and finally 4) here is a one-liner version:
ddply(diamonds, .(clarity),
function(x) transform(ddply(x, .(cut),
summarize, rate=mean(price)),
rate=rate/mean(subset(x, cut=="Ideal")$price)))
but too complicated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With