
Why can't one have several `value.var` in `dcast`?

Why can't one have multiple variables passed to value.var in dcast? From ?dcast:

value.var name of column which stores values, see guess_value for default strategies to figure this out.

The documentation doesn't explicitly say that only a single variable can be passed as value.var. However, if I try to pass more than one, I get an error:

> library("reshape2")
> library("MASS")
> 
> dcast(Cars93, AirBags ~ DriveTrain, mean, value.var=c("Price", "Weight"))
Error in .subset2(x, i, exact = exact) : subscript out of bounds
In addition: Warning message:
In if (!(value.var %in% names(data))) { :
  the condition has length > 1 and only the first element will be used

So is there a good reason for this limitation? And is it possible to work around it (perhaps using reshape, etc.)?

landroni asked Aug 05 '14 16:08


2 Answers

This question is very much related to your other question from earlier today.

@beginneR wrote in the comments that "As long as the existing data is already in long-format, I don't see any general need to melt it before casting." In my answer posted at your other question, I gave an example of when melt would be required, or rather, how to decide whether your data are long enough.

This question is another example of when further melting is required, since point 3 in my answer is not satisfied.

To get the behavior you want, try the following:

C93L <- melt(Cars93, measure.vars = c("Price", "Weight"))
dcast(C93L, AirBags ~ DriveTrain + variable, mean, value.var = "value")
#              AirBags 4WD_Price 4WD_Weight Front_Price Front_Weight
# 1 Driver & Passenger       NaN        NaN    26.17273     3393.636
# 2        Driver only     21.38       3623    18.69286     2996.250
# 3               None     13.88       2987    12.98571     2703.036
#   Rear_Price Rear_Weight
# 1      33.20      3515.0
# 2      28.23      3463.5
# 3      14.90      3610.0

An alternative is to use aggregate to calculate the means, and then use reshape or dcast to go from "long" to "wide". Both are required since reshape does not perform any aggregation:

temp <- aggregate(cbind(Price, Weight) ~ AirBags + DriveTrain, 
                  Cars93, mean)
temp
#              AirBags DriveTrain    Price   Weight
# 1        Driver only        4WD 21.38000 3623.000
# 2               None        4WD 13.88000 2987.000
# 3 Driver & Passenger      Front 26.17273 3393.636
# 4        Driver only      Front 18.69286 2996.250
# 5               None      Front 12.98571 2703.036
# 6 Driver & Passenger       Rear 33.20000 3515.000
# 7        Driver only       Rear 28.23000 3463.500
# 8               None       Rear 14.90000 3610.000

reshape(temp, direction = "wide", 
        idvar = "AirBags", timevar = "DriveTrain")
#              AirBags Price.4WD Weight.4WD Price.Front Weight.Front
# 1        Driver only     21.38       3623    18.69286     2996.250
# 2               None     13.88       2987    12.98571     2703.036
# 3 Driver & Passenger        NA         NA    26.17273     3393.636
#   Price.Rear Weight.Rear
# 1      28.23      3463.5
# 2      14.90      3610.0
# 3      33.20      3515.0

A5C1D2H2I1M1N2O1R2T1 answered Sep 21 '22 17:09


I had the same issue and found the answer to "Error using dcast with multiple value.var", which suggests explicitly calling the data.table version of dcast, as follows:

# multiple value.var
data.table::dcast(Cars93, AirBags ~ DriveTrain, mean, value.var=c("Price", "Weight"))

I was able to cast multiple variables without error.
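
For reference, here is a minimal sketch of that call. It assumes data.table 1.9.6 or later (the release that, as far as I know, added support for several value.var columns) and converts Cars93 to a data.table explicitly, so that the data.table method is the one that runs. The names cars_dt and wide are only illustrative.

```r
library(data.table)
library(MASS)  # provides the Cars93 data set

# Convert explicitly so that dcast dispatches to the data.table method
# rather than relying on how a plain data.frame is handled.
cars_dt <- as.data.table(Cars93)

# One row per AirBags level; one column per DriveTrain level
# for each of the two value columns.
wide <- dcast(cars_dt, AirBags ~ DriveTrain,
              fun.aggregate = mean,
              value.var = c("Price", "Weight"))
print(wide)
```

The result should have one AirBags column plus six value columns (three DriveTrain levels for each of Price and Weight), so no intermediate melt is needed with this approach.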

Giacomo answered Sep 18 '22 17:09