wide format with dcast data.table [closed]

I would like to transform a table like this (*):

set.seed(1)
mydata <- data.frame(ID=rep(1:4, each=3), R=rep(1:3, times=4), FIXED=rep(runif(4), each=3), AAA=rnorm(12), BBB=rbinom(12,12,0.5), CCC=runif(12))

ID R    FIXED    AAA  BBB   CCC
 1 1    0.26   -0.83   8   0.82
 1 2    0.26    1.59   5   0.64
 1 3    0.26    0.32   6   0.78
 2 1    0.37   -0.82   6   0.55
 2 2    0.37    0.48   6   0.52
 2 3    0.37    0.73   4   0.78
 3 1    0.57    0.57   8   0.02
 3 2    0.57   -0.30   7   0.47
 3 3    0.57    1.51   7   0.73
 4 1    0.90    0.38   4   0.69
 4 2    0.90   -0.62   7   0.47
 4 3    0.90   -2.21   6   0.86    

Into wide format, like this:

ID  FIXED1  AAA1    BBB1    CCC1    FIXED2  AAA2    BBB2    CCC2    FIXED3  AAA3    BBB3    CCC3
1   0.27    0.49       7    0.73     0.37   0.74       4    0.69      0.57  0.58       7    0.48
2   0.91    -0.31      6    0.86     0.20   1.51       8    0.44      0.90  0.39       7    0.24
3   0.94    -0.62      7    0.07     0.66  -2.21       6    0.10      0.63  1.12       6    0.32
4   0.06    -0.04      7    0.52     0.21  -0.02       3    0.66      0.18  0.94       6    0.41

I've tried:

dcast(mydata, ID + FIXED ~ R, value.var=names(mydata)[3:5])

or even writing the column names, "AAA", "BBB", "CCC", but it produces an error and I can't get the wide format I need. I've also tried other options, with no luck.

How can I do it?

(*) In reality it has many more columns, but the story is the same.

The error is:

Error in .subset2(x, i, exact = exact) : 
  recursive indexing failed at level 2
In addition: Warning message:
In if (!(value.var %in% names(data))) { :
  the condition has length > 1 and only the first element will be used
skan asked Jun 03 '16 20:06


2 Answers

You are referencing the wrong value variables (the AAA, BBB and CCC columns are at positions 4 to 6, not 3 to 5), and you should use setDT() to convert the data.frame to a data.table. Using:

dcast(setDT(mydata), ID + FIXED ~ R, value.var = names(mydata)[4:6])

which gives:

   ID     FIXED      AAA_1      AAA_2      AAA_3 BBB_1 BBB_2 BBB_3     CCC_1     CCC_2     CCC_3
1:  1 0.2655087 -0.8356286  1.5952808  0.3295078     8     5     6 0.8209463 0.6470602 0.7829328
2:  2 0.3721239 -0.8204684  0.4874291  0.7383247     6     6     4 0.5530363 0.5297196 0.7893562
3:  3 0.5728534  0.5757814 -0.3053884  1.5117812     8     7     7 0.0233312 0.4772301 0.7323137
4:  4 0.9082078  0.3898432 -0.6212406 -2.2146999     4     7     6 0.6927316 0.4776196 0.8612095

If you don't convert to a data.table, the data.table package falls back to the dcast implementation from reshape2, which cannot handle multiple value.var columns, hence the error message.
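Since the real table has many more columns, it may be safer to pick the value columns by name rather than by position. The following is only a sketch under that assumption; id_cols and value_cols are names made up for illustration, and the data is the example from the question:

```r
library(data.table)

set.seed(1)
mydata <- data.frame(ID = rep(1:4, each = 3), R = rep(1:3, times = 4),
                     FIXED = rep(runif(4), each = 3), AAA = rnorm(12),
                     BBB = rbinom(12, 12, 0.5), CCC = runif(12))

# Columns that identify a row or define the spread (hypothetical helper name);
# every other column is treated as a value column
id_cols <- c("ID", "R", "FIXED")
value_cols <- setdiff(names(mydata), id_cols)

# setDT() converts the data.frame by reference, so data.table's dcast is used
wide <- dcast(setDT(mydata), ID + FIXED ~ R, value.var = value_cols)
```

This way a column added to or removed from the front of the table does not silently shift the indices.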

If you want a different separator, pass the sep argument to dcast, for example sep = '.'.
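As a rough illustration of the sep argument, and of one possible way to get the columns grouped by R as in the layout the question asks for, something like the following should work. The reordering step with outer() and setcolorder() is my own assumption about the desired layout, not part of dcast itself:

```r
library(data.table)

set.seed(1)
mydata <- data.table(ID = rep(1:4, each = 3), R = rep(1:3, times = 4),
                     FIXED = rep(runif(4), each = 3), AAA = rnorm(12),
                     BBB = rbinom(12, 12, 0.5), CCC = runif(12))

value_cols <- c("AAA", "BBB", "CCC")

# Same reshape as above, but with '.' instead of the default '_' in the new names
wide <- dcast(mydata, ID + FIXED ~ R, value.var = value_cols, sep = ".")

# Build the names in the order AAA.1, BBB.1, CCC.1, AAA.2, ... and reorder in place
r_levels <- sort(unique(mydata$R))
new_order <- c("ID", "FIXED",
               as.vector(outer(value_cols, r_levels, paste, sep = ".")))
setcolorder(wide, new_order)
```

Without the setcolorder() step the columns stay grouped by variable (AAA.1, AAA.2, AAA.3, BBB.1, ...), which is the order dcast produces.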

Jaap answered Nov 12 '22 20:11


set.seed(1)
require(data.table)
mydata <- data.table(ID=rep(1:4, each=3), R=rep(1:3, times=4), FIXED=rep(runif(4), each=3), AAA=rnorm(12), BBB=rbinom(12,12,0.5), CCC=runif(12))
dcast(mydata, ID ~ R, value.var=names(mydata)[3:6])
   ID    FIXED_1    FIXED_2    FIXED_3      AAA_1      AAA_2       AAA_3 BBB_1 BBB_2 BBB_3     CCC_1     CCC_2     CCC_3
1:  1 0.43809711 0.43809711 0.43809711 -0.4781501  0.4179416  1.35867955     6     7     6 0.6422883 0.8762692 0.7789147
2:  2 0.24479728 0.24479728 0.24479728 -0.1027877  0.3876716 -0.05380504     5     7     5 0.7973088 0.4552745 0.4100841
3:  3 0.07067905 0.07067905 0.07067905 -1.3770596 -0.4149946 -0.39428995     7     4     5 0.8108702 0.6049333 0.6547239
4:  4 0.09946616 0.09946616 0.09946616 -0.0593134  1.1000254  0.76317575     4     5     3 0.3531973 0.2702601 0.9926841
HubertL answered Nov 12 '22 18:11