Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reshaping wide to long with multiple values columns [duplicate]

I need to reshape my wide table into long format but keeping multiple fields for each record, for example:

dw <- read.table(header=T, text='  sbj f1.avg f1.sd f2.avg f2.sd  blabla    A   10    6     50     10      bA    B   12    5     70     11      bB    C   20    7     20     8       bC    D   22    8     22     9       bD  ')  # Now I want to melt this table, keeping both AVG and SD as separate fields for each measurement, to get something like this:   #    sbj var avg  sd  blabla  #     A   f1  10  6     bA  #     A   f2  50  10    bA  #     B   f1  12  5     bB  #     B   f2  70  11    bB  #     C   f1  20  7     bC  #     C   f2  20  8     bC  #     D   f1  22  8     bD  #     D   f2  22  9     bD 

I have basic knowledge of using melt and reshape, but it is not obvious for me how to apply such reshaping in my case.

like image 240
Vasily A Avatar asked May 30 '14 00:05

Vasily A


People also ask

How do I convert long data to wide data in R?

To convert long data back into a wide format, we can use the cast function. There are many cast functions, but we will use the dcast function because it is used for data frames.

How do you reshape a data set?

You can reshape a stacked DataFrame back to its unstacked format with the unstack() function. By default, the innermost level is unstacked. In our example, it was a number. However, you can unstack a different level by passing a level number or name as a parameter to the unstack() method.


2 Answers

reshape does this with the appropriate arguments.

varying lists the columns which exist in the wide format, but are split into multiple rows in the long format. v.names is the long format equivalents. Between the two, a mapping is created.

From ?reshape:

Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.

Given these varying and v.names arguments, reshape is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5 for this example data, but that is not safe in general.

Given the values of times and v.names, reshape splits the varying columns on a . character (the default sep argument) to create the columns in the output.

times specifies values that are to be used in the created var column, and v.names are pasted onto these values to get column names in the wide format for mapping to the result.

Finally, idvar is specified to be the sbj column, which identifies individual records in the wide format (thanks @thelatemail).

reshape(dw, direction='long',          varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),          timevar='var',         times=c('f1', 'f2'),         v.names=c('avg', 'sd'),         idvar='sbj')  ##      sbj blabla var avg sd ## A.f1   A     bA  f1  10  6 ## B.f1   B     bB  f1  12  5 ## C.f1   C     bC  f1  20  7 ## D.f1   D     bD  f1  22  8 ## A.f2   A     bA  f2  50 10 ## B.f2   B     bB  f2  70 11 ## C.f2   C     bC  f2  20  8 ## D.f2   D     bD  f2  22  9 
like image 75
Matthew Lundberg Avatar answered Sep 23 '22 12:09

Matthew Lundberg


Another option using Hadley's new tidyr package.

library(tidyr) library(dplyr)  dw <- read.table(header=T, text='  sbj f1.avg f1.sd f2.avg f2.sd  blabla    A   10    6     50     10      bA    B   12    5     70     11      bB    C   20    7     20     8       bC    D   22    8     22     9       bD  ')  dw %>%    gather(v, value, f1.avg:f2.sd) %>%    separate(v, c("var", "col")) %>%    arrange(sbj) %>%    spread(col, value) 
like image 25
Maiasaura Avatar answered Sep 22 '22 12:09

Maiasaura