
How can I arrange data from wide format to long format, and specify relationships

Tags: r, reshape

Currently I have a file which I need to convert from wide format to long format. An example of the data is:

Subject,Cat1_Weight,Cat2_Weight,Cat3_Weight,Cat1_Sick,Cat2_Sick,Cat3_Sick
1,10,11,12,1,0,0
2,7,8,9,1,0,0

However, I need it in long format, as follows:

Subject,CatNumber,Weight,Sickness
1,1,10,1
1,2,11,0
1,3,12,0
2,1,7,1
2,2,8,0
2,3,9,0

So far I have tried using the melt function in R:

datalong <- melt(exp2_simon_shortform, id ="Subject")

But it treats every single column name as a unique variable each with its own value. Does anybody know how I could get from wide to long as specified, making reference to the column header names?

Cheers.

EDIT: I've realised I made an error. My final output needs to be as follows. That is, from the Cat1_ portion of each column name, I actually need to extract both "Cat" and "1".

Subject Animal  CatNumber   Weight  Sickness
1   Cat 1   10  1
1   Cat 2   11  0
1   Cat 3   12  0
2   Cat 1   7   1
2   Cat 2   8   0
2   Cat 3   9   0

Any updated solutions much appreciated.

Doctor David Anderson asked Jan 18 '26 15:01


1 Answer

The "dplyr" + "tidyr" approach might be something like:

library(dplyr)
library(tidyr)
mydf %>%
  gather(var, val, -Subject) %>%
  separate(var, into = c("CatNumber", "variable")) %>%
  spread(variable, val) 
#   Subject CatNumber Sick Weight
# 1       1      Cat1    1     10
# 2       1      Cat2    0     11
# 3       1      Cat3    0     12
# 4       2      Cat1    1      7
# 5       2      Cat2    0      8
# 6       2      Cat3    0      9

Add a mutate in there along with gsub to remove the "Cat" part of the "CatNumber" column.
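
A minimal sketch of that step, assuming the question's sample data is loaded into a data frame named mydf (the name and the read.csv() call are only for illustration). Splitting "Cat1" into "Cat" and "1" should also cover the Animal column asked for in the EDIT:

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question; `mydf` is an assumed name
mydf <- read.csv(text = "Subject,Cat1_Weight,Cat2_Weight,Cat3_Weight,Cat1_Sick,Cat2_Sick,Cat3_Sick
1,10,11,12,1,0,0
2,7,8,9,1,0,0")

result <- mydf %>%
  gather(var, val, -Subject) %>%                            # wide -> long
  separate(var, into = c("CatNumber", "variable")) %>%      # "Cat1_Weight" -> "Cat1", "Weight"
  mutate(Animal = gsub("[0-9]+", "", CatNumber),            # keep the letters: "Cat"
         CatNumber = as.integer(gsub("[^0-9]", "", CatNumber))) %>%  # keep the digits: 1
  spread(variable, val) %>%                                 # one column per measure
  select(Subject, Animal, CatNumber, Weight, Sickness = Sick)  # reorder and rename

result
```

The final select() is only there to match the column order and the "Sickness" name from the question; drop it if the default order is fine.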


Update

Based on the discussions in chat, your data actually look something more like:

A = c("ATCint", "Blank", "None"); B = 1:5; C = c("ResumptionTime", "ResumptionMisses")

colNames <- expand.grid(A, B, C)
colNames <- sprintf("%s%d_%s", colNames[[1]], colNames[[2]], colNames[[3]])

subject = 1:60

set.seed(1)
M <- matrix(sample(10, length(subject) * length(colNames), TRUE), 
            nrow = length(subject), dimnames = list(NULL, colNames))

mydf <- data.frame(Subject = subject, M)

Thus, you will need to do a few additional steps to get the output you desire. Try:

library(dplyr)
library(tidyr)
mydf %>% 
  group_by(Subject) %>%                    ## Your ID variable
  gather(var, val, -Subject) %>%           ## Make long data. Everything except your IDs
  separate(var, into = c("partA", "partB")) %>%  ## Split new column into two parts
  mutate(partA = sub("([0-9]+)$", "_\\1", partA)) %>%        ## Put "_" before trailing digits
  separate(partA, into = c("A1", "A2")) %>%                  ## Split this new column
  spread(partB, val)                                         ## Transform to wide form

Which yields:

Source: local data frame [900 x 5]

   Subject     A1    A2 ResumptionMisses ResumptionTime
     (int)  (chr) (chr)            (int)          (int)
1        1 ATCint     1                9              3
2        1 ATCint     2                4              3
3        1 ATCint     3                2              2
4        1 ATCint     4                7              4
5        1 ATCint     5                7              1
6        1  Blank     1                4             10
7        1  Blank     2                2              4
8        1  Blank     3                7              5
9        1  Blank     4                1              9
10       1  Blank     5               10             10
..     ...    ...   ...              ...            ...
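
With tidyr 1.0.0 or later, the same reshaping could probably also be written in one step with pivot_longer(), which supersedes gather() and spread(). The sketch below uses a small stand-in data frame with made-up values (not the simulated data above) and assumes every column name follows the "<letters><digits>_<measure>" pattern:

```r
library(tidyr)

# Small stand-in data frame; the values are invented for illustration
mydf_small <- data.frame(
  Subject = 1:2,
  ATCint1_ResumptionTime   = c(3, 5),
  ATCint1_ResumptionMisses = c(9, 2),
  Blank1_ResumptionTime    = c(10, 4),
  Blank1_ResumptionMisses  = c(4, 7)
)

long <- pivot_longer(
  mydf_small,
  cols = -Subject,
  # ".value" turns the third capture group into separate value columns
  names_to = c("A1", "A2", ".value"),
  names_pattern = "([A-Za-z]+)([0-9]+)_(.*)"
)

long
```

Here A2 stays a character column, as in the output above; convert it with as.integer() if a numeric column is needed.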
A5C1D2H2I1M1N2O1R2T1 answered Jan 21 '26 07:01