Currently I have a file which I need to convert from wide format to long format. An example of the data is:
Subject,Cat1_Weight,Cat2_Weight,Cat3_Weight,Cat1_Sick,Cat2_Sick,Cat3_Sick
1,10,11,12,1,0,0
2,7,8,9,1,0,0
However, I need it in long format, as follows:
Subject,CatNumber,Weight,Sickness
1,1,10,1
1,2,11,0
1,3,12,0
2,1,7,1
2,2,8,0
2,3,9,0
So far I have tried to use the melt function in R:
datalong <- melt(exp2_simon_shortform, id ="Subject")
But it treats every single column name as a unique variable each with its own value. Does anybody know how I could get from wide to long as specified, making reference to the column header names?
Cheers.
EDIT: I've realised I made an error. My final output needs to be as follows. So from the Cat1_ portion, I actually need to get out "Cat" and "1"
Subject Animal CatNumber Weight Sickness
1 Cat 1 10 1
1 Cat 2 11 0
1 Cat 3 12 0
2 Cat 1 7 1
2 Cat 2 8 0
2 Cat 3 9 0
Any updated solutions much appreciated.
The "dplyr" + "tidyr" approach might be something like:
library(dplyr)
library(tidyr)
mydf %>%
  gather(var, val, -Subject) %>%
  separate(var, into = c("CatNumber", "variable")) %>%
  spread(variable, val)
# Subject CatNumber Sick Weight
# 1 1 Cat1 1 10
# 2 1 Cat2 0 11
# 3 1 Cat3 0 12
# 4 2 Cat1 1 7
# 5 2 Cat2 0 8
# 6 2 Cat3 0 9
Add a mutate() step to the pipeline, using gsub(), to remove the "Cat" part of the "CatNumber" column.
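As a rough sketch of that mutate() + gsub() step, and assuming the edited question's desired layout (a separate "Animal" column plus a numeric "CatNumber"), the pipeline might be extended as below. The data frame is rebuilt from the question's sample rows; the two regular expressions and the final select() are my own choices rather than part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
mydf <- data.frame(
  Subject = 1:2,
  Cat1_Weight = c(10, 7), Cat2_Weight = c(11, 8), Cat3_Weight = c(12, 9),
  Cat1_Sick = c(1, 1), Cat2_Sick = c(0, 0), Cat3_Sick = c(0, 0)
)

result <- mydf %>%
  gather(var, val, -Subject) %>%                        # long form: one row per cell
  separate(var, into = c("CatNumber", "variable")) %>%  # "Cat1_Weight" -> "Cat1", "Weight"
  mutate(
    Animal    = gsub("[0-9]+$", "", CatNumber),              # keep the leading text ("Cat")
    CatNumber = as.integer(gsub("^[^0-9]+", "", CatNumber))  # keep the trailing digits (1, 2, 3)
  ) %>%
  spread(variable, val) %>%                             # one column each for Sick and Weight
  select(Subject, Animal, CatNumber, Weight, Sickness = Sick)

result
```

Animal is computed before CatNumber is overwritten, so both pieces come from the original "Cat1"-style value.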
Based on the discussions in chat, your data actually look something more like:
A = c("ATCint", "Blank", "None"); B = 1:5; C = c("ResumptionTime", "ResumptionMisses")
colNames <- expand.grid(A, B, C)
colNames <- sprintf("%s%d_%s", colNames[[1]], colNames[[2]], colNames[[3]])
subject = 1:60
set.seed(1)
M <- matrix(sample(10, length(subject) * length(colNames), TRUE),
            nrow = length(subject), dimnames = list(NULL, colNames))
mydf <- data.frame(Subject = subject, M)
Thus, you will need to do a few additional steps to get the output you desire. Try:
library(dplyr)
library(tidyr)
mydf %>%
  group_by(Subject) %>%                                  ## Your ID variable
  gather(var, val, -Subject) %>%                         ## Make long data. Everything except your IDs
  separate(var, into = c("partA", "partB")) %>%          ## Split new column into two parts
  mutate(partA = gsub("^([^0-9]+)([0-9]+)$", "\\1_\\2", partA)) %>% ## Insert "_" before the trailing number (also safe for numbers above 9)
  separate(partA, into = c("A1", "A2")) %>%              ## Split this new column
  spread(partB, val)                                     ## Transform to wide form
Which yields:
Source: local data frame [900 x 5]
Subject A1 A2 ResumptionMisses ResumptionTime
(int) (chr) (chr) (int) (int)
1 1 ATCint 1 9 3
2 1 ATCint 2 4 3
3 1 ATCint 3 2 2
4 1 ATCint 4 7 4
5 1 ATCint 5 7 1
6 1 Blank 1 4 10
7 1 Blank 2 2 4
8 1 Blank 3 7 5
9 1 Blank 4 1 9
10 1 Blank 5 10 10
.. ... ... ... ... ...
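As a possible alternative, and assuming a version of tidyr that provides pivot_longer() (1.0.0 or later), the gather/separate/spread chain can probably be collapsed into one call. The sketch below uses a small made-up data frame with the same naming pattern as the columns above; the values are illustrative only, and the names_pattern regular expression is my own assumption about how the column names are built.

```r
library(tidyr)

# Toy data (made-up values) following the "<label><number>_<measure>" naming pattern
toy <- data.frame(
  Subject = 1:2,
  ATCint1_ResumptionTime   = c(3, 5),
  ATCint1_ResumptionMisses = c(9, 2),
  Blank1_ResumptionTime    = c(10, 4),
  Blank1_ResumptionMisses  = c(4, 7)
)

long <- pivot_longer(
  toy,
  cols = -Subject,
  # ".value" tells pivot_longer() that the third captured group names the output columns
  names_to = c("A1", "A2", ".value"),
  names_pattern = "^([^0-9]+)([0-9]+)_(.*)$"
)

# A2 comes back as character; convert it if a numeric column is wanted
long$A2 <- as.integer(long$A2)

long
```

Each captured group in names_pattern maps to one entry of names_to, so the label, the number, and the measure are split in a single step.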