Here is an example of what the first few rows of my imported data frame look like (the full dataset has five levels of the subject variable; the other two are Algebra II and Geometry).
SID firstName lastName subject sumScaleScore sumPerformanceLevel
604881 JIM Ro Mathematics 912 2
604881 JIM Ro ELA 964 4
594181 JERRY Chi ELA 997 1
594181 JERRY Chi Mathematics 918 3
564711 KILE Gamma ELA 933 5
564711 KILE Gamma Algebra I 1043 7
I want to restructure it from the above long format (where each person has two rows) to a wide format (where each person has one row). For example, the first row of the new data would contain:
SID firstName lastName sumScaleScore_Mathematics sumPerformanceLevel_Mathematics sumScaleScore_ELA sumPerformanceLevel_ELA
604881 JIM Ro 912 2 964 4
I've tried reshape2's melt and dcast and some other packages, along with reading some help files, but my coding just ain't cutting it. SPSS does this quite easily using "casestovars," but I'm new to R and having no luck. Any ideas?
Use melt with the first four columns as id variables, and then use dcast:
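library(reshape2)
# Melt the two score columns, keeping columns 1-4 (SID, firstName, lastName, subject) as ids
m <- melt(DF, id = 1:4)
# Cast back to wide: one row per student, one column per subject/variable pair
dcast(m, SID + firstName + lastName ~ ...)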
giving:
SID firstName lastName AlgebraI_sumScaleScore AlgebraI_sumPerformanceLevel
1 564711 KILE Gamma 1043 7
2 594181 JERRY Chi NA NA
3 604881 JIM Ro NA NA
ELA_sumScaleScore ELA_sumPerformanceLevel Mathematics_sumScaleScore
1 933 5 NA
2 997 1 918
3 964 4 912
Mathematics_sumPerformanceLevel
1 NA
2 3
3 2
Note: We used this input:
Lines <- "SID firstName lastName subject sumScaleScore sumPerformanceLevel
604881 JIM Ro Mathematics 912 2
604881 JIM Ro ELA 964 4
594181 JERRY Chi ELA 997 1
594181 JERRY Chi Mathematics 918 3
564711 KILE Gamma ELA 933 5
564711 KILE Gamma AlgebraI 1043 7"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
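The question asks for column names of the form sumScaleScore_Mathematics, whereas the output above puts the subject first. As a minimal sketch (assuming the same DF as above and that reshape2 is installed), naming the two melted columns explicitly in the formula, with variable before subject, should produce names in the requested style. The object name wide is only illustrative.

```r
library(reshape2)

# Same sample input as in the note above
Lines <- "SID firstName lastName subject sumScaleScore sumPerformanceLevel
604881 JIM Ro Mathematics 912 2
604881 JIM Ro ELA 964 4
594181 JERRY Chi ELA 997 1
594181 JERRY Chi Mathematics 918 3
564711 KILE Gamma ELA 933 5
564711 KILE Gamma AlgebraI 1043 7"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# Melt the score columns; columns 1-4 stay as id variables
m <- melt(DF, id = 1:4)

# Put `variable` before `subject` so names come out as <variable>_<subject>
wide <- dcast(m, SID + firstName + lastName ~ variable + subject)
names(wide)
```

Subjects a student did not take are filled with NA, just as in the output above.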
The dcast function has been reworked in the "data.table" package and now accepts multiple value.vars.
One big change is that you can directly cast multiple columns to a wide form without first having to melt the data, making the process far more efficient than the present reshape2 approach.
library(data.table)
dcast(as.data.table(DF), ... ~ subject, value.var = c("sumScaleScore", "sumPerformanceLevel"))
## SID firstName lastName sumScaleScore_AlgebraI sumScaleScore_ELA
## 1: 564711 KILE Gamma 1043 933
## 2: 594181 JERRY Chi NA 997
## 3: 604881 JIM Ro NA 964
## sumScaleScore_Mathematics sumPerformanceLevel_AlgebraI sumPerformanceLevel_ELA
## 1: NA 7 5
## 2: 918 NA 1
## 3: 912 NA 4
## sumPerformanceLevel_Mathematics
## 1: NA
## 2: 3
## 3: 2
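For comparison, the same restructuring can also be sketched without any packages, using base R's reshape(). This is not one of the approaches above, only an alternative under the same assumptions about the input. Here subject plays the role of the "time" variable, and sep = "_" is chosen to match the column naming asked for in the question.

```r
# Same sample input as used in the answers above
Lines <- "SID firstName lastName subject sumScaleScore sumPerformanceLevel
604881 JIM Ro Mathematics 912 2
604881 JIM Ro ELA 964 4
594181 JERRY Chi ELA 997 1
594181 JERRY Chi Mathematics 918 3
564711 KILE Gamma ELA 933 5
564711 KILE Gamma AlgebraI 1043 7"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# One row per student; every remaining column is spread across subjects
wide <- reshape(DF,
                idvar = c("SID", "firstName", "lastName"),
                timevar = "subject",
                direction = "wide",
                sep = "_")
wide
```

Because reshape() treats every column that is not an id or the time variable as a value column, both score columns are spread without being listed by name.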