
Restructure data in R: reshape, dcast, melt... nothing seems to work for this dataframe

Tags:

r

Here is an example of what the first few rows of my imported dataframe look like. (In the full dataset the subject variable has five levels in total; the other two are Algebra II and Geometry.)

SID   firstName lastName    subject       sumScaleScore sumPerformanceLevel
604881  JIM     Ro          Mathematics   912           2
604881  JIM     Ro          ELA           964           4
594181  JERRY   Chi         ELA           997           1
594181  JERRY   Chi         Mathematics   918           3
564711  KILE    Gamma       ELA           933           5
564711  KILE    Gamma       Algebra I     1043          7

I want to restructure it from the above long format (where each person has two rows) to a wide format (where each person has one row). For example, the first row of the new data would contain:

SID  firstName  lastName  sumScaleScore_Mathematics  sumPerformanceLevel_Mathematics  sumScaleScore_ELA  sumPerformanceLevel_ELA
604881 JIM      Ro        912                        2                                964                4

I've tried reshape2's melt and dcast, and some other packages, along with reading some help files, but my coding just ain't cutting it. SPSS does this quite easily using "casestovars," but I'm new to R and having no luck. Any ideas?

dca asked Dec 19 '22 21:12



2 Answers

Melt the data, keeping the first four columns as id variables, and then use dcast:

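library(reshape2)
# Melt to long form; columns 1:4 (SID, firstName, lastName, subject) stay as id variables
m <- melt(DF, id.vars = 1:4)
# "..." stands for all remaining variables (here subject and variable), which form the wide column names
dcast(m, SID + firstName + lastName ~ ...)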

giving:

     SID firstName lastName AlgebraI_sumScaleScore AlgebraI_sumPerformanceLevel
1 564711      KILE    Gamma                   1043                            7
2 594181     JERRY      Chi                     NA                           NA
3 604881       JIM       Ro                     NA                           NA
  ELA_sumScaleScore ELA_sumPerformanceLevel Mathematics_sumScaleScore
1               933                       5                        NA
2               997                       1                       918
3               964                       4                       912
  Mathematics_sumPerformanceLevel
1                              NA
2                               3
3                               2

Note: We used this input ("Algebra I" is written as "AlgebraI" so that read.table treats it as a single field):

Lines <- "SID   firstName lastName    subject       sumScaleScore sumPerformanceLevel
604881  JIM     Ro          Mathematics   912           2
604881  JIM     Ro          ELA           964           4
594181  JERRY   Chi         ELA           997           1
594181  JERRY   Chi         Mathematics   918           3
564711  KILE    Gamma       ELA           933           5
564711  KILE    Gamma       AlgebraI     1043          7"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
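
The result above names the columns subject-first (Mathematics_sumScaleScore), while the question asked for measure-first names (sumScaleScore_Mathematics). A minimal sketch of one way to get that, assuming reshape2 is installed and using the same sample input: put variable before subject on the right-hand side of the formula. The name wide is only illustrative.

```r
library(reshape2)

# Same sample input as in the note above ("AlgebraI" has no space so it parses as one field)
Lines <- "SID   firstName lastName    subject       sumScaleScore sumPerformanceLevel
604881  JIM     Ro          Mathematics   912           2
604881  JIM     Ro          ELA           964           4
594181  JERRY   Chi         ELA           997           1
594181  JERRY   Chi         Mathematics   918           3
564711  KILE    Gamma       ELA           933           5
564711  KILE    Gamma       AlgebraI     1043          7"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# Long form: one row per (student, subject, measure)
m <- melt(DF, id.vars = 1:4)

# `variable` before `subject` gives names of the form <measure>_<subject>
wide <- dcast(m, SID + firstName + lastName ~ variable + subject,
              value.var = "value")
names(wide)
```

Subjects a student did not take should still come out as NA, as in the output shown earlier.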
G. Grothendieck answered Mar 16 '23 00:03



The dcast function has been reworked in the data.table package and now accepts multiple columns in value.var.

One big change is that you can directly cast multiple columns to a wide form without first having to melt the data, making the process far more efficient than the present reshape2 approach.

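library(data.table)
# "..." on the left-hand side stands for every column not otherwise named in the call (here SID, firstName, lastName)
dcast(as.data.table(DF), ... ~ subject, value.var = c("sumScaleScore", "sumPerformanceLevel"))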
##       SID firstName lastName sumScaleScore_AlgebraI sumScaleScore_ELA
## 1: 564711      KILE    Gamma                   1043               933
## 2: 594181     JERRY      Chi                     NA               997
## 3: 604881       JIM       Ro                     NA               964
##    sumScaleScore_Mathematics sumPerformanceLevel_AlgebraI sumPerformanceLevel_ELA
## 1:                        NA                            7                       5
## 2:                       918                           NA                       1
## 3:                       912                           NA                       4
##    sumPerformanceLevel_Mathematics
## 1:                              NA
## 2:                               3
## 3:                               2
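
Since the question title also mentions reshape, here is a comparable sketch with base R's reshape(), which needs no extra packages. This is a different technique from the dcast calls above, shown only for comparison; it assumes the same sample input, and the name wide is illustrative.

```r
# Same sample input as used in the answers ("AlgebraI" written without a space)
Lines <- "SID   firstName lastName    subject       sumScaleScore sumPerformanceLevel
604881  JIM     Ro          Mathematics   912           2
604881  JIM     Ro          ELA           964           4
594181  JERRY   Chi         ELA           997           1
594181  JERRY   Chi         Mathematics   918           3
564711  KILE    Gamma       ELA           933           5
564711  KILE    Gamma       AlgebraI     1043          7"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# idvar identifies a student; timevar holds the level that is spread into columns.
# All remaining columns are widened, with names built as <measure><sep><subject>.
wide <- reshape(DF,
                idvar = c("SID", "firstName", "lastName"),
                timevar = "subject",
                direction = "wide",
                sep = "_")
rownames(wide) <- NULL  # drop the leftover row names from the long data
wide
```

With sep = "_" the column names follow the sumScaleScore_Mathematics pattern requested in the question; subjects a student did not take are filled with NA.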
A5C1D2H2I1M1N2O1R2T1 answered Mar 16 '23 01:03
