Reshaping data in R. Is it possible to have two "value variables"? [duplicate]

Tags:

r

reshape

plyr

I am struggling with the reshape2 package, looking for a way to “cast” a data frame with two (or more) variables in “value.var”.

Here is an example of what I want to achieve.

df <- data.frame( StudentID = c("x1", "x10", "x2", 
                            "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
              StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
              ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
              Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
              participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),  
              passed      = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
              stringsAsFactors = FALSE)

From df I can create the following data frame:

library(plyr)      # ddply(), summarize()
library(reshape2)  # melt(), dcast()
tx <- ddply(df, c('ExamenYear','StudentGender'), summarize,
        participated = sum(participated      == "yes"),
        passed   = sum(passed      == "yes"))

In the reshape logic, I have two “value variables”: participated and passed.

I am looking for a way to combine, in one data frame, the following information:

 dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
 dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')

The end table I am trying to create would look like this:

tempTab1 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
tempTab2 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')

# data.frame() keeps the counts numeric; cbind() would coerce every column to character
data.frame(ExamenYear          = tempTab1[,1],
           Female_Participated = tempTab1[,2],
           Female_Passed       = tempTab2[,2],
           Male_Participated   = tempTab1[,3],
           Male_Passed         = tempTab2[,3])

Is it possible to have two "value variables" in a cast function?

Asked by user1043144 on Sep 15 '12 at 11:09


1 Answer

Since you've gotten this far, why not melt your tx object and use dcast as follows:

dcast(melt(tx, id.vars=c(1, 2)), ExamenYear ~ StudentGender + variable)
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 3       2009             NA       NA              3        2
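To see why melting first lets a single cast handle both counts, here is a rough base R sketch of the same idea. It is only an illustration, not what reshape2 does internally. tx is typed in by hand from the ddply() result in the question, and value_cols, tx_long, key and wide are names made up for this sketch.

```r
# tx as produced by the ddply() call in the question, entered by hand
tx <- data.frame(
  ExamenYear    = c("2007", "2007", "2008", "2008", "2009"),
  StudentGender = c("F", "M", "F", "M", "M"),
  participated  = c(1L, 1L, 1L, 2L, 3L),
  passed        = c(1L, 1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# "Melt" by hand: stack the two value columns into variable/value pairs
value_cols <- c("participated", "passed")
tx_long <- data.frame(
  ExamenYear    = rep(tx$ExamenYear, times = length(value_cols)),
  StudentGender = rep(tx$StudentGender, times = length(value_cols)),
  variable      = rep(value_cols, each = nrow(tx)),
  value         = unlist(tx[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# "Cast" by hand: one row per year, one column per gender/variable combination
key  <- paste(tx_long$StudentGender, tx_long$variable, sep = "_")
wide <- tapply(tx_long$value, list(tx_long$ExamenYear, key), sum)
print(wide)
```

The combined key (StudentGender plus variable) is what turns two value columns into four wide columns. The F cells for 2009 stay NA because tx has no row for that combination.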

A more direct approach, however, would probably be to melt your data from the start:

df.m <- melt(df, id.vars=c(1:4))
dcast(df.m, ExamenYear ~ StudentGender + variable, 
      function(x) sum(x == "yes"))
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 3       2009              0        0              3        2
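Note the 0 (rather than NA) for F in 2009 in this version. As far as I understand reshape2, when an aggregation function is supplied and fill is left unset, empty cells are filled with whatever that function returns for a zero-length vector. A quick base R check of what that is here (empty_group, count_yes and empty_count are names invented for this check):

```r
# An empty group: no students at all for this year/gender/variable combination
empty_group <- character(0)

# The same aggregation function that was passed to dcast() above
count_yes <- function(x) sum(x == "yes")

# Summing an empty logical vector gives zero, not NA
empty_count <- count_yes(empty_group)
print(empty_count)  # 0
```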

Update: The Base R Approach

While the required code isn't as "pretty", it is also not too difficult to do this in base R. Here's one approach:

  1. Use aggregate() to get tx from your example.

    dfa <- aggregate(cbind(participated, passed) ~ 
      ExamenYear + StudentGender, df, function(x) sum(x == "yes"))
    dfa
    #   ExamenYear StudentGender participated passed
    # 1       2007             F            1      1
    # 2       2008             F            1      1
    # 3       2007             M            1      1
    # 4       2008             M            2      2
    # 5       2009             M            3      2
    
  2. Use reshape to transform dfa from "long" to "wide".

    reshape(dfa, direction = "wide", 
            idvar="ExamenYear", timevar="StudentGender")
    #   ExamenYear participated.F passed.F participated.M passed.M
    # 1       2007              1        1              1        1
    # 2       2008              1        1              2        2
    # 5       2009             NA       NA              3        2
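If you want the base R result to look like the table in the question, the two steps can be followed by a little cleanup. This is only a sketch under a few assumptions: the "yes"/"no" columns are converted to logicals first (via a helper data frame I call flags) so that FUN can simply be sum, missing groups are treated as zero counts, and the Female_/Male_ column names are copied from the question. Only the columns needed for the counts are retyped here.

```r
# The relevant columns of df from the question
df <- data.frame(
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008",
                    "2008", "2008", "2009", "2009", "2009"),
  participated  = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Turn "yes"/"no" into TRUE/FALSE so the counts are plain sums
flags <- transform(df,
                   participated = participated == "yes",
                   passed       = passed == "yes")

# Step 1: counts per year and gender (long format)
dfa <- aggregate(cbind(participated, passed) ~ ExamenYear + StudentGender,
                 data = flags, FUN = sum)

# Step 2: long to wide, one block of columns per gender
wide <- reshape(dfa, direction = "wide",
                idvar = "ExamenYear", timevar = "StudentGender")

# Cleanup: groups absent from dfa (F in 2009) become 0, then rename as in the question
wide[is.na(wide)] <- 0
rownames(wide) <- NULL
names(wide) <- c("ExamenYear", "Female_Participated", "Female_Passed",
                 "Male_Participated", "Male_Passed")
print(wide)
```

Replacing NA with 0 is a choice, not a requirement. Keep the NA if you need to tell "no students in this group" apart from "students, but none participated".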
    
Answered by A5C1D2H2I1M1N2O1R2T1 on Dec 17 '22 at 14:12