Reshaping data in R. Is it possible to have two "value variables"? [duplicate]

Tags:

r

reshape

plyr

I am struggling with the reshape2 package, looking for a way to “cast” a data frame with two (or more) variables in “value.var”.

Here is an example of what I want to achieve.

df <- data.frame(
  StudentID     = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
  ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
  Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
  participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
  passed        = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
  stringsAsFactors = FALSE)

From df I can create the following data frame:

library(plyr)  # provides ddply() and summarize()

tx <- ddply(df, c('ExamenYear','StudentGender'), summarize,
            participated = sum(participated == "yes"),
            passed       = sum(passed == "yes"))

In the reshape logic, I have two “value variables”: participated and passed.

I am looking for a way to combine, in one data frame, the following information:

library(reshape2)  # provides melt() and dcast()
dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')

The end table I am trying to create would look like this:

tempTab1 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
tempTab2 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')

as.data.frame(cbind(ExamenYear = tempTab1[,1],
                Female_Participated = tempTab1[,2],
                Female_Passed       = tempTab2[,2],
                Male_Participated    = tempTab1[,3],
                Male_Passed          = tempTab2[,3]
                ))

Is it possible to have two "value variables" in a cast function?


asked Sep 15 '12 11:09

user1043144

1 Answer

Since you've gotten this far, why not melt your tx object and use dcast as follows:

dcast(melt(tx, id.vars=c(1, 2)), ExamenYear ~ StudentGender + variable)
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 3       2009             NA       NA              3        2
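To see why this works, it can help to look at the long table that melt hands to dcast. The following is only a base R imitation of that intermediate step, not how reshape2 is actually implemented. Here tx is typed in by hand from the counts in the question, and the names long and key are made up for the illustration.

```r
# tx as ddply() would produce it, typed in by hand for this sketch
tx <- data.frame(
  ExamenYear    = c("2007", "2007", "2008", "2008", "2009"),
  StudentGender = c("F", "M", "F", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Stack the two value columns on top of each other, labelling each row
# with the column it came from (roughly what melt() returns)
long <- rbind(
  data.frame(tx[1:2], variable = "participated", value = tx$participated,
             stringsAsFactors = FALSE),
  data.frame(tx[1:2], variable = "passed", value = tx$passed,
             stringsAsFactors = FALSE)
)

# "StudentGender + variable" in the dcast formula becomes one output column
# per combination of the two, e.g. "F_participated"
key <- paste(long$StudentGender, long$variable, sep = "_")
unique(key)
```

There are four distinct keys, and no row at all for F in 2009, which is where the NA cells in the output above come from.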

A more direct approach, however, would probably be to melt your data from the start:

df.m <- melt(df, id.vars=c(1:4))
dcast(df.m, ExamenYear ~ StudentGender + variable, 
      function(x) sum(x == "yes"))
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 3       2009              0        0              3        2
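Note the 2009 cells for F: NA in the first output, 0 here. As far as I know, when an aggregation function is supplied, dcast fills empty cells with the result of calling that function on a zero-length vector. The arithmetic itself is plain base R and can be checked without reshape2. The name count_yes is just chosen for this sketch.

```r
# The same aggregation function that is passed to dcast() above
count_yes <- function(x) sum(x == "yes")

# No female student appears in 2009, so that cell receives an empty vector
empty_cell <- character(0)
n_empty <- count_yes(empty_cell)                    # 0

# A non-empty cell for comparison: "passed" for the three male students of 2009
n_m2009_passed <- count_yes(c("yes", "no", "yes"))  # 2
```

So the 0 means "no students in this cell", which may or may not be what you want to report. The first approach keeps that case visible as NA.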

Update: The Base R Approach

While the required code isn't as "pretty", it is also not too difficult to do this in base R. Here's one approach:

1. Use aggregate() to get tx from your example.

dfa <- aggregate(cbind(participated, passed) ~ 
  ExamenYear + StudentGender, df, function(x) sum(x == "yes"))
dfa
#   ExamenYear StudentGender participated passed
# 1       2007             F            1      1
# 2       2008             F            1      1
# 3       2007             M            1      1
# 4       2008             M            2      2
# 5       2009             M            3      2

2. Use reshape to transform dfa from "long" to "wide".

reshape(dfa, direction = "wide", 
        idvar="ExamenYear", timevar="StudentGender")
#   ExamenYear participated.F passed.F participated.M passed.M
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 5       2009             NA       NA              3        2
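If you want the base R result to look like the table in the question (zeros instead of NA, and the Female_/Male_ column names), a little post-processing is enough. This is a minimal sketch. It assumes dfa has exactly the rows printed above (typed in by hand here so the snippet runs on its own) and that reshape returns the columns in the order shown. The name wide is only illustrative.

```r
# dfa as printed above, typed in by hand for this sketch
dfa <- data.frame(
  ExamenYear    = c("2007", "2008", "2007", "2008", "2009"),
  StudentGender = c("F", "F", "M", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

wide <- reshape(dfa, direction = "wide",
                idvar = "ExamenYear", timevar = "StudentGender")

# Treat "no students in this cell" as a count of zero
wide[is.na(wide)] <- 0

# Rename to the labels used in the question; this relies on the column
# order participated.F, passed.F, participated.M, passed.M
names(wide) <- c("ExamenYear",
                 "Female_Participated", "Female_Passed",
                 "Male_Participated", "Male_Passed")

# Drop the leftover row names (1, 2, 5) from the long table
rownames(wide) <- NULL
wide
```

Whether replacing NA with 0 is appropriate depends on your data: here an empty cell really does mean that nobody participated.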


answered Dec 17 '22 14:12

A5C1D2H2I1M1N2O1R2T1
