I am struggling with the reshape2 package: I am looking for a way to "cast" a data frame with two (or more) variables in value.var.
Here is an example of what I want to achieve.
df <- data.frame(
  StudentID = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
  ExamenYear = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
  Exam = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
  participated = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
  passed = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
  stringsAsFactors = FALSE)
From df I can create the following data frame:
library(plyr)      # provides ddply()
library(reshape2)  # provides melt() and dcast()
tx <- ddply(df, c('ExamenYear','StudentGender'), summarize,
            participated = sum(participated == "yes"),
            passed = sum(passed == "yes"))
In reshape terms, I have two "value variables": participated and passed.
I am looking for a way to combine, in one data frame, the information from these two calls:
dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')
The final table I am trying to create would look like this:
tempTab1 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
tempTab2 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')
# data.frame() keeps the counts numeric; cbind() would coerce every column to character
data.frame(ExamenYear = tempTab1[,1],
           Female_Participated = tempTab1[,2],
           Female_Passed = tempTab2[,2],
           Male_Participated = tempTab1[,3],
           Male_Passed = tempTab2[,3])
Is it possible to have two "value variables" in a cast function?
Reshaping data in R means rearranging its rows and columns to suit a particular analysis. The data is usually held in a data frame and manipulated with functions such as rbind() and cbind().
Base R's reshape() function converts a data frame between "wide" format, with repeated measurements in separate columns of the same record, and "long" format, with the repeated measurements in separate records.
Since you've gotten this far, why not melt your tx object and use dcast as follows:
dcast(melt(tx, id.vars=c(1, 2)), ExamenYear ~ StudentGender + variable)
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 3       2009             NA       NA              3        2
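The NA values in the 2009 row arise because tx has no row for female students in that year. As a minimal sketch, assuming the reshape2 package is installed, passing fill = 0 to dcast() replaces those missing combinations with zeros. Here tx is rebuilt by hand from the counts shown above so that the snippet runs on its own.

```r
library(reshape2)  # provides melt() and dcast()

# tx rebuilt by hand from the counts shown above (illustration only)
tx <- data.frame(
  ExamenYear    = c("2007", "2007", "2008", "2008", "2009"),
  StudentGender = c("F", "M", "F", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Long format: one row per (year, gender, measure)
tx.m <- melt(tx, id.vars = c("ExamenYear", "StudentGender"))

# fill = 0 supplies a value for combinations absent from the data
res <- dcast(tx.m, ExamenYear ~ StudentGender + variable,
             value.var = "value", fill = 0)
res
```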
A more direct approach, however, would probably be to melt your data from the start:
df.m <- melt(df, id.vars = 1:4)
dcast(df.m, ExamenYear ~ StudentGender + variable,
      fun.aggregate = function(x) sum(x == "yes"))
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 3       2009              0        0              3        2
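For the literal question (two value variables in a single cast), the dcast() method in the data.table package accepts a character vector for value.var. This is a different package from reshape2, so treat the following as a sketch that assumes data.table version 1.9.6 or later is installed. As an illustration, tx is rebuilt by hand from the summarized counts.

```r
library(data.table)  # assumption: data.table >= 1.9.6 is installed

# tx rebuilt by hand as a data.table (illustration only)
tx <- data.table(
  ExamenYear    = c("2007", "2007", "2008", "2008", "2009"),
  StudentGender = c("F", "M", "F", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2)
)

# value.var is a character vector here, so both measures are cast in one call
wide <- dcast(tx, ExamenYear ~ StudentGender,
              value.var = c("participated", "passed"))
wide
```

The result has columns named participated_F, participated_M, passed_F and passed_M. The 2009 cells for F are NA because that combination does not occur in tx. If reshape2 is loaded in the same session, calling data.table::dcast() explicitly avoids any ambiguity about which dcast() is used.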
While the required code isn't as "pretty", it is also not too difficult to do this in base R. Here's one approach:
Use aggregate() to get tx from your example.
dfa <- aggregate(cbind(participated, passed) ~
                   ExamenYear + StudentGender, df, function(x) sum(x == "yes"))
dfa
#   ExamenYear StudentGender participated passed
# 1       2007             F            1      1
# 2       2008             F            1      1
# 3       2007             M            1      1
# 4       2008             M            2      2
# 5       2009             M            3      2
Use reshape() to transform dfa from "long" to "wide".
reshape(dfa, direction = "wide",
        idvar = "ExamenYear", timevar = "StudentGender")
#   ExamenYear participated.F passed.F participated.M passed.M
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 5       2009             NA       NA              3        2
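To get from this output to the column names asked for in the question, the reshaped result can be tidied in base R. The sketch below rebuilds dfa by hand from the printed output so that it runs on its own. The renaming is positional and therefore assumes the column order shown above.

```r
# dfa rebuilt by hand from the aggregate() output shown above (illustration only)
dfa <- data.frame(
  ExamenYear    = c("2007", "2008", "2007", "2008", "2009"),
  StudentGender = c("F", "F", "M", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

wide <- reshape(dfa, direction = "wide",
                idvar = "ExamenYear", timevar = "StudentGender")

# Year/gender combinations missing from dfa come back as NA; count them as 0
wide[is.na(wide)] <- 0

# Positional rename: relies on the order participated.F, passed.F, participated.M, passed.M
names(wide) <- c("ExamenYear", "Female_Participated", "Female_Passed",
                 "Male_Participated", "Male_Passed")
rownames(wide) <- NULL  # drop the row labels carried over from dfa
wide
```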