I am currently using cast
on a melted table to calculate the total of each value at the combination of ID variables ID1 (row names) and ID2 (column headers), along with grand totals for each row using margins="grand_col"
.
c <- cast(m, ID1 ~ ID2, sum, margins="grand_col")
ID1 ID2a ID2b ID2c ID2d ID2e (all)
1 ID1a 6459695 885473 648019 453613 1777308 10224108
2 ID1b 7263529 1411355 587785 612730 2458672 12334071
3 ID1c 7740364 1253524 682977 886897 3559283 14123045
So far, so R-like.
Then I divide each cell by its row total to get a percentage of the total.
c[,2:6]<-c[,2:6] / c[,7]
This looks kludgy. Is there something I should be doing in cast
or maybe in plyr
to handle the percent of margin calculation in the first command?
Thanks, Matt
To calculate percent, we need to divide the counts by the count sums for each sample, and then multiply by 100. This can also be done using the function decostand from the vegan package with method = "total" .
To do this, divide the frequency by the total number of results and multiply by 100. In this case, the frequency of the first row is 1 and the total number of results is 10. The percentage would then be 10.0.
Assuming your source table looks something like this:
dfm <- structure(list(ID1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("ID1a", "ID1b", "ID1c"
), class = "factor"), ID2 = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("ID2a",
"ID2b", "ID2c", "ID2d", "ID2e"), class = "factor"), value = c(6459695L,
7263529L, 7740364L, 885473L, 1411355L, 1253524L, 648019L, 587785L,
682977L, 453613L, 612730L, 886897L, 1777308L, 2458672L, 3559283L
)), .Names = c("ID1", "ID2", "value"), row.names = c(NA,
-15L), class = "data.frame")
> head(dfm)
ID1 ID2 value
1 ID1a ID2a 6459695
2 ID1b ID2a 7263529
3 ID1c ID2a 7740364
4 ID1a ID2b 885473
5 ID1b ID2b 1411355
6 ID1c ID2b 1253524
Using ddply
first to calculate the percentages, and cast
to present the data in the required format
library(reshape)
library(plyr)
df1 <- ddply(dfm, .(ID1), summarise, ID2 = ID2, pct = value / sum(value))
dfc <- cast(df1, ID1 ~ ID2)
dfc
ID1 ID2a ID2b ID2c ID2d ID2e
1 ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
2 ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
3 ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195
Compared to your example, this is missing the row totals, these need to be added separately.
Not sure though, whether this solution is more elegant than the one you currently have.
Here is a one-liner using tapply
and prop.table
. It does not rely on any auxilliary packages:
prop.table(tapply(dfm$value, dfm[1:2], sum), 1)
giving:
ID2
ID1 ID2a ID2b ID2c ID2d ID2e
ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195
or this which is even shorter:
prop.table( xtabs(value ~., dfm), 1 )
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With