Calculating percent of row total with plyr

Q: How do I find the percentage of a total in R?

To calculate percent, divide the counts by the count sum for each sample, then multiply by 100. The division step can also be done with the function decostand from the vegan package with method = "total", which returns proportions that you then multiply by 100.
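As a minimal base-R sketch of that calculation (the matrix `counts` and its values are made up for illustration, with samples assumed to be in rows), each row is divided by its row sum and multiplied by 100:

```r
# Hypothetical counts: rows are samples, columns are categories
counts <- matrix(c(2, 3, 5,
                   1, 1, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("a", "b", "c")))

# A matrix divided by a vector of length nrow is recycled down the
# columns, so every cell is divided by the total of its own row
pct <- counts / rowSums(counts) * 100

pct
rowSums(pct)  # every row should add up to 100
```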

Q: How do you find the percentage of grouped data?

To do this, divide the frequency by the total number of results and multiply by 100. For example, if the frequency of the first row is 1 and the total number of results is 10, the percentage is 10.0.
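The same arithmetic as a short sketch. The vector `freq` and its group names are hypothetical, chosen only so that the first frequency is 1 and the total is 10:

```r
# Hypothetical frequencies for four groups; they sum to 10
freq <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 4)

# Frequency divided by the total number of results, times 100
pct <- freq / sum(freq) * 100

pct
```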

Tags: r

I am currently using cast on a melted table to calculate the total of each value at the combination of ID variables ID1 (row names) and ID2 (column headers), along with grand totals for each row using margins="grand_col".

c <- cast(m, ID1 ~ ID2, sum, margins="grand_col")

  ID1      ID2a  ID2b     ID2c     ID2d   ID2e    (all)
1  ID1a  6459695  885473  648019  453613 1777308 10224108
2  ID1b  7263529 1411355  587785  612730 2458672 12334071
3  ID1c  7740364 1253524  682977  886897 3559283 14123045

So far, so R-like.

Then I divide each cell by its row total to get a percentage of the total.

c[,2:6]<-c[,2:6] / c[,7]

This looks kludgy. Is there something I should be doing in cast or maybe in plyr to handle the percent of margin calculation in the first command?

Thanks, Matt


asked Nov 23 '09 19:11

MW Frost

2 Answers

Assuming your source table looks something like this:

dfm <- structure(list(ID1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("ID1a", "ID1b", "ID1c"
), class = "factor"), ID2 = structure(c(1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("ID2a", 
"ID2b", "ID2c", "ID2d", "ID2e"), class = "factor"), value = c(6459695L, 
7263529L, 7740364L, 885473L, 1411355L, 1253524L, 648019L, 587785L, 
682977L, 453613L, 612730L, 886897L, 1777308L, 2458672L, 3559283L
)), .Names = c("ID1", "ID2", "value"), row.names = c(NA, 
-15L), class = "data.frame")

> head(dfm)
   ID1  ID2   value
1 ID1a ID2a 6459695
2 ID1b ID2a 7263529
3 ID1c ID2a 7740364
4 ID1a ID2b  885473
5 ID1b ID2b 1411355
6 ID1c ID2b 1253524

Use ddply first to calculate each value's share of its ID1 total, then cast to present the data in the required format:

library(reshape)
library(plyr)

df1 <- ddply(dfm, .(ID1), summarise, ID2 = ID2, pct = value / sum(value))
dfc <- cast(df1, ID1 ~ ID2)

dfc
   ID1      ID2a       ID2b       ID2c       ID2d      ID2e
1 ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
2 ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
3 ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195

Compared to your example, this is missing the row totals; these need to be added separately.

I am not sure, though, whether this solution is more elegant than the one you currently have.
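One possible way to add the row totals back, sketched in base R rather than with cast (the names `sums`, `props` and `res` are only illustrative, and the data frame is rebuilt here from the values shown above so the snippet runs on its own):

```r
# Rebuild the example data in long form (values copied from the table above)
dfm <- data.frame(
  ID1 = rep(c("ID1a", "ID1b", "ID1c"), times = 5),
  ID2 = rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3),
  value = c(6459695, 7263529, 7740364, 885473, 1411355, 1253524,
            648019, 587785, 682977, 453613, 612730, 886897,
            1777308, 2458672, 3559283)
)

# Sum of value for every ID1 x ID2 combination, as a matrix
sums <- tapply(dfm$value, dfm[c("ID1", "ID2")], sum)

# Share of each cell within its ID1 row
props <- prop.table(sums, 1)

# Append the raw row totals as an extra column
res <- cbind(props, total = rowSums(sums))
res
```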


answered Oct 25 '22 02:10

learnr

Here is a one-liner using tapply and prop.table. It does not rely on any auxiliary packages:

prop.table(tapply(dfm$value, dfm[1:2], sum), 1)

giving:

      ID2
ID1         ID2a       ID2b       ID2c       ID2d      ID2e
  ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
  ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
  ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195

or this, which is even shorter:

prop.table( xtabs(value ~., dfm), 1 )
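Both calls return proportions between 0 and 1. If percentages are wanted, a small follow-up sketch is to scale by 100 and round. The one-decimal rounding is just an assumption about the desired display, and the data frame is rebuilt from the values above so the snippet is self-contained:

```r
# Example data in long form (values copied from the question's table)
dfm <- data.frame(
  ID1 = rep(c("ID1a", "ID1b", "ID1c"), times = 5),
  ID2 = rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3),
  value = c(6459695, 7263529, 7740364, 885473, 1411355, 1253524,
            648019, 587785, 682977, 453613, 612730, 886897,
            1777308, 2458672, 3559283)
)

# Row-wise proportions, converted to percent and rounded to one decimal
pct <- round(100 * prop.table(xtabs(value ~ ID1 + ID2, dfm), 1), 1)
pct
```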

answered Oct 25 '22 02:10

G. Grothendieck
