Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to format a table with counts, percents and marginal totals

Tags:

r

I often need to create a formatted table with counts, percents and marginal totals. For example I might have data with three classes and two categories. I want to make a table with a row for each class, containing within-class counts and percents for each category and the total count for the category. Finally, a total row at the bottom totals categories, shows categories as percent of total, and an overall total.

The code is always ugly :-) and I would love to find a better way. Here is a very simple example; often it is much more complex than this.

=====  ===  =======  ===  ======  =====
Class  Yes  Yes pct   No  No pct  Total
=====  ===  =======  ===  ======  =====
one     35      65%   19     35%     54
two     21      70%    9     30%     30
three    9      56%    7     44%     16
Total   65      65%   35     35%    100
=====  ===  =======  ===  ======  =====

I know about addmargins (doesn't work on data.frame), prop.table (gives a separate table of proportions), descr::CrossTable (puts values in cells, not spread across the row). Any suggestions for how to clean this up are welcome.

Here is the code to create the above table:

library(formattable) # For nice percents
library(tidyverse)

# Make up some data. Three classes with two categories within each class
# Order of cls is important so it is a factor
d = tibble(cls=sample(c('one', 'two', 'three'), 100, 
                      replace=TRUE, prob=c(0.5, 0.3, 0.2)),
           conf=sample(c('yes', 'no'), 100, 
                       replace=TRUE, prob=c(0.6, 0.4))) %>% 
  mutate(cls = factor(cls, levels=c('one', 'two', 'three', 'Total')))

# Tabulate by cls and conf
d2 = d %>% group_by(cls, conf) %>% 
  summarise(n=n()) %>% # Total per cls x conf
  spread(conf, n)  # Spread to one row per cls

# Add a total row. Do this before calculating percents so we don't total
# the within-row percents. This is really ugly
d2 = d2 %>% bind_rows(as_data_frame(t(c(cls=NA, colSums(d2[,-1])))))
d2$cls[nrow(d2)] = 'Total'

d2 = d2 %>% mutate(total=no+yes, # Make percents and row totals
         no_pct=percent(no/total, 0),
         yes_pct=percent(yes/total, 0)) %>% 
  select(Class=cls, Yes=yes, `Yes pct`=yes_pct, # Reorder and rename columns
         No=no, `No pct`=no_pct, Total=total)

formattable(d2) # Yay! Nice table.
knitr::kable(d2, format='rst') # For pasting above
like image 953
Kent Johnson Avatar asked Jan 25 '17 22:01

Kent Johnson


1 Answers

The code below is still somewhat involved (perhaps it can be simplified further), but it seems more intuitive to me and takes advantage of tidyverse functionality. I've included comments to explain what the code is doing at each stage.

# Tabulate by cls and conf
d2 = d %>% group_by(cls, conf) %>%             
  tally %>% 
  # Add a row with column totals (group only by conf, instead of by cls and conf)
  bind_rows(d %>% group_by(conf) %>%           
      tally %>%
      mutate(cls="Total")) %>%
  # Add percent column by taking advantage of long format and pre-existing grouping
  mutate(pct = round(n/sum(n)*100)) %>%
  # Now spread to wide format       
  gather(key, value, n, pct, -cls, -conf) %>%  
  unite(conf_key, conf, key) %>%
  spread(conf_key, value) %>%
  # Add percent symbols
  mutate_at(vars(matches("pct")), funs(paste0(.,"%"))) %>%
  # Get cls values in the right order and add row totals
  ungroup %>%
  mutate(cls = factor(cls, levels=c("one","two","three","Total")),
         Total = no_n + yes_n) %>%
  arrange(cls) %>%
  select(Class=cls, Yes=yes_n, `Yes pct`=yes_pct, No=no_n, `No pct`=no_pct, Total)
   Class   Yes `Yes pct`    No `No pct` Total
1    one    32       73%    12      27%    44
2    two    21       52%    19      48%    40
3  three    10       62%     6      38%    16
4  Total    63       63%    37      37%   100
like image 116
eipi10 Avatar answered Nov 14 '22 21:11

eipi10