I often need to create a formatted table with counts, percents and marginal totals. For example I might have data with three classes and two categories. I want to make a table with a row for each class, containing within-class counts and percents for each category and the total count for the category. Finally, a total row at the bottom totals categories, shows categories as percent of total, and an overall total.
The code is always ugly :-) and I would love to find a better way. Here is a very simple example; often it is much more complex than this.
===== === ======= === ====== =====
Class Yes Yes pct No No pct Total
===== === ======= === ====== =====
one 35 65% 19 35% 54
two 21 70% 9 30% 30
three 9 56% 7 44% 16
Total 65 65% 35 35% 100
===== === ======= === ====== =====
I know about addmargins
(doesn't work on data.frame), prop.table
(gives a separate table of proportions), descr::CrossTable
(puts values in cells, not spread across the row). Any suggestions for how to clean this up are welcome.
Here is the code to create the above table:
library(formattable) # For nice percents
library(tidyverse)
# Make up some data. Three classes with two categories within each class
# Order of cls is important so it is a factor
d = tibble(cls=sample(c('one', 'two', 'three'), 100,
replace=TRUE, prob=c(0.5, 0.3, 0.2)),
conf=sample(c('yes', 'no'), 100,
replace=TRUE, prob=c(0.6, 0.4))) %>%
mutate(cls = factor(cls, levels=c('one', 'two', 'three', 'Total')))
# Tabulate by cls and conf
d2 = d %>% group_by(cls, conf) %>%
summarise(n=n()) %>% # Total per cls x conf
spread(conf, n) # Spread to one row per cls
# Add a total row. Do this before calculating percents so we don't total
# the within-row percents. This is really ugly
d2 = d2 %>% bind_rows(as_data_frame(t(c(cls=NA, colSums(d2[,-1])))))
d2$cls[nrow(d2)] = 'Total'
d2 = d2 %>% mutate(total=no+yes, # Make percents and row totals
no_pct=percent(no/total, 0),
yes_pct=percent(yes/total, 0)) %>%
select(Class=cls, Yes=yes, `Yes pct`=yes_pct, # Reorder and rename columns
No=no, `No pct`=no_pct, Total=total)
formattable(d2) # Yay! Nice table.
knitr::kable(d2, format='rst') # For pasting above
The code below is still somewhat involved (perhaps it can be simplified further), but it seems more intuitive to me and takes advantage of tidyverse
functionality. I've included comments to explain what the code is doing at each stage.
# Tabulate by cls and conf
d2 = d %>% group_by(cls, conf) %>%
tally %>%
# Add a row with column totals (group only by conf, instead of by cls and conf)
bind_rows(d %>% group_by(conf) %>%
tally %>%
mutate(cls="Total")) %>%
# Add percent column by taking advantage of long format and pre-existing grouping
mutate(pct = round(n/sum(n)*100)) %>%
# Now spread to wide format
gather(key, value, n, pct, -cls, -conf) %>%
unite(conf_key, conf, key) %>%
spread(conf_key, value) %>%
# Add percent symbols
mutate_at(vars(matches("pct")), funs(paste0(.,"%"))) %>%
# Get cls values in the right order and add row totals
ungroup %>%
mutate(cls = factor(cls, levels=c("one","two","three","Total")),
Total = no_n + yes_n) %>%
arrange(cls) %>%
select(Class=cls, Yes=yes_n, `Yes pct`=yes_pct, No=no_n, `No pct`=no_pct, Total)
Class Yes `Yes pct` No `No pct` Total 1 one 32 73% 12 27% 44 2 two 21 52% 19 48% 40 3 three 10 62% 6 38% 16 4 Total 63 63% 37 37% 100
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With