Using dplyr
I'm generating a simple summary table for two categories:
# Data
data("mtcars")
# Libs
require(dplyr)
require(tidyr)  # spread() comes from tidyr, not dplyr
# Summary
mt_sum <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  spread(key = am, value = n)
Which produces the desired results:
Source: local data frame [3 x 3]

   gear     0     1
  (dbl) (int) (int)
1     3    15    NA
2     4     4     8
3     5    NA     5
To this table I would like to add a set of columns that hold row percentages rather than the counts currently shown.
I would like my table to look like this:
  gear  0  1 0per 1per
1    3 15 NA 100%
2    4  4  8  33%  67%
3    5 NA  5      100%
I tried to achieve this by adding one more step to the code:
mt_sum <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  spread(key = am, value = n) %>%
  mutate_each(funs(./rowSums(.)))
but it returns the following error:
Error: 'x' must be an array of at least two dimensions
Hence my question: how can I add extra columns with row percentage values in dplyr?
Ideally, NAs would show up as blanks in the percentage columns. I am aware of CrossTable in gmodels, but I would like to stay in dplyr, as I want to keep as many transformations as possible in one place.

I think this is what you need:
# Data
data("mtcars")
# Libs
require(dplyr)
require(tidyr)
require(scales)  # for percent()
# Summary
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  spread(key = am, value = n) %>%
  # you need rowwise because this is a row-wise operation
  rowwise() %>%
  # I find do() to be the best function for ad-hoc things that
  # have no specific dplyr function;
  # here it calculates the numeric percentages
  do(data.frame(.,
                per0 = .$`0` / sum(.$`0`, .$`1`, na.rm = TRUE),
                per1 = .$`1` / sum(.$`0`, .$`1`, na.rm = TRUE))) %>%
  # mutate converts NAs to blanks and numbers to percentages
  mutate(per0 = ifelse(is.na(per0), '', percent(per0)),
         per1 = ifelse(is.na(per1), '', percent(per1)))
Output:
Source: local data frame [3 x 5]
Groups: <by row>

   gear    X0    X1  per0  per1
  (dbl) (int) (int) (chr) (chr)
1     3    15    NA  100%
2     4     4     8 33.3% 66.7%
3     5    NA     5        100%
Here is a way to do it with reshaping:
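library(dplyr)
library(tidyr)
mtcars %>%
  count(gear, am) %>%
  # group explicitly so that sum(n) is the total per gear;
  # newer dplyr versions return an ungrouped result from count()
  group_by(gear) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup() %>%
  gather(variable, value, n, percent) %>%
  unite("new_variable", am, variable) %>%
  spread(new_variable, value)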
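
For completeness, the attempt in the question probably fails because mutate_each() hands the function one column at a time, so rowSums() receives a plain vector instead of something two-dimensional. A minimal sketch that stays in one pipeline is below. It assumes the count columns are named `0` and `1` as in the spread() output, and fmt_pct is a hypothetical helper introduced only for this illustration (base sprintf() is used so that scales is not required). It is one possible approach, not the method of either answer above.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: blank for NA, otherwise a rounded percentage string
fmt_pct <- function(x) ifelse(is.na(x), "", sprintf("%.0f%%", 100 * x))

mt_sum <- mtcars %>%
  count(am, gear) %>%
  ungroup() %>%                      # make sure no grouping is carried into spread()
  spread(key = am, value = n) %>%
  mutate(
    # bind the two count columns into a matrix so rowSums() gets two dimensions
    total  = rowSums(cbind(`0`, `1`), na.rm = TRUE),
    `0per` = fmt_pct(`0` / total),
    `1per` = fmt_pct(`1` / total)
  ) %>%
  select(-total)

mt_sum
```

The percentages are rounded to whole numbers here to match the table in the question; change the sprintf() format (for example "%.1f%%") if one decimal place is preferred.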