I would like to have a tidyverse solution for the following problem. In my dataset, I have data on various factor levels. I would like to create a new factor level "Total" that is the sum of all values Y at existing factor levels of X. This can be done, for example, with:
mutate(Data, X = fct_collapse(X, Total = c("A", "B", "C", "D"))) %>%
group_by(X) %>%
summarize(Y = sum(Y))
However, this also necessarily overwrites the original factor levels. I would have to combine the original dataset with the new collapsed dataset in an additional step.
One solution I have used in the past to retain the original levels is to bring data in the wide format and proceed with rowwise()
and mutate()
to create a new variable with the "Total" and then reshape back to long.
spread(Data, key = X, value = Y) %>%
rowwise() %>%
mutate(Total = sum(A, B, C, D)) %>%
gather(1:5, key = "X", value = "Y")
However, I am very unhappy with this solution since using rowwise()
is not considered good practice. It would be great if you could point me to an available alternative solution how to combine data under different factor levels while retaining original levels.
Minimal reproducible example:
Data<-data.frame(
X = factor(c("A", "B", "C", "D")),
Y = c(1000, 2000, 3000, 4000))
Expected result:
# A tibble: 5 x 2
X Y
<chr> <dbl>
1 A 1000
2 B 2000
3 C 3000
4 D 4000
5 Total 10000
Using janitor library, this would be straightforward.
Data %>% janitor::adorn_totals("row") %>% mutate(X=factor(X))
# X Y
# A 1000
# B 2000
# C 3000
# D 4000
# Total 10000
Looking at the output structure:
str(output)
# 'data.frame': 5 obs. of 2 variables:
# $ X: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
# $ Y: num 1000 2000 3000 4000 10000
Using the suggestion in @M--'s first version of his comment to the question, now edited, I have added bind_rows
.
I have also changed the input dataset a bit. Following the OP's and @camille's comment, this dataset has a factor level "Z"
but keeps the original order and adds level "Total"
at the end.
Data <- data.frame(
X = factor(c("A", "B", "C", "Z")),
Y = c(1000, 2000, 3000, 4000))
Data %>%
mutate(lvl = levels(X),
X = fct_collapse(X, Total = c("A", "B", "C", "Z")),
X = as.character(X)) %>%
bind_rows(mutate(Data, X = as.character(X)), .) %>%
mutate(X = factor(X, levels = c(lvl, "Total"))) %>%
group_by(X) %>%
summarize(Y = sum(Y)) -> d
d
## A tibble: 5 x 2
# X Y
# <fct> <dbl>
#1 A 1000
#2 B 2000
#3 C 3000
#4 Z 4000
#5 Total 10000
Check the output factor levels.
levels(d$X)
#[1] "A" "B" "C" "Z" "Total"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With