Combining data under different factor levels while retaining original levels

I would like a tidyverse solution to the following problem. My dataset contains values Y for several levels of a factor X. I would like to create a new level "Total" whose Y is the sum of Y over all existing levels of X. This can be done, for example, with:

library(dplyr)
library(forcats)

mutate(Data, X = fct_collapse(X, Total = c("A", "B", "C", "D"))) %>%
  group_by(X) %>%
  summarize(Y = sum(Y))

However, this also necessarily overwrites the original factor levels. I would have to combine the original dataset with the new collapsed dataset in an additional step.

One workaround I have used in the past to retain the original levels is to reshape the data to wide format, use rowwise() and mutate() to create a new "Total" column, and then reshape back to long format.

library(dplyr)
library(tidyr)

spread(Data, key = X, value = Y) %>%
  rowwise() %>%
  mutate(Total = sum(A, B, C, D)) %>%
  gather(1:5, key = "X", value = "Y")

However, I am not happy with this solution, since using rowwise() is not considered good practice. Could you point me to an alternative way to combine data across different factor levels while retaining the original levels?
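For comparison, the wide-format route does not actually need rowwise(): after reshaping there is one column per level, and `+` already adds element-wise down the rows. This is a minimal sketch assuming tidyr 1.0 or later, where pivot_wider() and pivot_longer() supersede spread() and gather(); the column names A to D are hard-coded for this example only.

```r
library(dplyr)
library(tidyr)

Data <- data.frame(
  X = factor(c("A", "B", "C", "D")),
  Y = c(1000, 2000, 3000, 4000))

result <- Data %>%
  # one column per level of X (replaces spread())
  pivot_wider(names_from = X, values_from = Y) %>%
  # `+` is vectorised over rows, so rowwise() is not needed
  mutate(Total = A + B + C + D) %>%
  # back to long format (replaces gather()); X comes back as character
  pivot_longer(everything(), names_to = "X", values_to = "Y")

result
```

Because X is returned as character, `factor(X, levels = c(levels(Data$X), "Total"))` would still be needed if a factor with "Total" as the last level is required.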

Minimal reproducible example:

Data <- data.frame(
  X = factor(c("A", "B", "C", "D")),
  Y = c(1000, 2000, 3000, 4000))

Expected result:

# A tibble: 5 x 2
  X         Y
  <chr> <dbl>
1 A      1000
2 B      2000
3 C      3000
4 D      4000
5 Total 10000
asked Dec 23 '22 01:12 by miwin



2 Answers

Using the janitor package, this is straightforward:

library(dplyr)

output <- Data %>% janitor::adorn_totals("row") %>% mutate(X = factor(X))

  # X     Y
  # A     1000
  # B     2000
  # C     3000
  # D     4000
  # Total 10000

Looking at the output structure:

str(output)

# 'data.frame': 5 obs. of  2 variables:
#  $ X: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
#  $ Y: num  1000 2000 3000 4000 10000
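One caveat, shown here as a minimal base-R illustration: factor() without an explicit levels argument sorts the levels alphabetically. That happens to give the desired order for A to D, but with a level such as "Z" (the relabelled example used in the other answer), "Total" would sort before it. Passing the levels explicitly keeps "Total" last; the vector `x` below is a hypothetical stand-in for the X column after adorn_totals().

```r
# hypothetical X column after adding a "Total" row
x <- c("A", "B", "C", "Z", "Total")

# default: levels sorted alphabetically, so "Total" lands before "Z"
sorted <- levels(factor(x))
print(sorted)    # "A" "B" "C" "Total" "Z"

# explicit levels: original order kept, "Total" appended last
explicit <- levels(factor(x, levels = c("A", "B", "C", "Z", "Total")))
print(explicit)  # "A" "B" "C" "Z" "Total"
```

In the pipeline above, the same idea would be `mutate(X = factor(X, levels = c(levels(Data$X), "Total")))`.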
answered Jan 18 '23 23:01 by M--



Building on the suggestion in the first version of @M--'s comment on the question (since edited), I have added bind_rows.
I have also changed the input dataset slightly. Following comments from the OP and @camille, it now has a factor level "Z" instead of "D"; the result keeps the original level order and adds the level "Total" at the end.

Data <- data.frame(
  X = factor(c("A", "B", "C", "Z")),
  Y = c(1000, 2000, 3000, 4000))

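library(dplyr)
library(forcats)

Data %>%
  # store the original levels in a column (this works here only because
  # Data has exactly one row per level)
  mutate(lvl = levels(X),
         X = fct_collapse(X, Total = c("A", "B", "C", "Z")),
         X = as.character(X)) %>%
  # stack the original rows (X as character) on top of the "Total" rows
  bind_rows(mutate(Data, X = as.character(X)), .) %>%
  # lvl is NA for the original rows; factor() drops NA from the levels
  mutate(X = factor(X, levels = c(lvl, "Total"))) %>%
  group_by(X) %>%
  summarize(Y = sum(Y)) -> d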

d
## A tibble: 5 x 2
#  X         Y
#  <fct> <dbl>
#1 A      1000
#2 B      2000
#3 C      3000
#4 Z      4000
#5 Total 10000

Check the output factor levels.

levels(d$X)
#[1] "A"     "B"     "C"     "Z"     "Total"
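A possible variant, sketched here as an alternative rather than as the approach above: it avoids storing the levels in a column (and so does not depend on Data having one row per level) by appending "Total" to the levels with forcats::fct_expand() and binding a single total row whose factor has identical levels. The names `Data2` and `total_row` are illustrative only.

```r
library(dplyr)
library(forcats)

Data <- data.frame(
  X = factor(c("A", "B", "C", "Z")),
  Y = c(1000, 2000, 3000, 4000))

# append "Total" as the last level; A, B, C, Z keep their original order
Data2 <- mutate(Data, X = fct_expand(X, "Total"))

# one-row data frame with the grand total, using the same factor levels
total_row <- data.frame(
  X = factor("Total", levels = levels(Data2$X)),
  Y = sum(Data2$Y))

# both inputs share identical levels, so bind_rows() keeps X as a factor
d <- bind_rows(Data2, total_row)
levels(d$X)
```

If Data had several rows per level, a `group_by(X) %>% summarize(Y = sum(Y))` step on `Data2` before binding would give one row per level.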
answered Jan 18 '23 22:01 by Rui Barradas
