
Reshaping data from long to wide with both sums and counts

Tags: r

I am trying to reshape data from long to wide format in R. I would like to get both counts of occurrences of a type variable by ID and sums of the values of a second variable (val) by ID and type as in the example below.

I was able to find answers for reshaping with either counts or sums but not for both simultaneously.

This is the original example data:

> df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
+                  type = c("A", "A", "B", "A", "B", "C"),
+                  val = c(0, 1, 2, 0, 0, 4))
> df
  id type val
1  1    A   0
2  1    A   1
3  1    B   2
4  2    A   0
5  2    B   0
6  2    C   4

The output I would like to obtain is the following:

  id A.count B.count C.count A.sum B.sum C.sum
1  1       2       1       0     1     2     0
2  2       1       1       1     0     0     4

where the count columns display the number of occurrences of type A, B and C and the sum columns the sum of the values by type.

To achieve the counts I can, as suggested in this answer, use reshape2::dcast with the default aggregation function, length:

> require(reshape2)
> df.c <- dcast(df, id ~ type, value.var = "type", fun.aggregate = length)
> df.c
  id A B C
1  1 2 1 0
2  2 1 1 1

Similarly, as suggested in this answer, I can also perform the reshape with the sums as output, this time using the sum aggregation function in dcast:

> df.s <- dcast(df, id ~ type, value.var = "val", fun.aggregate = sum)
> df.s
  id A B C
1  1 1 2 0
2  2 0 0 4

I could merge the two:

> merge(x = df.c, y = df.s, by = "id", all = TRUE)
  id A.x B.x C.x A.y B.y C.y
1  1   2   1   0   1   2   0
2  2   1   1   1   0   0   4

but is there a way of doing it all in one go (not necessarily with dcast or reshape2)?

Lino Ferreira asked Jul 21 '18 12:07


2 Answers

From data.table v1.9.6, it is possible to cast multiple value.var columns and also cast by providing multiple fun.aggregate functions. See below:

library(data.table)

df <- as.data.table(df)
dcast(df, id ~ type, fun.aggregate = list(length, sum), value.var = "val")
   id val_length_A val_length_B val_length_C val_sum_A val_sum_B val_sum_C
1:  1            2            1            0         1         2         0
2:  2            1            1            1         0         0         4
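
If the column names need to match the layout requested in the question (A.count, A.sum, and so on), the generated names can be rewritten after the cast. The following is only a sketch: it assumes the val_length_X / val_sum_X naming shown in the output above, and the names dt, wide, old and new are placeholders introduced here for illustration.

```r
library(data.table)

# Example data from the question
df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 type = c("A", "A", "B", "A", "B", "C"),
                 val = c(0, 1, 2, 0, 0, 4))

dt <- as.data.table(df)

# One cast, two aggregation functions applied to the same value column
wide <- dcast(dt, id ~ type, value.var = "val",
              fun.aggregate = list(length, sum))

# Assumption: columns are named val_length_<type> and val_sum_<type>,
# as in the output shown above. Rewrite them to <type>.count / <type>.sum.
old <- setdiff(names(wide), "id")
new <- sub("^val_length_(.*)$", "\\1.count", old)
new <- sub("^val_sum_(.*)$", "\\1.sum", new)
setnames(wide, old, new)

wide
```

setnames() renames by reference, so no copy of the table is made.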
phiver answered Sep 20 '22 13:09


Here is an approach with the tidyverse:

library(tidyverse)
df %>% 
  group_by(id, type) %>%
  summarise(count = n(), Sum = sum(val)) %>%
  gather(key, val, count:Sum) %>%
  unite(typen, type, key, sep=".") %>%
  spread(typen, val, fill = 0)
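
In newer versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and the same result can probably be reached in a single reshaping step. The sketch below is one possible variant, not the answer's original method; it assumes a tidyr version that supports names_glue and scalar values_fill, and a dplyr version that supports the .groups argument. The name wide is a placeholder.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 type = c("A", "A", "B", "A", "B", "C"),
                 val = c(0, 1, 2, 0, 0, 4))

wide <- df %>%
  group_by(id, type) %>%
  # One row per id/type pair with both summaries
  summarise(count = n(), sum = sum(val), .groups = "drop") %>%
  # Spread both summary columns at once; names_glue builds "A.count", "A.sum", ...
  pivot_wider(names_from = type,
              values_from = c(count, sum),
              names_glue = "{type}.{.value}",
              values_fill = 0)

wide
```

Using lowercase sum for the summary column gives the A.sum style names asked for in the question; missing id/type combinations are filled with 0.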
akrun answered Sep 18 '22 13:09