I have a long form dataframe that have multiple entries for same date and person.
jj <- data.frame(month=rep(1:3,4),
student=rep(c("Amy", "Bob"), each=6),
A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))
I want to convert it to wide form and make it like this:
month Amy.A Bob.A Amy.B Bob.B
1
2
3
1
2
3
1
2
3
1
2
3
My question is very similar to this. I have used the given code in the answer :
kk <- jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
spread(temp, value)
but it gives following error:
Error: Duplicate identifiers for rows (1, 4), (2, 5), (3, 6), (13, 16), (14, 17), (15, 18), (7, 10), (8, 11), (9, 12), (19, 22), (20, 23), (21, 24)
Thanks in advance. Note: I don't want to delete multiple entries.
Your answer was missing mutate id! Here is the solution using dplyr packge only.
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
group_by(temp) %>%
mutate(id=1:n()) %>%
spread(temp, value)
# A tibble: 6 x 6
# month id Amy_A Amy_B Bob_A Bob_B
# * <int> <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 9 6 3 5
# 2 1 4 8 5 5 3
# 3 2 2 7 7 2 4
# 4 2 5 6 6 6 1
# 5 3 3 6 8 1 6
# 6 3 6 9 7 5 5
The issue is the two columns for both A
and B
. If we can make that one value column, we can spread the data as you would like. Take a look at the output for jj_melt
when you use the code below.
library(reshape2)
jj_melt <- melt(jj, id=c("month", "student"))
jj_spread <- dcast(jj_melt, month ~ student + variable, value.var="value", fun=sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
I won't mark this as a duplicate since the other question did not summarize by sum
, but the data.table
answer could help with one additional argument, fun=sum
:
library(data.table)
dcast(setDT(jj), month ~ student, value.var=c("A", "B"), fun=sum)
# month A_sum_Amy A_sum_Bob B_sum_Amy B_sum_Bob
# 1: 1 17 8 11 8
# 2: 2 13 8 13 5
# 3: 3 15 6 15 11
If you would like to use the tidyr
solution, combine it with dcast
to summarize by sum
.
as.data.frame(jj)
library(tidyr)
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
dcast(month ~ temp, fun=sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
Edit
Based on your new requirements, I have added an activity column.
library(dplyr)
jj %>% group_by(month, student) %>%
mutate(id=1:n()) %>%
melt(id=c("month", "id", "student")) %>%
dcast(... ~ student + variable, value.var="value")
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 1 2 8 5 5 3
# 3 2 1 7 7 2 4
# 4 2 2 6 6 6 1
# 5 3 1 6 8 1 6
# 6 3 2 9 7 5 5
The other solutions can also be used. Here I added an optional expression to arrange the final output by activity number:
library(tidyr)
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
group_by(temp) %>%
mutate(id=1:n()) %>%
dcast(... ~ temp) %>%
arrange(id)
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 2 2 7 7 2 4
# 3 3 3 6 8 1 6
# 4 1 4 8 5 5 3
# 5 2 5 6 6 6 1
# 6 3 6 9 7 5 5
The data.table
syntax is compact because it allows for multiple value.var
columns and will take care of the spread for us. We can then skip the melt -> cast
process.
library(data.table)
setDT(jj)[, activityID := rowid(student)]
dcast(jj, ... ~ student, value.var=c("A", "B"))
# month activityID A_Amy A_Bob B_Amy B_Bob
# 1: 1 1 9 3 6 5
# 2: 1 4 8 5 5 3
# 3: 2 2 7 2 7 4
# 4: 2 5 6 6 6 1
# 5: 3 3 6 1 8 6
# 6: 3 6 9 5 7 5
Since tidyr 1.0.0 pivot_wider
is the recommended replacement of spread
and you could do the following :
jj <- data.frame(month=rep(1:3,4),
student=rep(c("Amy", "Bob"), each=6),
A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))
library(tidyr)
pivot_wider(
jj,
names_from = "student",
values_from = c("A","B"),
names_sep = ".",
values_fn = list(A= list, B= list)) %>%
unchop(everything())
#> # A tibble: 6 x 5
#> month A.Amy A.Bob B.Amy B.Bob
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 9 3 6 5
#> 2 1 8 5 5 3
#> 3 2 7 2 7 4
#> 4 2 6 6 6 1
#> 5 3 6 1 8 6
#> 6 3 9 5 7 5
Created on 2019-09-14 by the reprex package (v0.3.0)
The twist in this problem is that month is not unique by student, to solve this :
values_fn = list(A= list, B= list))
puts the multiple values in a listunchop(everything())
unnest the lists vertically, you can use unnest
as well hereIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With