tidyr spread does not aggregate data

Tags: r, tidyr

I have the following data:

    > data <- data.frame(unique=1:9, grouping=rep(c('a', 'b', 'c'), each=3), value=sample(1:30, 9))
    > data
      unique grouping value
    1      1        a    15
    2      2        a    21
    3      3        a    26
    4      4        b     8
    5      5        b     6
    6      6        b     4
    7      7        c    17
    8      8        c     1
    9      9        c     3

I would like to create a table that looks like this:

       a  b  c
    1 15  8 17
    2 21  6  1
    3 26  4  3

I tried tidyr::spread, but it does not produce this result:

    > data %>% spread(grouping, value)
      unique  a  b  c
    1      1 15 NA NA
    2      2 21 NA NA
    3      3 26 NA NA
    4      4 NA  8 NA
    5      5 NA  6 NA
    6      6 NA  4 NA
    7      7 NA NA 17
    8      8 NA NA  1
    9      9 NA NA  3

Dropping the unique column first gives an error instead:

    > data %>% select(grouping, value) %>% spread(grouping, value)
    Error: Duplicate identifiers for rows (1, 2, 3), (4, 5, 6), (7, 8, 9)

Is there a way to do this that also works when one group (say, c) has a different number of rows than the others?

asked Dec 19 '22 19:12 by Josh


1 Answer

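spread() uses the columns that remain after removing the key and value columns to identify each output row. With unique kept, every row is its own identifier, so nothing is collapsed and the gaps are filled with NA. With unique dropped, no identifier is left at all, which triggers the "Duplicate identifiers" error. Adding a sequence column within each group (id below) supplies the missing identifier: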

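    library(tidyr)
    library(dplyr)

    data %>%
        group_by(grouping) %>%
        mutate(id = row_number()) %>%   # 1, 2, 3 within each group
        select(-unique) %>%             # drop the per-row identifier
        spread(grouping, value) %>%     # one column per group, one row per id
        select(-id)                     # remove the helper column
    #     a     b     c
    #  (int) (int) (int)
    #1    15     8    17
    #2    21     6     1
    #3    26     4     3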
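
On the last part of the question: the same approach should also handle groups of different lengths, because spread() fills missing combinations with NA by default (fill = NA). Here is a minimal sketch, assuming a hypothetical data frame named uneven (not part of the question) in which group c has only two rows:

```r
library(tidyr)
library(dplyr)

# Hypothetical example data: groups a and b have three rows, c has two
uneven <- data.frame(
    unique   = 1:8,
    grouping = c("a", "a", "a", "b", "b", "b", "c", "c"),
    value    = c(15, 21, 26, 8, 6, 4, 17, 1)
)

wide <- uneven %>%
    group_by(grouping) %>%
    mutate(id = row_number()) %>%   # position of each row within its group
    ungroup() %>%                   # grouping is no longer needed after this
    select(-unique) %>%
    spread(grouping, value) %>%     # missing (id, group) pairs become NA
    select(-id)

wide
# the third row of column c is NA because group c has no third value
```

In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). Replacing the spread() line with pivot_wider(names_from = grouping, values_from = value) should give the same shape.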
answered Jan 06 '23 05:01 by akrun