
Create a list of all values of a variable grouped by another variable in R

Tags: r, dplyr

I have a data frame that contains two variables, like this:

df <- data.frame(group=c(1,1,1,2,2,3,3,4),
                  type=c("a","b","a", "b", "c", "c","b","a"))

> df
   group type
1      1    a
2      1    b
3      1    a
4      2    b
5      2    c
6      3    c
7      3    b
8      4    a

I want to produce a table showing, for each group, the combination of types it has in the data frame as a single variable, e.g.:

  group alltypes
1     1     a, b
2     2     b, c
3     3     b, c
4     4        a

The output should always list the types in the same order (e.g. groups 2 and 3 get the same result) and contain no repetition (e.g. group 1 is not "a, b, a").

I tried doing this using dplyr and summarize, but I can't work out how to make it meet these two conditions. The code I tried was:

> df %>%
+   group_by(group) %>%
+   summarise(
+     alltypes = paste(type, collapse=", ")
+   )
# A tibble: 4 × 2
  group alltypes
  <dbl>    <chr>
1     1  a, b, a
2     2     b, c
3     3     c, b
4     4        a

I also tried turning type into a set of individual counts, but I'm not sure if that's actually useful:

> df %>%
+   group_by(group, type) %>%
+   tally %>%
+   spread(type, n, fill=0)
Source: local data frame [4 x 4]
Groups: group [4]

  group     a     b     c
* <dbl> <dbl> <dbl> <dbl>
1     1     2     1     0
2     2     0     1     1
3     3     0     1     1
4     4     1     0     0
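The count table can in fact get there on its own: table() lays out its columns in sorted order of the type values, and a nonzero count simply means that type is present, so joining the names of the nonzero columns gives each group's string with no explicit sort or de-duplication. A minimal base-R sketch, assuming the counts are rebuilt with table() rather than the tally()/spread() pipeline above (tab, alltypes and result are illustrative names):

```r
# Same data as in the question
df <- data.frame(group = c(1, 1, 1, 2, 2, 3, 3, 4),
                 type  = c("a", "b", "a", "b", "c", "c", "b", "a"))

# Assumption: counts rebuilt with base table() instead of tally()/spread().
# One row per group, one column per type; columns come out in sorted order.
tab <- table(df$group, df$type)

# For each group (row), keep the names of the types with a nonzero count
alltypes <- apply(tab > 0, 1, function(present) {
  paste(colnames(tab)[present], collapse = ", ")
})

result <- data.frame(group = as.numeric(rownames(tab)),
                     alltypes = unname(alltypes))
result
```

Because presence is read off fixed, sorted columns, groups 2 and 3 both come out as "b, c", and group 1 gets "a, b" even though "a" was counted twice.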

Any suggestions would be greatly appreciated.

asked Aug 03 '17 15:08 by Shedsley



1 Answer

I think you were very close. You could call unique() to drop the repeated types and sort() to put them in a consistent order, so that your result meets both conditions:

library(dplyr)
df %>% group_by(group) %>%
  summarize(type = paste(sort(unique(type)), collapse = ", "))

returns:

# A tibble: 4 x 2
  group  type
  <dbl> <chr>
1     1  a, b
2     2  b, c
3     3  b, c
4     4     a
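To see why both calls are needed, here is what they do to the raw type values of groups 1 and 3, copied from the question's data (g1 and g3 are illustrative names): unique() removes the repeated "a", and sort() turns "c, b" into the same "b, c" that group 2 produces.

```r
g1 <- c("a", "b", "a")   # types of group 1, in input order
g3 <- c("c", "b")        # types of group 3, in input order

paste(g1, collapse = ", ")               # "a, b, a": repeats kept
paste(unique(g1), collapse = ", ")       # "a, b": repeats dropped
paste(g3, collapse = ", ")               # "c, b": input order kept
paste(sort(unique(g3)), collapse = ", ") # "b, c": consistent order
```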
answered Oct 21 '22 06:10 by Florian
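If you'd rather not depend on dplyr, the same sort(unique(...)) idea can be plugged into base R's aggregate() through its formula interface. This is a sketch, not part of the answer above; the helper name collapse_types is hypothetical:

```r
df <- data.frame(group = c(1, 1, 1, 2, 2, 3, 3, 4),
                 type  = c("a", "b", "a", "b", "c", "c", "b", "a"))

# Hypothetical helper: de-duplicate, sort, then join with ", "
collapse_types <- function(x) paste(sort(unique(x)), collapse = ", ")

# Base R only (no dplyr): one row per group, sorted by group,
# with the type column holding the collapsed string
res <- aggregate(type ~ group, data = df, FUN = collapse_types)
res
```

This gives the same four strings as the dplyr version ("a, b", "b, c", "b, c", "a"), in a plain data frame instead of a tibble.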