
tidyr::expand() for a single column across groups

Tags: r, dplyr, tidyr

tidyr::expand() returns all possible combinations of values from multiple columns. I'm looking for a slightly different behavior, where all the values are in a single column and the combinations are to be taken across groups.
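For context, here is a minimal sketch of the usual multi-column behavior. The data frame `df` and its columns `x` and `y` are hypothetical and not part of the data used below:

```r
library(tidyr)

# Hypothetical toy data: only 2 of the 4 possible (x, y) pairs are observed
df <- data.frame(x = c("a", "b", "b"), y = c(1, 2, 2))

# expand() crosses the unique values of x with the unique values of y,
# including pairs that never occur together in df
combos <- expand(df, x, y)
combos
```

The result has one row per (x, y) pair, four in total, even though `df` contains only two distinct pairs.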

For example, let the data be defined as follows:

library( tidyverse )
X <- bind_rows( tibble(Group = "Group1", Value = LETTERS[1:3]),
                tibble(Group = "Group2", Value = letters[4:5]) )

We want all combinations of values from Group1 with values from Group2. My current clunky solution is to separate the values across multiple columns

Y <- X %>% group_by(Group) %>% do(vals = .$Value) %>% spread(Group, vals)
# # A tibble: 1 x 2
#   Group1    Group2   
#   <list>    <list>   
# 1 <chr [3]> <chr [2]>

followed by a double unnest operation

Y %>% unnest( .preserve = Group2 ) %>% unnest
# # A tibble: 6 x 2
#   Group1 Group2
#   <chr>  <chr> 
# 1 A      d     
# 2 A      e     
# 3 B      d     
# 4 B      e     
# 5 C      d     
# 6 C      e     

This is the desired output, but as you can imagine, this solution doesn't generalize well: as the number of groups increases, so does the number of unnest operations that we have to perform.

Is there a more elegant solution?

asked May 25 '18 19:05 by Artem Sokolov



1 Answer

Because the OP seems happy to use base R, I'm upgrading my comment to an answer:

expand.grid(split(X$Value, X$Group))
#   Group1 Group2
# 1      A      d
# 2      B      d
# 3      C      d
# 4      A      e
# 5      B      e
# 6      C      e

As noted by OP, expand.grid converts character vectors to factors. To prevent that, use stringsAsFactors = FALSE.
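A minimal sketch of that call, assuming the same `Group`/`Value` layout as in the question (the data is rebuilt here with base R so the snippet runs on its own):

```r
# Same data as in the question, built without the tidyverse
X <- data.frame(
  Group = rep(c("Group1", "Group2"), times = c(3, 2)),
  Value = c(LETTERS[1:3], letters[4:5]),
  stringsAsFactors = FALSE
)

# split() returns a named list with one character vector per group;
# expand.grid() accepts that list directly and crosses its elements
res <- expand.grid(split(X$Value, X$Group), stringsAsFactors = FALSE)

# Both columns stay character instead of being converted to factor
str(res)
```

Because `split()` produces one list element per group, the same call works unchanged for three or more groups.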

The tidyverse equivalent is purrr::cross_df, which doesn't coerce to factor (note that cross_df() is deprecated as of purrr 1.0.0 in favor of tidyr::expand_grid()):

cross_df(split(X$Value, X$Group))
# # A tibble: 6 x 2
#   Group1 Group2
#   <chr>  <chr> 
# 1 A      d     
# 2 B      d     
# 3 C      d     
# 4 A      e     
# 5 B      e     
# 6 C      e     
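
With a recent tidyverse, the same idea can be sketched with tidyr::expand_grid(). This assumes tidyr 1.0.0 or later is installed; the use of `do.call()` to pass the split list as named arguments is a choice made for this sketch:

```r
library(tidyr)

# Same data as in the question, built without the tidyverse
X <- data.frame(
  Group = rep(c("Group1", "Group2"), times = c(3, 2)),
  Value = c(LETTERS[1:3], letters[4:5]),
  stringsAsFactors = FALSE
)

# split() yields one named character vector per group; do.call() passes
# each of them to expand_grid() as a named argument, one column per group
res <- do.call(expand_grid, split(X$Value, X$Group))
res
```

Unlike expand.grid() and cross_df(), expand_grid() varies the first column slowest, so the rows come out in the order A-d, A-e, B-d, B-e, C-d, C-e, matching the desired output shown in the question.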
answered Sep 22 '22 04:09 by Henrik