tidyr::expand() for a single column across groups

Tags: r, dplyr, tidyr

tidyr::expand() returns all possible combinations of values from multiple columns. I'm looking for a slightly different behavior, where all the values are in a single column and the combinations are to be taken across groups.

For example, let the data be defined as follows:

library(tidyverse)
X <- bind_rows( tibble(Group = "Group1", Value = LETTERS[1:3]),
                tibble(Group = "Group2", Value = letters[4:5]) )

We want all combinations of values from Group1 with values from Group2. My current clunky solution is to first spread the values of each group into its own list-column:

Y <- X %>% group_by(Group) %>% do(vals = .$Value) %>% spread(Group, vals)
# # A tibble: 1 x 2
#   Group1    Group2   
#   <list>    <list>   
# 1 <chr [3]> <chr [2]>

followed by a double unnest operation:

Y %>% unnest( .preserve = Group2 ) %>% unnest
# # A tibble: 6 x 2
#   Group1 Group2
#   <chr>  <chr> 
# 1 A      d     
# 2 A      e     
# 3 B      d     
# 4 B      e     
# 5 C      d     
# 6 C      e

This is the desired output, but as you can imagine, this solution doesn't generalize well: as the number of groups increases, so does the number of unnest operations that we have to perform.

Is there a more elegant solution?

asked May 25 '18 19:05

Artem Sokolov

1 Answer

Because OP seems happy to use base R, I am upgrading my comment to an answer:

expand.grid(split(X$Value, X$Group))
#   Group1 Group2
# 1      A      d
# 2      B      d
# 3      C      d
# 4      A      e
# 5      B      e
# 6      C      e

As noted by OP, expand.grid converts character vectors to factors. To prevent that, use stringsAsFactors = FALSE.
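
A minimal base R sketch of that option. The third group (Group3) and the object names X3 and res are hypothetical, added only to illustrate that the same call works unchanged for any number of groups:

```r
# Base R only; Group3 is a hypothetical extra group added for illustration
X3 <- data.frame(
  Group = rep(c("Group1", "Group2", "Group3"), times = c(3, 2, 2)),
  Value = c(LETTERS[1:3], letters[4:5], "x", "y"),
  stringsAsFactors = FALSE
)

# split() yields one character vector per group; expand.grid() crosses them
res <- expand.grid(split(X3$Value, X3$Group), stringsAsFactors = FALSE)

nrow(res)           # 3 * 2 * 2 = 12 combinations
sapply(res, class)  # every column stays "character"
```

Because split() returns a named list with one element per group, no per-group code is needed as groups are added.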

The tidyverse equivalent is purrr::cross_df, which doesn't coerce to factor (note that cross_df has been deprecated since purrr 1.0.0 in favour of tidyr::expand_grid):

cross_df(split(X$Value, X$Group))
# # A tibble: 6 x 2
#   Group1 Group2
#   <chr>  <chr> 
# 1 A      d     
# 2 B      d     
# 3 C      d     
# 4 A      e     
# 5 B      e     
# 6 C      e
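
On purrr 1.0.0 or later, cross_df() emits a deprecation warning and points to tidyr::expand_grid() instead. A sketch of the same idea with that function, assuming tidyr >= 1.0.0 is installed; the names groups and res are illustrative, and the list from split() is spliced into the call with !!!:

```r
library(tidyr)

# Same example data as in the question, rebuilt here so the snippet is self-contained
X <- data.frame(
  Group = rep(c("Group1", "Group2"), times = c(3, 2)),
  Value = c(LETTERS[1:3], letters[4:5]),
  stringsAsFactors = FALSE
)

# One character vector per group, named by group
groups <- split(X$Value, X$Group)

# expand_grid() takes name-value pairs, so the named list is spliced in with !!!
res <- expand_grid(!!!groups)
res
```

Unlike expand.grid, expand_grid varies the first column slowest, so the rows should come out in the order shown in the question's desired output.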

answered Sep 22 '22 04:09

Henrik
