How to name a list of a group_split in dplyr when grouped by more than one column

Tags: r, dplyr

I am using group_split() in dplyr and I am struggling to name the elements of the resulting list when I split by more than one column.

I know how to do this when grouping by a single column, but I am not sure how to do it when splitting by two columns.

I can't share the data, but with the iris dataset it would look similar to this (in my case both columns are factors):

library(dplyr)
iris %>%
    group_split(Species, Petal.Width)
asked Jul 30 '19 15:07 by Scott



1 Answer

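Use dplyr::group_keys() to get the grouping variables. It returns one row per group, in the same order as the list produced by group_split(), so the key columns can be pasted together into one name per group and assigned with setNames().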

library(dplyr)
library(stringr)
# make grouped data frame
iris_group <- iris %>%
    group_by(Species, Petal.Width)

# get group keys
group_name_df <- group_keys(iris_group) %>%
    mutate(group_name = str_c(as.character(Species),"-",Petal.Width))

# get name for each group
group_name <- group_name_df$group_name

# assign name to each split table
df_list <- group_split(iris_group) %>%
    setNames(group_name)

> group_name_df
# A tibble: 27 x 3
   Species    Petal.Width group_name    
   <fct>            <dbl> <chr>         
 1 setosa             0.1 setosa-0.1    
 2 setosa             0.2 setosa-0.2    
 3 setosa             0.3 setosa-0.3    
 4 setosa             0.4 setosa-0.4    
 5 setosa             0.5 setosa-0.5    
 6 setosa             0.6 setosa-0.6    
 7 versicolor         1   versicolor-1  
 8 versicolor         1.1 versicolor-1.1
 9 versicolor         1.2 versicolor-1.2
10 versicolor         1.3 versicolor-1.3
# ... with 17 more rows
> df_list 
$`setosa-0.1`
# A tibble: 5 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <fct>  
1          4.9         3.1          1.5         0.1 setosa 
2          4.8         3            1.4         0.1 setosa 
3          4.3         3            1.1         0.1 setosa 
4          5.2         4.1          1.5         0.1 setosa 
5          4.9         3.6          1.4         0.1 setosa 

$`setosa-0.2`
# A tibble: 29 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
.
.
.
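If this is needed for more than one set of grouping columns, the same steps can be wrapped in a small function. The sketch below is only an illustration: named_group_split() is a hypothetical helper name, not a dplyr function, and it assumes that pasting the key values together with a separator yields unique names.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): split a data frame by any number
# of grouping columns and name each piece after its key values.
named_group_split <- function(.data, ..., sep = "-") {
  grouped <- group_by(.data, ...)
  keys <- group_keys(grouped)
  # Paste the key columns together row-wise; as.character() turns factors
  # into their labels and numbers into their printed form.
  nms <- do.call(paste, c(lapply(keys, as.character), sep = sep))
  setNames(group_split(grouped), nms)
}

df_list <- named_group_split(iris, Species, Petal.Width)
names(df_list)[1:3]
```

Individual groups can then be pulled out by name, for example df_list[["setosa-0.1"]].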
answered Nov 11 '22 16:11 by yusuzech