Split data.frame based on levels of a factor into new data.frames

Tags:

dataframe

r

r-faq

I'm trying to create separate data.frame objects based on levels of a factor. So if I have:

df <- data.frame(
  x=rnorm(25),
  y=rnorm(25),
  g=rep(factor(LETTERS[1:5]), 5)
)

How can I split df into separate data.frames, one for each level of g, containing the corresponding x and y values? I can get most of the way there using split(df, df$g), but I'd like each level of the factor to have its own data.frame.

What's the best way to do this?

752

asked Mar 15 '12 02:03

smillig

2 Answers

I think that split does exactly what you want.

Notice that X is a list of data frames, as seen by str:

X <- split(df, df$g)
str(X)
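
As a minimal sketch of working with that list directly (the seed and the summary step are illustrative additions, not part of the original question), each piece can be pulled out by name or processed in one pass:

```r
# Example data as in the question; the seed is an assumption added for reproducibility
set.seed(1234)
df <- data.frame(
  x = rnorm(25),
  y = rnorm(25),
  g = rep(factor(LETTERS[1:5]), 5)
)

X <- split(df, df$g)

# Each element is a data.frame named after a level of g
names(X)
X$A          # rows where g == "A"
X[["B"]]     # same idea, using [[ ]] with a name

# Apply the same operation to every piece without creating separate objects
group_means <- sapply(X, function(d) colMeans(d[, c("x", "y")]))
group_means  # matrix with rows x and y, one column per level of g
```

Keeping the pieces in one list like this tends to be easier to loop over than five free-standing objects.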

If you want individual objects named after the levels of g, you could assign the elements of X to objects with those names, though this seems like extra work when you can just index the data frames in the list that split creates.

# I used lapply just to drop the third column, g, which is no longer needed.
Y <- lapply(seq_along(X), function(x) as.data.frame(X[[x]])[, 1:2])

#Assign the dataframes in the list Y to individual objects
A <- Y[[1]]
B <- Y[[2]]
C <- Y[[3]]
D <- Y[[4]]
E <- Y[[5]]

#Or use lapply with assign to assign each piece to an object all at once
lapply(seq_along(Y), function(x) {
    assign(c("A", "B", "C", "D", "E")[x], Y[[x]], envir=.GlobalEnv)
    }
)

Edit: Even better than using lapply to assign to the global environment, use list2env:

names(Y) <- c("A", "B", "C", "D", "E")
list2env(Y, envir = .GlobalEnv)
A
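
Because split already names its result by the levels of g, the renaming step can probably be skipped. The following is only a sketch: it drops g before splitting and writes into a fresh environment (the name pieces_env is hypothetical) rather than the global one, which avoids silently overwriting existing objects.

```r
set.seed(1234)
df <- data.frame(x = rnorm(25), y = rnorm(25), g = rep(factor(LETTERS[1:5]), 5))

# Split only the x and y columns; the list is named "A" ... "E" automatically
pieces <- split(df[c("x", "y")], df$g)

# pieces_env is a hypothetical name for a scratch environment
pieces_env <- list2env(pieces, envir = new.env())

ls(pieces_env)        # names of the objects that were created
head(pieces_env$A)    # the data.frame for level "A"

# To create A, B, ... in the workspace instead, as in the answer:
# list2env(pieces, envir = .GlobalEnv)
```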

164

answered Sep 29 '22 08:09

Tyler Rinker

Since dplyr 0.8.0, we can also use group_split, which behaves similarly to base::split:

library(dplyr)
df %>% group_split(g)

#[[1]]
# A tibble: 5 x 3
#       x      y g    
#   <dbl>  <dbl> <fct>
#1 -1.21  -1.45  A    
#2  0.506  1.10  A    
#3 -0.477 -1.17  A    
#4 -0.110  1.45  A    
#5  0.134 -0.969 A    

#[[2]]
# A tibble: 5 x 3
#       x      y g    
#   <dbl>  <dbl> <fct>
#1  0.277  0.575 B    
#2 -0.575 -0.476 B    
#3 -0.998 -2.18  B    
#4 -0.511 -1.07  B    
#5 -0.491 -1.11  B  
#....

It also has a .keep argument (TRUE by default) that specifies whether the grouping column should be kept.

df %>% group_split(g, .keep = FALSE)

#[[1]]
# A tibble: 5 x 2
#       x      y
#   <dbl>  <dbl>
#1 -1.21  -1.45 
#2  0.506  1.10 
#3 -0.477 -1.17 
#4 -0.110  1.45 
#5  0.134 -0.969

#[[2]]
# A tibble: 5 x 2
#       x      y
#   <dbl>  <dbl>
#1  0.277  0.575
#2 -0.575 -0.476
#3 -0.998 -2.18 
#4 -0.511 -1.07 
#5 -0.491 -1.11 
#....

The difference between base::split and dplyr::group_split is that group_split does not name the elements of the list based on grouping. So

df1 <- df %>% group_split(g)
names(df1) #gives
#NULL

whereas

df2 <- split(df, df$g)
names(df2) #gives
#[1] "A" "B" "C" "D" "E"
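
If names are wanted on the group_split result, one possible workaround (a sketch, not something prescribed by the answer above) is to take them from group_keys, which should list the keys in the same order as the pieces:

```r
library(dplyr)

set.seed(1234)
df <- data.frame(x = rnorm(25), y = rnorm(25), g = rep(factor(LETTERS[1:5]), 5))

grouped <- group_by(df, g)

# One row per group, holding the value of g for that group
keys <- group_keys(grouped)

# Split, then attach the key values as names
df1 <- group_split(grouped)
names(df1) <- as.character(keys$g)

names(df1)
df1[["C"]]   # the tibble for level "C"
```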

data

set.seed(1234)
df <- data.frame(
      x=rnorm(25),
      y=rnorm(25),
      g=rep(factor(LETTERS[1:5]), 5)
)

answered Sep 29 '22 09:09

Ronak Shah
