
R subsetting a data frame into multiple data frames based on multiple column values


I am trying to split a data frame into multiple data frames based on the values in multiple columns. Here is my example:

    > df
      v1 v2 v3 v4 v5
       A  Z  1 10 12
       D  Y 10 12  8
       E  X  2 12 15
       A  Z  1 10 12
       E  X  2 14 16

The expected output is something like this, where the data frame is split into multiple data frames based on columns v1 and v2:

    > df1
      v3 v4 v5
       1 10 12
       1 10 12
    > df2
      v3 v4 v5
      10 12  8
    > df3
      v3 v4 v5
       2 12 15
       2 14 16

I have written code that works right now, but I don't think it is the best way to do it. There must be a better way. Assuming tab is the data.frame holding the initial data, here is my code:

    v1Factors <- levels(factor(tab$v1))
    v2Factors <- levels(factor(tab$v2))

    for (i in 1:length(v1Factors)) {
      for (j in 1:length(v2Factors)) {
        subsetTab <- subset(tab, v1 == v1Factors[i] & v2 == v2Factors[j],
                            select = c("v3", "v4", "v5"))
        print(subsetTab)
      }
    }

Can someone suggest a better method to do the above?

Rachit Agrawal asked Mar 13 '13 04:03




1 Answer

You are looking for split:

    split(df, with(df, interaction(v1,v2)), drop = TRUE)
    $E.X
      v1 v2 v3 v4 v5
    3  E  X  2 12 15
    5  E  X  2 14 16

    $D.Y
      v1 v2 v3 v4 v5
    2  D  Y 10 12  8

    $A.Z
      v1 v2 v3 v4 v5
    1  A  Z  1 10 12
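The expected output in the question keeps only v3, v4 and v5. A minimal sketch of one way to get that with split, assuming the question's data is rebuilt by hand as `df` (the name `pieces` is illustrative and not from the original post):

```r
# Rebuild the example data from the question (values typed in by hand)
df <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v3 = c(1, 10, 2, 1, 2),
  v4 = c(10, 12, 12, 10, 14),
  v5 = c(12, 8, 15, 12, 16),
  stringsAsFactors = FALSE
)

# Split only the value columns, grouping on every observed (v1, v2) pair.
# drop = TRUE discards combinations that never occur, such as A.X.
pieces <- split(df[c("v3", "v4", "v5")],
                interaction(df$v1, df$v2),
                drop = TRUE)

names(pieces)    # one list element per observed combination
pieces[["A.Z"]]  # the rows where v1 == "A" and v2 == "Z"
```

Each element of `pieces` is an ordinary data frame, so it can be pulled out by name or processed with lapply.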

As noted in the comments, any of the following would work:

    library(microbenchmark)
    microbenchmark(
      split(df, list(df$v1, df$v2), drop = TRUE),
      split(df, interaction(df$v1, df$v2), drop = TRUE),
      split(df, with(df, interaction(v1, v2)), drop = TRUE))

    Unit: microseconds
                                                      expr      min        lq    median       uq      max neval
                split(df, list(df$v1, df$v2), drop = TRUE) 1119.845 1129.3750 1145.8815 1182.119 3910.249   100
         split(df, interaction(df$v1, df$v2), drop = TRUE)  893.749  900.5720  909.8035  936.414 3617.038   100
     split(df, with(df, interaction(v1, v2)), drop = TRUE)  895.150  902.5705  909.8505  927.128 1399.284   100

It appears that passing interaction directly is slightly faster than passing a list (probably because f = list(...) is just converted to an interaction within the function).
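A quick way to check that the list form and the interaction form produce the same groups, sketched on a small hand-built data frame (`d`, `by_list`, `by_inter`, `same_names` and `same_rows` are illustrative names, not from the original answer):

```r
# Small example data, assumed to mirror the question's v1/v2 columns
d <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v3 = c(1, 10, 2, 1, 2)
)

by_list  <- split(d, list(d$v1, d$v2), drop = TRUE)
by_inter <- split(d, interaction(d$v1, d$v2), drop = TRUE)

# Compare the group names and the rows held in each group
same_names <- identical(names(by_list), names(by_inter))
same_rows  <- isTRUE(all.equal(by_list, by_inter))
```

If both flags are TRUE, the choice between the two forms comes down to readability and the small timing difference.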


Edit

If you just want to use the subset data.frames, then I would suggest using data.table for ease of coding:

    library(data.table)

    dt <- data.table(df)
    dt[, plot(v4, v5), by = list(v1, v2)]
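If data.table is not available, a similar per-group operation can be sketched in base R by looping over the pieces returned by split. This is an alternative to the data.table call, not what the answer used; the data and the summary (mean of v5 per group) are made up for illustration:

```r
# Hand-built example data, assumed to mirror the question's columns
d <- data.frame(
  v1 = c("A", "D", "E", "A", "E"),
  v2 = c("Z", "Y", "X", "Z", "X"),
  v4 = c(10, 12, 12, 10, 14),
  v5 = c(12, 8, 15, 12, 16)
)

pieces <- split(d, interaction(d$v1, d$v2), drop = TRUE)

# Apply a function to each piece; here, the mean of v5 per (v1, v2) group
group_means <- vapply(pieces, function(p) mean(p$v5), numeric(1))
```

Swapping the anonymous function for a plotting call would draw one plot per group, at the cost of writing the loop yourself.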
mnel answered Sep 29 '22 23:09
