How do I create component (subset) dataframes in R based on column values?

Tags: split, r, subset

I'd like to split a dataframe into several component dataframes based on the values in one column. In my example, I want to split dat into dat.1, dat.2 and dat.3 using the values in column "cond". Is there a simple command which could achieve this?

dat
sub  cond  trial  time01  time02
  1     1      1    2774    8845
  1     1      2    2697    9945
  1     2      1    2219    9291
  1     2      2    3886    7890
  1     3      1    4011    9032
  2     2      1    3478    8827
  2     2      2    2263    8321
  2     3      1    4312    7576
  3     1      1    4219    7891
  3     3      1    3992    6674

dat.1
sub  cond  trial  time01  time02
  1     1      1    2774    8845
  1     1      2    2697    9945
  3     1      1    4219    7891

dat.2
sub  cond  trial  time01  time02
  2     2      1    3478    8827
  2     2      2    2263    8321
  1     2      1    2219    9291
  1     2      2    3886    7890

dat.3
sub  cond  trial  time01  time02
  1     3      1    4011    9032
  2     3      1    4312    7576
  3     3      1    3992    6674

Perhaps because I'm an R novice, I still haven't worked out how to do this, despite browsing and trying the solutions proposed in several similar forum questions. Thank you in advance for any replies.

A dput() of the data is:

structure(list(
  sub    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  cond   = c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 3L),
  trial  = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
  time01 = c(2774L, 2697L, 2219L, 3886L, 4011L, 3478L, 2263L, 4312L, 4219L, 3992L),
  time02 = c(8845L, 9945L, 9291L, 7890L, 9032L, 8827L, 8321L, 7576L, 7891L, 6674L)
), .Names = c("sub", "cond", "trial", "time01", "time02"),
   class = "data.frame", row.names = c(NA, -10L))
dancingRobot asked Jun 08 '11 11:06


People also ask

How do you subset a data frame based on columns in R?

The most general way to subset a data frame by rows and/or columns is base R's Extract operator [], written with matched square brackets instead of the usual matched parentheses. For a data frame named d, the general format is d[rows, columns].
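A minimal sketch of d[rows, columns] on the question's data, rebuilt with data.frame() (values copied from the dput() output) so the snippet runs on its own; the object names cond1 and times are illustrative only. A logical vector in the row slot keeps the matching rows, and a character vector in the column slot picks columns by name.

```r
# The question's data, rebuilt so this snippet is self-contained
dat <- data.frame(
  sub    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  cond   = c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 3L),
  trial  = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
  time01 = c(2774L, 2697L, 2219L, 3886L, 4011L, 3478L, 2263L, 4312L, 4219L, 3992L),
  time02 = c(8845L, 9945L, 9291L, 7890L, 9032L, 8827L, 8321L, 7576L, 7891L, 6674L)
)

# d[rows, columns]: an empty column slot keeps every column
cond1 <- dat[dat$cond == 1, ]

# Same rows, but only two columns, selected by name
times <- dat[dat$cond == 1, c("sub", "time01")]
print(times)   # rows 1, 2 and 9 of dat
```

Leaving the row slot empty (dat[, c("sub", "time01")]) selects columns only. With a single column the result drops to a plain vector unless you add drop = FALSE.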

How do I subset data based on column values in R?

Use base R's df[] notation or the subset() function to subset a data frame (data.frame) by column value or by column name.
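For the question's data, both approaches return the same rows. A minimal sketch (object names are illustrative). One difference worth knowing: subset() drops rows where the condition evaluates to NA, whereas dat[dat$cond == 2, ] returns such rows filled with NAs; cond has no missing values here, so the results agree.

```r
# The question's data, rebuilt so this snippet is self-contained
dat <- data.frame(
  sub    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  cond   = c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 3L),
  trial  = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
  time01 = c(2774L, 2697L, 2219L, 3886L, 4011L, 3478L, 2263L, 4312L, 4219L, 3992L),
  time02 = c(8845L, 9945L, 9291L, 7890L, 9032L, 8827L, 8321L, 7576L, 7891L, 6674L)
)

# Bracket notation: the column must be referenced as dat$cond
by_bracket <- dat[dat$cond == 2, ]

# subset(): the condition is evaluated inside dat, so plain cond works
by_subset <- subset(dat, cond == 2)

all.equal(by_bracket, by_subset)   # TRUE for this data (no NAs in cond)
```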

How do I subset data from a data frame in R?

To get a subset of the rows and columns of a data frame in R, use the subset() function, filter() from the dplyr package, or base R's square-bracket notation df[]. subset() is a generic R function that returns the rows and columns (in R terms, observations and variables) of a data frame.
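subset() can pick rows and columns in one call through its select argument, which accepts bare column names, ranges such as sub:cond, or negative selections. A minimal base-R sketch (dplyr's filter() is not shown, to avoid a package dependency; object names are illustrative):

```r
# The question's data, rebuilt so this snippet is self-contained
dat <- data.frame(
  sub    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  cond   = c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 3L),
  trial  = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
  time01 = c(2774L, 2697L, 2219L, 3886L, 4011L, 3478L, 2263L, 4312L, 4219L, 3992L),
  time02 = c(8845L, 9945L, 9291L, 7890L, 9032L, 8827L, 8321L, 7576L, 7891L, 6674L)
)

# Rows where cond is 3, keeping only the subject and the two time columns
cond3 <- subset(dat, cond == 3, select = c(sub, time01, time02))

# Same rows, dropping a single column with a negative selection
no_trial <- subset(dat, cond == 3, select = -trial)

# The equivalent call with bracket notation
cond3_bracket <- dat[dat$cond == 3, c("sub", "time01", "time02")]
```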


2 Answers

I think the easiest way is via split:

split(dat, dat$cond)

Note, however, that split returns a list of data frames.
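A short sketch of what that list looks like for the question's data (the name parts is illustrative). The list elements are named after the values of cond, and split() keeps each group's rows in their original order, so in the cond == 2 piece the sub 1 rows come before the sub 2 rows, unlike the dat.2 listing in the question.

```r
# The question's data, rebuilt so this snippet is self-contained
dat <- data.frame(
  sub    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  cond   = c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 3L),
  trial  = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
  time01 = c(2774L, 2697L, 2219L, 3886L, 4011L, 3478L, 2263L, 4312L, 4219L, 3992L),
  time02 = c(8845L, 9945L, 9291L, 7890L, 9032L, 8827L, 8321L, 7576L, 7891L, 6674L)
)

parts <- split(dat, dat$cond)

names(parts)          # "1" "2" "3": one element per value of cond
sapply(parts, nrow)   # 3, 4 and 3 rows per group
parts[["2"]]          # the cond == 2 data frame (original rows 3, 4, 6, 7)
```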

To obtain separate data frames from the list, you could proceed as follows, using a loop (implicit in the lapply call) to create the individual objects:

tmp <- split(dat, dat$cond)
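# name each object after its cond value (dat.1, dat.2, ...); invisible() suppresses the printed list
invisible(lapply(names(tmp), function(nm) assign(paste0("dat.", nm), tmp[[nm]], envir = .GlobalEnv)))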
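Base R's list2env() can do the same in a single call; it is swapped in here as an alternative to the assign() loop, not taken from the answer above. A minimal sketch: naming the elements from names(tmp) rather than their positions means each object is called dat.<value of cond> even when cond does not run 1, 2, 3. Like the loop, it writes into the global environment, so existing objects with the same names are overwritten.

```r
# The question's data, rebuilt so this snippet is self-contained
dat <- data.frame(
  sub    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  cond   = c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 3L),
  trial  = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
  time01 = c(2774L, 2697L, 2219L, 3886L, 4011L, 3478L, 2263L, 4312L, 4219L, 3992L),
  time02 = c(8845L, 9945L, 9291L, 7890L, 9032L, 8827L, 8321L, 7576L, 7891L, 6674L)
)
tmp <- split(dat, dat$cond)

# Rename the list elements "dat.1", "dat.2", "dat.3", then copy each one
# into the global environment as a separate object
invisible(list2env(setNames(tmp, paste0("dat.", names(tmp))), envir = .GlobalEnv))
```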

However, keeping the data frames in a list is probably more idiomatic R and will be more useful in the long run.
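To illustrate why the list tends to pay off: the same computation can be applied to every group with sapply() or lapply(), and base R's unsplit() reassembles the pieces in the original row order. A minimal sketch; the diff column is a hypothetical example of a per-group change, and the object names are illustrative.

```r
# The question's data, rebuilt so this snippet is self-contained
dat <- data.frame(
  sub    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  cond   = c(1L, 1L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 3L),
  trial  = c(1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
  time01 = c(2774L, 2697L, 2219L, 3886L, 4011L, 3478L, 2263L, 4312L, 4219L, 3992L),
  time02 = c(8845L, 9945L, 9291L, 7890L, 9032L, 8827L, 8321L, 7576L, 7891L, 6674L)
)
parts <- split(dat, dat$cond)

# One summary per condition, computed on each data frame in the list
mean_time01 <- sapply(parts, function(d) mean(d$time01))
print(mean_time01)   # 1: 3230, 2: 2961.5, 3: 4105

# Modify every group, then put the rows back in their original order
parts <- lapply(parts, function(d) transform(d, diff = time02 - time01))
dat2  <- unsplit(parts, dat$cond)
```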

Thanks to Gavin for posting the data!

Henrik answered Nov 02 '22 05:11


Is there anything unsatisfying about the following?

split(dat, dat$cond)

You do have r and split as tags, you know...

Nick Sabbe answered Nov 02 '22 06:11