
Split dataframe by levels of a factor and name dataframes by those levels

Tags:

r

I want to split an existing dataframe by the levels of one of the factor variables so that the names of the split dataframes would correspond to the levels of the factor.

df <- data.frame(cbind(X = 1:10, Y = rnorm(10)), Z = sample(LETTERS[1:3], 10, replace = TRUE))

If df is the original dataframe, I want to split it into three dataframes called A, B and C, such that:

A = subset(df, Z == 'A')
B = subset(df, Z == 'B')
...

Is there an easy way to do this in one shot? I have a huge dataset and the factor variable has too many levels to subset each one by hand.

user702432 asked Dec 06 '22 06:12



2 Answers

In base R, use the function split, which has a default method and a method for data frames. However, I find that split.data.frame becomes very slow as the number of levels to split on grows. That is:

# inefficient in my opinion
split(df, df$Z)

This returns a list of data frames whose names are the levels of Z, which is what you asked for, but it slows down badly when the factor has a large number of levels.
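
To make the base R approach concrete, here is a minimal sketch using the example data from the question. The seed and the names pieces and env are assumptions added for illustration. It also shows one way to turn the list elements into separate objects with list2env, which is closer to the A, B, C objects the question asks for.

```r
# Example data, as in the question (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(X = 1:10, Y = rnorm(10),
                 Z = sample(LETTERS[1:3], 10, replace = TRUE))

# split() returns a named list: one data frame per level of Z
pieces <- split(df, df$Z)
names(pieces)

# Optionally copy each list element into an environment as its own object.
# A fresh environment is used here; pass envir = .GlobalEnv instead to
# create the objects directly in the workspace.
env <- new.env()
list2env(pieces, envir = env)
ls(env)
```

Keeping the pieces in a list is usually easier to work with (for example with lapply) than creating many separate objects, especially when the factor has a lot of levels.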

If you're willing to use an external package in exchange for speed and efficiency, I'd suggest the data.table package:

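library(data.table)
dt <- data.table(df)
# one row per value of Z; V1 is a list column holding each group's rows
# (.SD contains every column except the grouping column Z)
oo <- dt[, list(list(.SD)), by = Z]$V1
# groups come back in order of first appearance, which matches unique()
names(oo) <- as.character(unique(dt$Z))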
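
As an alternative sketch, and assuming a reasonably recent data.table version that provides a split method for data.tables with a by argument, the same named list can be built in one call. The seed and the name parts are assumptions added for illustration; unlike the .SD approach above, the Z column is kept in each piece by default.

```r
library(data.table)

# Example data, as in the question (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(X = 1:10, Y = rnorm(10),
                 Z = sample(LETTERS[1:3], 10, replace = TRUE))
dt <- as.data.table(df)

# split() on a data.table with `by` returns a named list of data.tables,
# one per distinct value of Z, named after those values
parts <- split(dt, by = "Z")
names(parts)
```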

Arun answered Dec 22 '22 01:12



You can do it with the plyr package, which returns a list of data frames named by the levels of Z:

require(plyr)
dlply(df, .(Z))

Ramnath answered Dec 22 '22 01:12
