Split data frame by two factors

Tags:

split

r

subset

I have a data frame (sampdata) that looks something like this:

A B  C   D
1 X  5 0.3
2 Y 10 0.9
3 Y  7 0.2
4 Y  5 0.4
5 X 10 0.7

Basically, I want to create two new data frames based on both columns B and C. In earlier posts I have seen how to subset the data using 'split' based on one factor, which I did:

test <- split(sampdata, sampdata$B)
str(test)

So far so good. But, when I tried to add in a second split:

testBC <- split(test, test$C)

I received an error message:

Error in split.default(test, test$C) : group length is 0 but data length > 0

I also tried:

testBC <- split(test$B, test$C)

but got another error message. So I then tried a second method, based on ddply from the plyr package:

test2 <- ddply(sampdata, c("B", "C"))

This did organize the data by row such that:

A B  C   D
1 X  5 0.3
5 X 10 0.7 
2 Y 10 0.9
3 Y  7 0.2
4 Y  5 0.4

However, other threads only show how to access a specific data frame based on one column (test2$B), not both. I would prefer to simply generate new data frames based on a subset of B and C, such that:

newdf1
A B  C   D
1 X  5 0.3
5 X 10 0.7

newdf2
A B  C   D
2 Y 10 0.9
3 Y  7 0.2
4 Y  5 0.4

After trying a couple of methods, what is likely a simple task has turned out to be surprisingly difficult (for me at least).

Any help most appreciated.

asked Oct 07 '17 04:10 by Ethan D.


1 Answer

To split by multiple columns, pass them to split() as a list (df1 here stands for the question's sampdata):

split(df1, list(df1$B, df1$C), drop = TRUE)
#$X.5
#  A B C   D
#1 1 X 5 0.3

#$Y.5
#  A B C   D
#4 4 Y 5 0.4

#$Y.7
#  A B C   D
#3 3 Y 7 0.2

#$X.10
#  A B  C   D
#5 5 X 10 0.7

#$Y.10
#  A B  C   D
#2 2 Y 10 0.9
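
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the data frame is rebuilt by hand from the values in the question and keeps the name sampdata instead of df1; the names testBC, newdf1 and newdf2 are taken from the question and are only illustrative. It also shows a likely reason the second split in the question failed: the first split returns a list of data frames, so test$C does not exist.

```r
# Rebuild the example data from the question (values copied from the post).
sampdata <- data.frame(
  A = 1:5,
  B = c("X", "Y", "Y", "Y", "X"),
  C = c(5, 10, 7, 5, 10),
  D = c(0.3, 0.9, 0.2, 0.4, 0.7)
)

# split() on one column returns a list of data frames, not a data frame.
# The list has elements X and Y but no element C, so test$C is NULL and
# split(test, test$C) has no grouping values to work with.
test <- split(sampdata, sampdata$B)
is.null(test$C)

# Splitting by both columns at once gives one data frame per B/C pair
# that actually occurs in the data (drop = TRUE removes empty pairs).
testBC <- split(sampdata, list(sampdata$B, sampdata$C), drop = TRUE)
names(testBC)

# Pull out a single piece by its "B.C" name.
testBC[["X.10"]]

# If one data frame per level of B, ordered by C, is what is wanted,
# the first split already provides it.
newdf1 <- test$X[order(test$X$C), ]
newdf2 <- test$Y[order(test$Y$C), ]
```

Using `[[` with the combined name avoids creating many free-standing variables; whether to keep the pieces in the list or assign them to separate objects is a matter of preference.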

answered Nov 15 '22 01:11 by akrun