I have a data frame (sampdata) that looks something like this:
A B C D
1 X 5 0.3
2 Y 10 0.9
3 Y 7 0.2
4 Y 5 0.4
5 X 10 0.7
Basically, I want to create two new data frames based on both columns B and C. In earlier posts I have seen how to subset the data using 'split' based on one factor, which I did:
test <- split(sampdata, sampdata$B)
str(test)
So far so good. But, when I tried to add in a second split:
testBC <- split(test, test$C)
I received an error message:
Error in split.default(test, test$Product) : group length is 0 but data length > 0
I also tried:
testBC <- split(test$B, test$C)
but got another error message. So then I tried a second method, based on ddply from the plyr package:
test2 <- ddply(sampdata, c("B", "C"))
This did organize the data by row such that:
A B C D
1 X 5 0.3
5 X 10 0.7
2 Y 10 0.9
3 Y 7 0.2
4 Y 5 0.4
However, other threads only show how to access a specific data frame based on one column (test2$B) but not both. I would prefer to simply generate a new data frame based on a subset of B and C such that:
newdf1
A B C D
1 X 5 .3
5 X 10 .9
newdf2
A B C D
2 Y 7 .2
3 Y 5 .4
4 Y 10 .7
After trying a couple of methods, what is likely a straightforward task is proving surprisingly difficult (for me at least).
Any help most appreciated.
To split by multiple columns, pass them to split() together in a list:
split(df1, list(df1$B, df1$C), drop = TRUE)
#$X.5
# A B C D
#1 1 X 5 0.3
#$Y.5
# A B C D
#4 4 Y 5 0.4
#$Y.7
# A B C D
#3 3 Y 7 0.2
#$X.10
# A B C D
#5 5 X 10 0.7
#$Y.10
# A B C D
#2 2 Y 10 0.9
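For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example data from the question's first table and then pulls out individual pieces. The names pieces, by_B and nested are only illustrative and are not from the original posts. It also shows why the second split in the question failed: the result of the first split is a list, not a data frame, so test$C is NULL.

```r
# Rebuild the example data (values copied from the first table in the question).
sampdata <- data.frame(
  A = 1:5,
  B = c("X", "Y", "Y", "Y", "X"),
  C = c(5, 10, 7, 5, 10),
  D = c(0.3, 0.9, 0.2, 0.4, 0.7)
)

# Split on every B/C combination in one call.
# drop = TRUE discards combinations with no rows (here X with C == 7).
pieces <- split(sampdata, list(sampdata$B, sampdata$C), drop = TRUE)
names(pieces)

# Each piece is an ordinary data frame, accessed by its "B.C" name.
pieces[["Y.10"]]

# Why split(test, test$C) fails: the first split returns a list of
# data frames, which has no element called C, so the grouping is NULL.
by_B <- split(sampdata, sampdata$B)
is.null(by_B$C)

# Alternative: split each B group again by C to get a nested list.
nested <- lapply(by_B, function(d) split(d, d$C))
nested$Y[["7"]]
```

If only one data frame per value of B is needed, by_B$X and by_B$Y from the single-column split already provide that.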