I want to conditionally subset a data frame without repeating the data frame's name inside the condition. For example, if I have the following:
long_data_frame_name <- data.frame(x=1:10, y=1:10)
I want to say:
subset <- long_data_frame_name[x < 5,]
But instead, I have to say:
subset <- long_data_frame_name[long_data_frame_name$x < 5,]
plyr and ggplot handle this so beautifully. Is there any package that makes subsetting a data frame similarly beautiful?
It sounds like you are looking for the data.table package, which implements exactly the indexing syntax you describe. (data.table objects are essentially data.frames with added functionality, so you can continue to use them almost anywhere you would use a "plain old" data.frame.)
Matthew Dowle, the package's author, argues for the advantages of [.data.table()'s indexing syntax in his answer to this popular SO [r]-tag question. His answer there could just as well have been written as a direct response to your question above!
Here's an example:
library(data.table)
long_data_table_name <- data.table(x=1:10, y=1:10)
subset <- long_data_table_name[x < 5, ]
subset
#    x y
# 1: 1 1
# 2: 2 2
# 3: 3 3
# 4: 4 4
Yes:
newdata <- subset(mydata, sex=="m" & age > 25)
or
newdata <- subset(mydata, sex=="m" & age > 25, select=weight:income)
Reference: http://www.statmethods.net/management/subset.html
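For what it's worth, here is a minimal sketch of the same idea applied to the data frame from the question. The result names small and small_y are only illustrative. subset() evaluates the condition with the data frame's columns in scope, so x can be named directly:

```r
# Data frame from the question
long_data_frame_name <- data.frame(x = 1:10, y = 1:10)

# The condition is evaluated inside the data frame, so `x` refers to its column
small <- subset(long_data_frame_name, x < 5)

# `select` chooses which columns to keep; here only y
small_y <- subset(long_data_frame_name, x < 5, select = y)

print(small)
print(small_y)
```

One caveat: R's own help page for subset() describes it as a convenience function intended for interactive use, and recommends the standard `[` indexing when programming, because the non-standard evaluation of the condition can behave unexpectedly inside functions.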