When working with data frames, it is common to need a subset. However, use of the subset function is discouraged. The trouble with the following code is that the data frame name is mentioned twice. If you copy, paste, and munge code, it is easy to accidentally leave the second mention of adf unchanged, which can be a disaster.
adf=data.frame(a=1:10,b=11:20)
print(adf[which(adf$a>5),]) ##alas, adf mentioned twice
print(with(adf,adf[{a>5},])) ##alas, adf mentioned twice
print(subset(adf,a>5)) ##alas, not supposed to use subset
Is there a way to write the above without mentioning adf twice? Unfortunately, inside with() or within() I cannot seem to access adf as a whole.
The subset(...) function would make this easy, but its documentation warns against using it in programs:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
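One commonly cited consequence of that non-standard evaluation is a name collision between a column and a variable in the calling environment. The following is a minimal sketch, assuming a global variable `b` (a hypothetical name chosen only to clash with the column `b` of adf):

```r
adf <- data.frame(a = 1:10, b = 11:20)

# Hypothetical global variable that shares its name with a column of adf.
b <- 3

# subset() looks up names in the data frame first, so `b` here is the
# column adf$b (11..20), not the global b. No row satisfies a > b.
nse_result <- subset(adf, a > b)

# Standard subsetting evaluates `b` in the calling environment (the global 3).
std_result <- adf[adf$a > b, ]

nrow(nse_result)  # 0
nrow(std_result)  # 7
```

The two calls look equivalent but return different rows, which is the kind of surprise the warning appears to have in mind.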
Subsetting in R is a useful indexing feature for accessing object elements. It can be used to select and filter variables and observations. You can use brackets to select rows and columns from your dataframe.
There are three subsetting operators, [[ , [ , and $ . Subsetting operators interact differently with different vector types (e.g., atomic vectors, lists, factors, matrices, and data frames). Subsetting can be combined with assignment.
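As a rough illustration of how the three operators differ on a data frame, and of subsetting combined with assignment, here is a small sketch that reuses the adf example from the question (the result variable names are made up for this example):

```r
adf <- data.frame(a = 1:10, b = 11:20)

col_df  <- adf["a"]    # `[` keeps the container: a one-column data frame
col_vec <- adf[["a"]]  # `[[` extracts the element itself: an integer vector
col_dlr <- adf$b       # `$` extracts a column by name, much like `[[`

# Subsetting combined with assignment: set b to 0 in rows where a > 5.
adf$b[adf$a > 5] <- 0L
```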
R has three basic ways to subset. The first is the easiest: subsetting with a number n gives you the nth element, and subsetting with a vector of numbers gives you a vector of the corresponding elements. The second is also pretty easy: if you subset with a character vector, you get the element(s) with the corresponding name(s).
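The two forms described above can be sketched on a small named vector (the vector `x` and its element names are invented for this example):

```r
x <- c(first = 10, second = 20, third = 30)

by_position  <- x[2]                    # single number: the 2nd element
by_positions <- x[c(1, 3)]              # vector of numbers: 1st and 3rd elements
by_name      <- x[c("first", "third")]  # character vector: elements with those names
```

Here by_positions and by_name select the same two elements, once by index and once by name.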
As @akrun states, I would use dplyr's filter function:
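library(dplyr)  # attach dplyr; unlike require(), library() stops with an error if the package is missing
new <- filter(adf, a > 5)  # keep rows where a > 5; adf is mentioned only once
new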
In practice, I don't find the subsetting notation ([ ]) problematic, because if I copy a block of code, I use find and replace within RStudio to replace all mentions of the data frame in the selected code. I use dplyr instead because its notation and syntax are easier to follow for new users (and myself!), and because the dplyr functions 'do one thing well.'
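For anyone who wants to stay in base R, one possible sketch (not taken from the answers here) is to pass the data frame to an anonymous function, so that it is named only once at the call site. The parameter name `d` and the result name are hypothetical:

```r
adf <- data.frame(a = 1:10, b = 11:20)

# Inside the function body only `d` is used, so adf appears exactly once.
rows_where_a_gt5 <- (function(d) d[d$a > 5, ])(adf)

print(rows_where_a_gt5)
```

This keeps standard `[` subsetting, so it avoids the non-standard evaluation of subset(), at the cost of being more verbose than filter().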