
In R, how can I subset without subset(), using [ in a more concise manner to prevent typos?

Tags:

r

When working with data frames, it is common to need a subset. However, use of the subset() function is discouraged. The trouble with the code below is that the data frame name is mentioned twice. If you copy, paste, and adapt code, it is easy to forget to change the second mention of adf, which can be a disaster.

adf=data.frame(a=1:10,b=11:20)
print(adf[which(adf$a>5),])  ##alas, adf mentioned twice
print(with(adf,adf[{a>5},])) ##alas, adf mentioned twice
print(subset(adf,a>5)) ##alas, not supposed to use subset
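The first line of the snippet wraps the condition in which(). A plausible reason (the question does not state it) is NA handling: plain logical indexing turns an NA comparison into an extra all-NA row, while which() and subset() drop it. A minimal sketch on a hypothetical toy data frame `df_na`:

```r
# Toy data frame (hypothetical) with a missing value in column a
df_na <- data.frame(a = c(3, NA, 7), b = 11:13)

# Plain logical indexing: the NA comparison yields an extra all-NA row
plain <- df_na[df_na$a > 5, ]

# which() keeps only indices where the condition is TRUE, dropping NA
via_which <- df_na[which(df_na$a > 5), ]

# subset() also treats NA as FALSE
via_subset <- subset(df_na, a > 5)

print(nrow(plain))       # 2
print(nrow(via_which))   # 1
print(nrow(via_subset))  # 1
```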

Is there a way to write the above without mentioning adf twice? Unfortunately, with with() or within() I cannot seem to access adf as a whole.
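One base-R way to name the data frame only once is to wrap the [ call in a small function that receives the data frame and a predicate. This is a sketch under stated assumptions: `rows_where` is a hypothetical helper, not part of base R, and the predicate is an ordinary function, so there is no non-standard evaluation involved.

```r
# Hypothetical helper (not part of base R): the data frame is passed once,
# and the predicate receives it under a local name.
rows_where <- function(df, pred) {
  keep <- pred(df)
  # Treat NA as FALSE, as which() and subset() effectively do
  df[!is.na(keep) & keep, , drop = FALSE]
}

adf <- data.frame(a = 1:10, b = 11:20)

# adf is named only once; d is the helper's local alias for it
res <- rows_where(adf, function(d) d$a > 5)
print(res)
```

On R 4.1 or later, the predicate can be shortened with the lambda syntax `\(d) d$a > 5`.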

The subset() function would make this easy, but its documentation warns against using it:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
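One concrete case of those "unanticipated consequences" (an illustration chosen here, not taken from the documentation) is name shadowing: subset() looks up names in the data frame first, so a column can silently override a variable of the same name. A minimal sketch with a hypothetical `threshold` variable and `adf2` data frame:

```r
threshold <- 5
adf <- data.frame(a = 1:10, b = 11:20)

# No column named threshold: the global value 5 is used, giving 5 rows
kept_env <- subset(adf, a > threshold)

# Here a column named threshold (all 8s) silently shadows the global variable
adf2 <- data.frame(a = 1:10, threshold = 8)
kept_col <- subset(adf2, a > threshold)  # only a = 9 and 10 survive

# Explicit [ indexing uses the global threshold, as written
kept_explicit <- adf2[adf2$a > threshold, ]

print(c(nrow(kept_env), nrow(kept_col), nrow(kept_explicit)))  # 5 2 5
```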

Chris asked May 03 '15 15:05




1 Answer

As @akrun states, I would use dplyr's filter function:

library(dplyr)
# keep only the rows where column a is greater than 5
new <- filter(adf, a > 5)
new

In practice, I don't find the subsetting notation ([ ]) problematic: when I copy a block of code, I use find and replace in RStudio to update every mention of the data frame in the selected code. Rather, I use dplyr because its notation and syntax are easier for new users (and me!) to follow, and because dplyr functions 'do one thing well.'

Phil answered Nov 03 '22 21:11