subset() of a vector in R

Tags:

r

subset

I've written the following function based on subset(), which I find handy:

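ss <- function (x, subset, ...) 
{
    # evaluate the condition with `.` bound to x; other names come from the caller
    r <- eval(substitute(subset), data.frame(.=x), parent.frame())
    if (!is.logical(r)) 
        stop("'subset' must be logical")
    # keep only TRUE positions, treating NA as "not selected" (as base subset() does)
    x[r & !is.na(r)]
}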

So, I can write:

ss(myDataFrame$MyVariableName, 500 < . & . < 1500)

instead of

myDataFrame$MyVariableName[ 500 < myDataFrame$MyVariableName 
                                & myDataFrame$MyVariableName < 1500]
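One behavioral difference between `ss()` and the bracket form is worth spelling out. Because `ss()` keeps only positions where the condition is `TRUE` (via `r & !is.na(r)`, the same rule base `subset()` applies to vectors), missing values are dropped, whereas plain logical indexing returns `NA` for them. Here is a small sketch using a made-up vector `v`. The `ss()` definition is repeated so the snippet runs on its own:

```r
# ss() as defined in the question
ss <- function(x, subset, ...) {
  r <- eval(substitute(subset), data.frame(. = x), parent.frame())
  if (!is.logical(r))
    stop("'subset' must be logical")
  x[r & !is.na(r)]
}

v <- c(1, NA, 5, 10)   # made-up example vector containing a missing value

ss(v, . > 2)           # NA dropped: 5 10
v[v > 2]               # bracket form keeps a slot for the NA: NA 5 10
subset(v, v > 2)       # base subset() on a vector also drops NA: 5 10
```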

This seems like something other people might have developed solutions for, though - including something in core R I might have missed. Anything already out there?


asked Jan 19 '12 21:01

Ken Williams

2 Answers

I realize that the solution Ken offers is more general than just selecting items within ranges (since it should work on any logical expression), but this did remind me that Greg Snow has comparison infix operators in his TeachingDemos package:

library(TeachingDemos)
x0 <- rnorm(100)
x0[ 0 %<% x0 %<% 1.5 ]
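For anyone curious how a chained comparison like `0 %<% x0 %<% 1.5` can work at all, here is a minimal sketch of one possible design. The operator `%lt%` and the attribute name `chain_rhs` are hypothetical, made up for illustration; this is not the TeachingDemos source. It relies on R's `%op%` operators grouping left to right, so `a %lt% b %lt% c` is parsed as `(a %lt% b) %lt% c` and the first call can pass its right-hand operand along for the second call to compare against:

```r
# Hypothetical chainable "strictly less than" operator (illustration only,
# not how TeachingDemos implements %<%).
`%lt%` <- function(a, b) {
  carried <- attr(a, "chain_rhs")        # non-NULL if `a` came from a previous %lt%
  if (is.null(carried)) {
    res <- a < b                         # first link: plain comparison
  } else {
    res <- as.vector(a) & (carried < b)  # later link: AND with the earlier result
  }
  attr(res, "chain_rhs") <- b            # remember the right-hand side for the next link
  res
}

set.seed(1)                              # arbitrary seed, for reproducibility
x0 <- rnorm(100)
keep <- 0 %lt% x0 %lt% 1.5               # parsed as (0 %lt% x0) %lt% 1.5
x0[keep]                                 # same elements as x0[0 < x0 & x0 < 1.5]
```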


answered Oct 23 '22 05:10

IRTFM

Thanks for sharing, Ken.

You could use:

x <- myDataFrame$MyVariableName; x[x > 100 & x < 180]

Yours may require less typing, but the code is less generalizable to others if you're sharing it. I have a few time-saver functions like that myself, but I use them sparingly because they may slow down your code (extra steps), and they require you to include the function's definition whenever you share the file with someone else.

Compare how much typing each takes; they're almost the same length:

ss(mtcars$hp, 100 < . & . < 180)
x <- mtcars$hp; x[x > 100 & x < 180]
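Before comparing length or speed, it can help to confirm that the two one-liners return the same values. `mtcars$hp` has no missing values, so the `NA` handling in `ss()` doesn't come into play here. It is also easy to count the characters directly. A quick check, repeating `ss()` from the question so it runs on its own:

```r
# ss() as defined in the question
ss <- function(x, subset, ...) {
  r <- eval(substitute(subset), data.frame(. = x), parent.frame())
  if (!is.logical(r))
    stop("'subset' must be logical")
  x[r & !is.na(r)]
}

via_ss <- ss(mtcars$hp, 100 < . & . < 180)
x <- mtcars$hp; via_index <- x[x > 100 & x < 180]
identical(via_ss, via_index)                      # TRUE

# Character counts of the two versions as typed above
nchar("ss(mtcars$hp, 100 < . & . < 180)")         # 32
nchar("x <- mtcars$hp; x[x > 100 & x < 180]")     # 36
```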

Compare timings over 1000 replications:

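library(rbenchmark)
# ss() from the question must be defined; x is created here, outside the timed code
x <- mtcars$hp
benchmark(
       tyler = x[x > 100 & x < 180],
       ken = ss(mtcars$hp, 100 < . & . < 180),
 replications=1000)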

   test replications elapsed relative user.self sys.self user.child sys.child
2   ken         1000    0.56 18.66667      0.36     0.03         NA        NA
1 tyler         1000    0.03  1.00000      0.03     0.00         NA        NA
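The `relative` column is, as far as I understand rbenchmark's default (`relative = "elapsed"`), each row's elapsed time divided by the smallest elapsed time in the table, so the fastest test shows 1. Recomputing it from the elapsed times reported above:

```r
# Elapsed times copied from the table above (seconds, 1000 replications)
elapsed <- c(ken = 0.56, tyler = 0.03)

# Relative timing: each elapsed time divided by the fastest one
relative <- elapsed / min(elapsed)
round(relative, 5)   # ken 18.66667, tyler 1.00000
```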

So I guess it depends on whether you need speed and/or shareability versus convenience. If it's just for you on a small data set, I'd say it's valuable.

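EDIT: New benchmarking (this time the assignment to `x` is inside the timed expression; `ss2` is a variant of `ss` whose code isn't shown here):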

> benchmark(
+     tyler = {x <- mtcars$hp; x[x > 100 & x < 180]}, 
+     ken = ss(mtcars$hp, 100 <. & . < 180), 
+     ken2 = ss2(mtcars$hp, 100 <. & . < 180),
+     joran = with(mtcars,hp[hp>100 & hp< 180 ]), 
+  replications=10000)

   test replications elapsed  relative user.self sys.self user.child sys.child
4 joran        10000    0.83  2.677419      0.69     0.00         NA        NA
2   ken        10000    3.79 12.225806      3.45     0.02         NA        NA
3  ken2        10000    0.67  2.161290      0.35     0.00         NA        NA
1 tyler        10000    0.31  1.000000      0.20     0.00         NA        NA
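One plausible source of `ss()`'s overhead is that it builds a whole `data.frame` on every call just to give `.` a binding. Below is a minimal sketch of a variant that uses a plain `list` as the evaluation environment instead. `ss_list` is a hypothetical name; it is not the `ss2` benchmarked above, and no timings are claimed for it:

```r
# Hypothetical variant of ss(): same selection rule, but `.` is bound via a
# lightweight list instead of a data.frame (eval() converts the list to an
# environment whose enclosure is the caller's frame).
ss_list <- function(x, subset) {
  r <- eval(substitute(subset), list(. = x), parent.frame())
  if (!is.logical(r))
    stop("'subset' must be logical")
  x[r & !is.na(r)]   # keep TRUE positions only; NA counts as not selected
}

ss_list(mtcars$hp, 100 < . & . < 180)

# Same result as the plain indexing version
x <- mtcars$hp
identical(ss_list(mtcars$hp, 100 < . & . < 180), x[x > 100 & x < 180])  # TRUE
```

To measure it, it could be added as another named entry in the `benchmark()` call above.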

answered Oct 23 '22 05:10

Tyler Rinker
