Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is something wrong with my system.time wrapper function?

Tags:

r

EDIT: Updated thanks to @daroczig's lovely answer below. However, test 2 still feels like it takes longer than test 1 which is what I'm wondering about.

UPDATE: On second reading, @daroczig's answer does explain away my confusion -- the issue was due to me not properly thinking out the system.time(expr) line of code.

I wanted to make a version of the system.time function which would be a bit more informative for myself in terms of understanding run-to-run time fluctuations:

system.time.summary <- function(N, expr) {
  t.mat <- replicate(N, system.time(expr))
  as.data.frame(apply(t.mat[1:3,], 1, summary))
}

However the problem is, in the self contained code below, test.2 feels like it is taking longer to do than test.1 (and I've run them several times to check), even though the code is pretty much exactly the same (test.1 uses a wrapper function, whereas test.2 is just the raw code)

# set up number of runs
N <- 100

# test 1
system.time.summary(N, (1:1e8)^2 + 1)

        user.self sys.self elapsed
Min.        0.000    0.000   0.000
1st Qu.     0.000    0.000   0.000
Median      0.000    0.000   0.000
Mean        0.058    0.031   0.089
3rd Qu.     0.000    0.000   0.000
Max.        0.580    0.310   0.890

# test 2
t.mat = replicate(N, system.time((1:1e8)^2 + 1))
as.data.frame(apply(t.mat[1:3,], 1, summary))

        user.self sys.self elapsed
Min.        0.630    0.120   0.860
1st Qu.     0.665    0.170   0.880
Median      0.695    0.195   0.880
Mean        0.692    0.196   0.882
3rd Qu.     0.715    0.225   0.890
Max.        0.760    0.260   0.900

I hope I explained that OK! It could be that it's currently Monday morning, but this is confusing me...

My system:

# Windows Server 2008 R2
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-mingw32/x64 (64-bit)
like image 493
Tony Breyal Avatar asked Jan 20 '23 09:01

Tony Breyal


2 Answers

You are running system.time(system.time()) in the fist test, and also having system.time(1:1e8)^2 + 1) as an expression in the function which not a good idea, see:

> expr <- system.time((1:1e8)^2 + 1)
> system.time(expr)
   user  system elapsed 
      0       0       0 

But anyway: use the microbenchmark package from CRAN for such purposes, you will not regret. Set up your distinct functions and you can easily roll out your simulations with 100, 1000 or any runs. You can get a neat summary and boxplots at the end of benchmarking.

For example:

> test1 <- function() (1:1e8)^2 + 1
> (results <- microbenchmark(test1(), times=10))
Unit: nanoeconds
               min         lq     median         uq        max
test1() 3565386356 3703142531 3856450582 3931033077 3986309085
> boxplot(results)

enter image description here

like image 148
daroczig Avatar answered Jan 28 '23 18:01

daroczig


As daroczig said, you have an extra system.time. But there's something else :

If you put a browser() in your function, you'd see what happens. In fact, the expression you make, is evaluated only once and then kept in memory. This is how R optimizes internally. So if you do :

system.time.summary(N,(1:1e8)^2 +1)

t.mat is internally :

           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
user.self  0.61    0    0    0    0    0    0    0    0     0
sys.self   0.36    0    0    0    0    0    0    0    0     0
elapsed    0.97    0    0    0    0    0    0    0    0     0
user.child   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
sys.child    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

and expr is :

Browse[2]> str(expr)
 num [1:100000000] 2 5 10 17 26 37 50 65 82 101 ...

It's a bit difficult to change this, as R will evaluate any static expression only once and then retrieve the result another 99 times from the memory. If you want this not to happen, you have to pass an expression explicitly, and add the eval() function. :

system.time.summary <- function(N, expr) {
  t.mat <- replicate(N, system.time(eval(expr)))
  as.data.frame(apply(t.mat[1:3,], 1, summary))
}

system.time.summary(N, expression((1:1e8)^2 + 1))

Now expr gets evaluated every time, and remains an expression in the function :

Browse[2]> expr
expression((1:1e+08)^2 + 1)

This gives you the correct timings.

        user.self sys.self elapsed
Min.       0.6400   0.2000   0.970
1st Qu.    0.6850   0.2375   0.980
Median     0.7150   0.2700   0.985
Mean       0.7130   0.2700   0.985
3rd Qu.    0.7425   0.2975   0.990
Max.       0.7800   0.3500   1.000
like image 39
Joris Meys Avatar answered Jan 28 '23 20:01

Joris Meys