Evaluate multiline codeblock with microbenchmark

Is it possible to evaluate a codeblock consisting of multiple lines of code with microbenchmark? If so, how?

Example: We have some numeric data in character columns:

testdata <- tibble::tibble(col1 = runif(1000), col2 = as.character(runif(1000)), col3 = as.character(runif(1000)))

Now we can try different ways of converting these. We can directly call as.numeric on the columns:

testdata$col2 <- as.numeric(testdata$col2)
testdata$col3 <- as.numeric(testdata$col3)

We could try doing it inside a dplyr mutate:

testdata <- dplyr::mutate(testdata, col2 = as.numeric(col2),
               col3 = as.numeric(col3))

Or maybe we know all columns should be numeric so we can try something less explicit that does some checking:

testdata <- dplyr::mutate_if(testdata, .predicate = is.character, .funs = as.numeric)

Now we want to compare the performance of these 3 options.

The latter two options are single calls, so they are easy to test in microbenchmark, but the first option consists of two separate calls. We could wrap the two calls in a function and benchmark that, but this introduces the slight overhead of a function call, so it isn't technically measuring the code we have now. We could also benchmark the two calls separately and add the results up afterwards; that should do fine for the mean, but for statistics like the min or the max it doesn't necessarily give sensible results.
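For reference, the two workarounds described above could be sketched roughly as follows. This is only an illustration under some assumptions: the data is a base-R data.frame stand-in for the tibble, and `convert_both`, `bench_wrapped`, `bench_parts` and `part_means` are hypothetical names that are not part of the original example.

```r
library(microbenchmark)

# Stand-in data (assumption: base data.frame instead of the tibble above).
testdata <- data.frame(
  col2 = as.character(runif(1000)),
  col3 = as.character(runif(1000)),
  stringsAsFactors = FALSE
)

# Workaround 1: wrap both conversions in a (hypothetical) helper function.
# This adds the cost of one function call to every timed run.
convert_both <- function(d) {
  d$col2 <- as.numeric(d$col2)
  d$col3 <- as.numeric(d$col3)
  d
}
bench_wrapped <- microbenchmark(wrapped = convert_both(testdata), times = 100L)

# Workaround 2: time the two calls separately and add the means afterwards.
# The `time` column of a microbenchmark result holds nanoseconds per run.
bench_parts <- microbenchmark(
  col2 = as.numeric(testdata$col2),
  col3 = as.numeric(testdata$col3),
  times = 100L
)
part_means <- tapply(bench_parts$time, bench_parts$expr, mean)
sum(part_means)  # combined mean in nanoseconds
```

Adding the two minima or maxima the same way would combine runs that never happened together, which is why only the mean adds up cleanly.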

The examples in the docs for microbenchmark mostly use simple individual expressions and often use a simple function to wrap code.

Is it possible to directly input multiple lines of code into microbenchmark to be evaluated together?

asked Dec 27 '17 13:12 by Marijn Stevering


1 Answer

Yes. Wrap the lines of code in {} and separate them with ; (or with newlines), and microbenchmark evaluates them together as one expression:

library(microbenchmark)

bench <- microbenchmark(
  separate = {as.numeric(testdata$col2); as.numeric(testdata$col3)},
  mutate   = dplyr::mutate(testdata, col2 = as.numeric(col2), col3 = as.numeric(col3)),
  mutateif = dplyr::mutate_if(testdata, .predicate = is.character, .funs = as.numeric)
)

This gives the following results:

> bench
Unit: microseconds
     expr      min       lq      mean    median        uq        max neval
 separate  477.014  529.708  594.8982  576.4275  611.6275   1109.762   100
   mutate 3410.351 3633.070 4465.0583 3876.6975 4446.0845  34298.910   100
 mutateif 5118.725 5365.126 7241.5727 5637.5520 6290.7795 118874.982   100
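
Note that the `separate` block above only converts the columns and discards the result, whereas the code in the question assigns the converted values back. A block that also includes the assignments might look like the sketch below. It assumes a base-R data.frame stand-in for the tibble, and `d` is a hypothetical scratch variable. The copy is there because, as far as I know, microbenchmark evaluates its expressions in the calling environment, so assigning directly into `testdata` would leave the columns numeric after the first run.

```r
library(microbenchmark)

# Stand-in data (assumption: base data.frame instead of the tibble).
testdata <- data.frame(
  col1 = runif(1000),
  col2 = as.character(runif(1000)),
  col3 = as.character(runif(1000)),
  stringsAsFactors = FALSE
)

bench <- microbenchmark(
  separate = {
    # Work on a copy so that every run starts from character columns.
    d <- testdata
    d$col2 <- as.numeric(d$col2)
    d$col3 <- as.numeric(d$col3)
  },
  times = 100L
)

print(bench)
```

Inside the braces, statements on separate lines behave the same as statements separated by `;`, so longer blocks can be laid out like ordinary code.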

answered Nov 09 '22 21:11 by Marijn Stevering