Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I profile code when using knitr?

I am trying to institute a workflow to answer SO questions using knitr and render_markdown(strict = T ) to format my answers and / questions.

Recently I was trying to run profiling on some code using profr and due to the implementation of knitr, the profiling picks up all the calls to evaluate etc that underpin knitr

For example

Not in knitr

If I run in plain R

library(profr)
quantile_ex <- profr({Sys.sleep(1); example(quantile, setRNG = TRUE)}, 0.01)
quantile_ex
##                      f level time start  end  leaf source
## 9       Sys.sleep     1 0.64  0.01 0.65  TRUE   base
## 10        example     1 0.05  0.65 0.70 FALSE  utils
## 11   index.search     2 0.01  0.65 0.66 FALSE   <NA>
## 12    <Anonymous>     2 0.02  0.66 0.68 FALSE   <NA>
## 13         source     2 0.01  0.68 0.69 FALSE   base
## 14         unlink     2 0.01  0.69 0.70  TRUE   base
## 15    file.exists     3 0.01  0.65 0.66  TRUE   base
## 16     prepare_Rd     3 0.01  0.66 0.67 FALSE   <NA>
## 17            of0     3 0.01  0.67 0.68 FALSE   <NA>
## 18           file     3 0.01  0.68 0.69  TRUE   base
## 19   .getHelpFile     4 0.01  0.66 0.67 FALSE   <NA>
## 20            of1     4 0.01  0.67 0.68 FALSE   <NA>
## 21    <Anonymous>     5 0.01  0.66 0.67 FALSE   <NA>
## 22     WriteLines     5 0.01  0.67 0.68 FALSE   <NA>
## 23 lazyLoadDBexec     6 0.01  0.66 0.67 FALSE   base
## 24     writeLines     6 0.01  0.67 0.68 FALSE   base
## 25            fun     7 0.01  0.66 0.67 FALSE   <NA>
## 26         paste0     7 0.01  0.67 0.68 FALSE   base
## 27       basename     8 0.01  0.66 0.67  TRUE   base
## 28             wr     8 0.01  0.67 0.68 FALSE   <NA>
## 29         paste0     9 0.01  0.67 0.68 FALSE   base
## 30        strwrap    10 0.01  0.67 0.68 FALSE   base
## 31         lapply    11 0.01  0.67 0.68 FALSE   base
## 32       strsplit    12 0.01  0.67 0.68  TRUE   base

Using knitr + profr

render_markdown(strict = T)
library(profr)
quantile_ex <- profr({
    Sys.sleep(1)
    example(quantile, setRNG = TRUE)
}, 0.01)
quantile_ex

##                      f level time start  end  leaf source
## 8          <Anonymous>     1 0.80  0.00 0.80 FALSE   <NA>
## 9  process_group.block     2 0.80  0.00 0.80 FALSE   <NA>
## 10          call_block     3 0.80  0.00 0.80 FALSE   <NA>
## 11          block_exec     4 0.80  0.00 0.80 FALSE   <NA>
## 12            evaluate     5 0.80  0.00 0.80 FALSE   <NA>
## 13              unlist     6 0.80  0.00 0.80 FALSE   base
## 14              mapply     7 0.80  0.00 0.80 FALSE   base
## 15         <Anonymous>     8 0.80  0.00 0.80 FALSE   <NA>
## 16                 try     9 0.80  0.00 0.80 FALSE   base
## 17            tryCatch    10 0.80  0.00 0.80 FALSE   base
## 18        tryCatchList    11 0.80  0.00 0.80 FALSE   <NA>
## 19         tryCatchOne    12 0.80  0.00 0.80 FALSE   <NA>
## 20          doTryCatch    13 0.80  0.00 0.80 FALSE   <NA>
## 21 withCallingHandlers    14 0.80  0.00 0.80 FALSE   base
## 22         withVisible    15 0.80  0.00 0.80 FALSE   base
## 23                eval    16 0.80  0.00 0.80 FALSE   base
## 24                eval    17 0.80  0.00 0.80 FALSE   base
## 25               profr    18 0.80  0.00 0.80 FALSE  profr
## 26                 try    19 0.80  0.00 0.80 FALSE   base
## 27            tryCatch    20 0.80  0.00 0.80 FALSE   base
## 28        tryCatchList    21 0.80  0.00 0.80 FALSE   <NA>
## 29         tryCatchOne    22 0.80  0.00 0.80 FALSE   <NA>
## 30          doTryCatch    23 0.80  0.00 0.80 FALSE   <NA>
## 31               force    24 0.80  0.00 0.80 FALSE   base
## 32           Sys.sleep    25 0.77  0.00 0.77  TRUE   base
## 33             example    25 0.03  0.77 0.80 FALSE  utils
## 34         <Anonymous>    26 0.02  0.77 0.79 FALSE   <NA>
## 35              source    26 0.01  0.79 0.80 FALSE   base
## 36          prepare_Rd    27 0.01  0.77 0.78 FALSE   <NA>
## 37              render    27 0.01  0.78 0.79 FALSE   <NA>
## 38         withVisible    27 0.01  0.79 0.80 FALSE   base
## 39        .getHelpFile    28 0.01  0.77 0.78 FALSE   <NA>
## 40                 of0    28 0.01  0.78 0.79 FALSE   <NA>
## 41                eval    28 0.01  0.79 0.80 FALSE   base
## 42         <Anonymous>    29 0.01  0.77 0.78 FALSE   <NA>
## 43                 of1    29 0.01  0.78 0.79 FALSE   <NA>
## 44                eval    29 0.01  0.79 0.80 FALSE   base
## 45      lazyLoadDBexec    30 0.01  0.77 0.78 FALSE   base
## 46          WriteLines    30 0.01  0.78 0.79 FALSE   <NA>
## 47            quantile    30 0.01  0.79 0.80 FALSE  stats
## 48                 fun    31 0.01  0.77 0.78 FALSE   <NA>
## 49          writeLines    31 0.01  0.78 0.79  TRUE   base
## 50    quantile.default    31 0.01  0.79 0.80 FALSE  stats
## 51               fetch    32 0.01  0.77 0.78 FALSE   <NA>
## 52               paste    32 0.01  0.79 0.80  TRUE   base
## 53         <Anonymous>    33 0.01  0.77 0.78 FALSE   <NA>
## 54       existsInFrame    34 0.01  0.77 0.78  TRUE   <NA>

using knitr + Rprof + profr::parse_rprof

tmp <- tempfile()
Rprof(tmp, interval = 0.01)
try(force(quantile_ex <- profr({
    Sys.sleep(1)
    example(quantile, setRNG = TRUE)
}, 0.01)))
Rprof(NULL)
parsed <- parse_rprof(tmp, interval = 0.01)

## Error: invalid 'envir' argument

parse_rprof won't run due to environment issues

knitr + Rprof + summaryRprof

Using the utils::summaryRprof will read and summarize

summaryRprof(tmp)

## $by.self
##       self.time self.pct total.time total.pct
## Rprof      0.01      100       0.01       100
## 
## $by.total
##                     total.time total.pct self.time self.pct
## Rprof                     0.01       100      0.01      100
## <Anonymous>               0.01       100      0.00        0
## block_exec                0.01       100      0.00        0
## call_block                0.01       100      0.00        0
## doTryCatch                0.01       100      0.00        0
## eval                      0.01       100      0.00        0
## evaluate                  0.01       100      0.00        0
## knit                      0.01       100      0.00        0
## mapply                    0.01       100      0.00        0
## process_file              0.01       100      0.00        0
## process_group.block       0.01       100      0.00        0
## try                       0.01       100      0.00        0
## tryCatch                  0.01       100      0.00        0
## tryCatchList              0.01       100      0.00        0
## tryCatchOne               0.01       100      0.00        0
## unlist                    0.01       100      0.00        0
## withCallingHandlers       0.01       100      0.00        0
## withVisible               0.01       100      0.00        0
## 
## $sample.interval
## [1] 0.01
## 
## $sampling.time
## [1] 0.01
## 

It doesn't seem to actually profile the code that was run, only the knitr / evaluate details.

Question and possible solution

My current work flow is to just run profiling outside knitr, but the automatic formatting (commenting out output with ##) I find really useful.

Has anyone come across a neat solution to this, or is it something to take up with the package developers?

My current idea is to rewrite parse_rprof to limit itself to calls above the first call to force (not the default level 7)

Example rewrite of profr

Here is a rewrite of profr that almost works

library(profr)
profr <- function(expr, interval = 0.02, quiet = TRUE) {
    tmp <- tempfile()
    on.exit(unlink(tmp))
    on.exit(unlink("Rprof.out"), add = T)
    if (quiet) {
        tc <- textConnection(NULL, "w")
        sink(tc)
        on.exit(sink(), add = TRUE)
        on.exit(close(tc), add = TRUE)
    }
    Rprof(tmp, append = TRUE, interval = interval)
    try(force(expr))
    Rprof(NULL)
    parsed <- parse_rprof(tmp, interval)
    which_force <- min(which(parsed$f == "force"))

    parsed <- parsed[parsed$level > parsed$level[which_force], ]
    parsed$level <- parsed$level - parsed$level[which_force]
    parsed
}

quantile_ex <- profr({
    Sys.sleep(1)
    example(quantile, setRNG = TRUE)
}, 0.01)
quantile_ex


##                 f level time start  end  leaf source
## 32      Sys.sleep    NA 0.90  0.00 0.90  TRUE   base
## 33        example    NA 0.03  0.90 0.93 FALSE  utils
## 34    <Anonymous>    NA 0.03  0.90 0.93 FALSE   <NA>
## 35     prepare_Rd    NA 0.01  0.90 0.91 FALSE   <NA>
## 36         render    NA 0.02  0.91 0.93 FALSE   <NA>
## 37   .getHelpFile    NA 0.01  0.90 0.91 FALSE   <NA>
## 38           %in%    NA 0.01  0.91 0.92  TRUE   base
## 39            of0    NA 0.01  0.92 0.93 FALSE   <NA>
## 40    <Anonymous>    NA 0.01  0.90 0.91 FALSE   <NA>
## 41            of1    NA 0.01  0.92 0.93 FALSE   <NA>
## 42 lazyLoadDBexec    NA 0.01  0.90 0.91 FALSE   base
## 43     WriteLines    NA 0.01  0.92 0.93 FALSE   <NA>
## 44            fun    NA 0.01  0.90 0.91 FALSE   <NA>
## 45     writeLines    NA 0.01  0.92 0.93 FALSE   base
## 46          fetch    NA 0.01  0.90 0.91  TRUE   <NA>
## 47         paste0    NA 0.01  0.92 0.93 FALSE   base
## 48          remap    NA 0.01  0.92 0.93 FALSE   <NA>
## 49           psub    NA 0.01  0.92 0.93  TRUE   <NA>

The results, are however not the same as from plain R

like image 773
mnel Avatar asked Sep 07 '12 00:09

mnel


People also ask

How do I add code to R Markdown?

You can insert an R code chunk either using the RStudio toolbar (the Insert button) or the keyboard shortcut Ctrl + Alt + I ( Cmd + Option + I on macOS).

How does R knitr work?

When you run render , R Markdown feeds the . Rmd file to knitr, which executes all of the code chunks and creates a new markdown (. md) document which includes the code and its output. The markdown file generated by knitr is then processed by pandoc which is responsible for creating the finished format.

What is code profiling R?

Profiling R code gives you the chance to identify bottlenecks and pieces of code that needs to be more efficiently implemented [1]. Profiling R code is usually the last thing I do in the process of package (or function) development.

What is a knitr document?

knitr is an R package that integrates computing and reporting. By incorporating code into text documents, the analysis, results and discussion are all in one place.


1 Answers

How about something along the lines of having a profrk-setup chunk like

library(profr)

profrk <- function(expr, delay = 1, interval = 0.02, quiet) {
    profrk_marker <- function(expr) {
        eval(expr)
    }

    filter <- function(prof.out) {
        level.filter <- prof.out$level[which(prof.out == "profrk_marker")] + 
            2

        prof.out <- prof.out[prof.out$level > level.filter, ]
        time.offset <- min(prof.out$start)

        prof.out$start <- prof.out$start - time.offset
        prof.out$end <- prof.out$end - time.offset

        prof.out
    }

    filter(profr({
        Sys.sleep(delay)
        profrk_marker(substitute(expr))
    }, interval = interval, quiet = quiet))
}

and a profiling chunk like

pr <- profrk(example(quantile, setRNG = TRUE), delay = 1, interval = 0.01, quiet = TRUE)
print(pr)

##                   f level time start  end  leaf source
## 39          example    31 0.08  0.00 0.08 FALSE  utils
## 40     index.search    32 0.01  0.00 0.01 FALSE   <NA>
## 41      <Anonymous>    32 0.05  0.01 0.06 FALSE   <NA>
## 42           source    32 0.02  0.06 0.08 FALSE   base
## 43          readRDS    33 0.01  0.00 0.01 FALSE   base
## 44       prepare_Rd    33 0.02  0.01 0.03 FALSE   <NA>
## 45           RdTags    33 0.01  0.03 0.04 FALSE   <NA>
## 46              of0    33 0.01  0.04 0.05 FALSE   <NA>
## 47           render    33 0.01  0.05 0.06 FALSE   <NA>
## 48      srcfilecopy    33 0.01  0.06 0.07 FALSE   base
## 49      withVisible    33 0.01  0.07 0.08 FALSE   base
## 50           gzfile    34 0.01  0.00 0.01  TRUE   base
## 51     .getHelpFile    34 0.02  0.01 0.03 FALSE   <NA>
## 52           sapply    34 0.01  0.03 0.04 FALSE   base
## 53              of1    34 0.01  0.04 0.05 FALSE   <NA>
## 54              of0    34 0.01  0.05 0.06 FALSE   <NA>
## 55        file.info    34 0.01  0.06 0.07 FALSE   base
## 56             eval    34 0.01  0.07 0.08 FALSE   base
## 57      <Anonymous>    35 0.02  0.01 0.03 FALSE   <NA>
## 58   simplify2array    35 0.01  0.03 0.04 FALSE   base
## 59       WriteLines    35 0.01  0.04 0.05 FALSE   <NA>
## 60              of1    35 0.01  0.05 0.06 FALSE   <NA>
## 61         .POSIXct    35 0.01  0.06 0.07 FALSE   base
## 62             eval    35 0.01  0.07 0.08 FALSE   base
## 63   lazyLoadDBexec    36 0.02  0.01 0.03 FALSE   base
## 64           unique    36 0.01  0.03 0.04  TRUE   base
## 65       writeLines    36 0.01  0.04 0.05 FALSE   base
## 66       WriteLines    36 0.01  0.05 0.06 FALSE   <NA>
## 67        structure    36 0.01  0.06 0.07  TRUE   base
## 68         quantile    36 0.01  0.07 0.08 FALSE  stats
## 69          readRDS    37 0.01  0.01 0.02  TRUE   base
## 70              fun    37 0.01  0.02 0.03 FALSE   <NA>
## 71           paste0    37 0.01  0.04 0.05 FALSE   base
## 72       writeLines    37 0.01  0.05 0.06 FALSE   base
## 73 quantile.default    37 0.01  0.07 0.08 FALSE  stats
## 74            fetch    38 0.01  0.02 0.03 FALSE   <NA>
## 75               wr    38 0.01  0.04 0.05 FALSE   <NA>
## 76           paste0    38 0.01  0.05 0.06 FALSE   base
## 77            paste    38 0.01  0.07 0.08 FALSE   base
## 78      <Anonymous>    39 0.01  0.02 0.03 FALSE   <NA>
## 79           paste0    39 0.01  0.04 0.05 FALSE   base
## 80            remap    39 0.01  0.05 0.06 FALSE   <NA>
## 81          formatC    39 0.01  0.07 0.08 FALSE   base
## 82     getFromFrame    40 0.01  0.02 0.03  TRUE   <NA>
## 83          strwrap    40 0.01  0.04 0.05 FALSE   base
## 84             psub    40 0.01  0.05 0.06  TRUE   <NA>
## 85      blank.chars    40 0.01  0.07 0.08 FALSE   <NA>
## 86           lapply    41 0.01  0.04 0.05 FALSE   base
## 87           vapply    41 0.01  0.07 0.08 FALSE   base
## 88         strsplit    42 0.01  0.04 0.05  TRUE   base
## 89              FUN    42 0.01  0.07 0.08 FALSE   <NA>
## 90            paste    43 0.01  0.07 0.08  TRUE   base

and we might as well look at the plot which I wish was interactive and fit the names better like the profr exploring feature.

plot(pr)

plot of chunk unnamed-chunk-1

( Try comparing the shape at various delays and intervals to your console capture. )

Nicer example output on rpubs.

like image 141
Thell Avatar answered Oct 02 '22 22:10

Thell