Debugging (line by line) of Rcpp-generated DLL under Windows

Recently I've been experimenting with Rcpp (inline) to generate DLLs that perform various tasks on supplied R inputs. I'd like to be able to debug the code in these DLLs line by line, given a specific set of R inputs. (I'm working under Windows.)

To illustrate, let's consider a specific example that anybody should be able to run...

The code below is a very simple cxxfunction that doubles the input vector. Note, however, that there is an additional variable, myvar, which changes value a few times but doesn't affect the output. It has been added so that we can tell when the debugging process is working correctly.

library(inline)
library(Rcpp)

f0 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
    Rcpp::NumericVector xa(a);
    int myvar = 19;
    int na = xa.size();
    myvar = 27;
    Rcpp::NumericVector out1(na);
    for(int i=0; i < na; i++) {
        out1[i] = 2*xa[i];
        myvar++;
    }
    myvar = 101;
    return(Rcpp::List::create( _["out1"] = out1));
')

After we run the above, typing the command

getLoadedDLLs()

brings up a list of DLLs in the R session. The last one listed should be the DLL created by the above process - it has a random temporary name, which in my case is

file7e61645c

The "Filename" column shows that cxxfunction has put this DLL in the location tempdir(), which for me is currently

C:/Users/TimP/AppData/Local/Temp/RtmpXuxtpa/file7e61645c.dll

Now, the obvious way to call the DLL is via f0, as follows

> f0(c(-7,0.7,77))

$out1
[1] -14.0   1.4 154.0

But we can of course also call the compiled routine directly with the .Call command, using its symbol name (which here matches the DLL's name):

> .Call("file7e61645c",c(-7,0.7,77))

$out1
[1] -14.0   1.4 154.0

So I've reached the point where I'm calling a standalone DLL directly with R input (here, the vector c(-7,0.7,77)), and having it return the answer correctly to R.

What I really need, though, is a facility for line-by-line debugging (using gdb, I presume) that will allow me to observe the value of myvar being set to 19, 27, 28, 29, 30, and finally 101 as the code progresses. The example above is deliberately set up so that calling the DLL tells us nothing about myvar.
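As a reference point for what a debugger session should eventually show, the sequence of values can be reproduced outside R. The following is a minimal sketch, not the Rcpp code itself: it assumes std::vector as a stand-in for Rcpp::NumericVector so that it builds without R, and the names TraceResult, myvar_log and double_and_trace are hypothetical. The logging is only there to document the expected values; it is not a substitute for stepping through the real DLL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the sketch: the doubled vector plus a log of
// every value myvar held, in the order it was assigned.
struct TraceResult {
    std::vector<double> out1;
    std::vector<int> myvar_log;
};

// Plain C++ mirror of the cxxfunction body. std::vector replaces
// Rcpp::NumericVector (an assumption made so this compiles without Rcpp).
TraceResult double_and_trace(const std::vector<double>& xa) {
    TraceResult r;

    int myvar = 19;
    r.myvar_log.push_back(myvar);

    const std::size_t na = xa.size();
    myvar = 27;
    r.myvar_log.push_back(myvar);

    r.out1.resize(na);
    for (std::size_t i = 0; i < na; i++) {
        r.out1[i] = 2 * xa[i];
        myvar++;                       // one increment per input element
        r.myvar_log.push_back(myvar);
    }

    myvar = 101;
    r.myvar_log.push_back(myvar);

    // Two initial assignments, one increment per element, one final assignment.
    assert(r.myvar_log.size() == na + 3);
    return r;
}
```

Calling double_and_trace({-7, 0.7, 77}) gives a myvar_log of 19, 27, 28, 29, 30, 101, which is the sequence a working gdb session on the DLL should reveal for the same input.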

To clarify, the "win condition" here is being able to observe myvar changing (seeing the value myvar=19 would be the first step!) without adding anything else to the body of the code. This obviously may require changes to the way in which the code is compiled (are there debugging mode settings to turn on?), or the way R is called - but I don't know where to begin. As noted above, all of this is Windows-based.

Final note: In my experiments, I actually made some minor modifications to a copy of cxxfunction so that the output DLL - and the code within it - receives a user-defined name and sits in a user-defined directory, rather than a temporary name and location. But this doesn't affect the essence of the question. I mention this just to emphasise that it should be fairly easy to alter the compilation settings if someone gives me a nudge :)

For completeness, setting verbose=TRUE in the original cxxfunction call above shows the compilation argument to be of the following form:

C:/R/R-2.13.2/bin/i386/R CMD SHLIB file7e61645c.cpp 2> file7e61645c.cpp.err.txt 
g++ -I"C:/R/R-213~1.2/include"    -I"C:/R/R-2.13.2/library/Rcpp/include"      -O2 -Wall  -c file7e61645c.cpp -o file7e61645c.o
g++ -shared -s -static-libgcc -o file7e61645c.dll tmp.def file7e61645c.o C:/R/R-2.13.2/library/Rcpp/lib/i386/libRcpp.a -LC:/R/R-213~1.2/bin/i386 -lR

My adapted version has a compilation argument identical to the above, except that the string "file7e61645c" is replaced everywhere by the user's choice of name (e.g. "testdll") and the relevant files copied over to a more permanent location.

Thanks in advance for your help guys :)

Tim P asked Jul 05 '12 13:07


1 Answer

I am a little stunned by the obsession some Rcpp users have with the inline package and its cxxfunction(). Yes, it is indeed very helpful and it has surely driven adoption of Rcpp further as it makes quick experimentation so much easier. Yes, it allowed us to use 700+ unit tests in the sources. Yes, I use it all the time to demonstrate examples here, on the rcpp-devel list or even live in presentations.

But does that mean we should use it for each and every task? Does it mean that it does not have "costs" such as randomized filenames in a temporary directory etc pp? Romain and I argued otherwise in our documentation.

Lastly, debugging of dynamically loaded R modules is difficult as it stands. There is an entire section in the (mandatory) Writing R Extensions about it, and Doug Bates once or twice posted a tutorial about how to do this via ESS and Emacs (though I always forget where he posted it; once was IIRC on the rcpp-devel list).

Edit 2012-Jul-07:

Here is your step by step:

  • (Preamble: I've used gcc and g++ for many years, and even when I add -g I don't always turn -O2 into -O0. I am really not sure you need that, but as you ask for it...)

  • Set your environment variable CXXFLAGS to "-g -O0 -Wall". There are numerous ways to do it; some are platform-dependent (e.g. the Windows control panel) and therefore less universal and less interesting. I use ~/.R/Makevars on Windows and Unix. You could use that, or you could override R's system-wide $RHOME/etc/Makeconf, or you could use Makeconf.site or ... See the full docs, but as I said, ~/.R/Makevars is my preferred way as it does NOT interfere with compilation outside of R.

  • Now every compilation R does via R CMD SHLIB, R CMD COMPILE, R CMD INSTALL, ... will use these flags. So it no longer matters whether you use inline or a local package. Continuing with inline...

  • For the rest, we mostly follow 'Section 4.4.1 Finding entry points in dynamically loaded code' of "Writing R Extensions":

  • Start another R session with R -d gdb.

  • Compile your code. For

fun <- cxxfunction(signature(), plugin="Rcpp", verbose=TRUE, body='
   int theAnswer = 42;
   return wrap(theAnswer);
')

I get

[...]
Compilation argument:
 /usr/lib/R/bin/R CMD SHLIB file11673f928501.cpp 2> file11673f928501.cpp.err.txt
 ccache g++-4.6 -I/usr/share/R/include -DNDEBUG   -I"/usr/local/lib/R/site-library/Rcpp/include"   -fpic  -g -O0 -Wall -c file11673f928501.cpp -o file11673f928501.o
 g++-4.6 -shared -o file11673f928501.so file11673f928501.o -L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib -L/usr/lib/R/lib -lR

  • Invoke e.g. tempdir() to see the temporary directory, cd to that directory and dyn.load() the file built above:

dyn.load("file11673f928501.so")

  • Now suspend R by sending a break signal (in Emacs, a simple choice from a drop-down).

  • In gdb, set a breakpoint. The single assignment above became line 32 for me, so

break file11673f928501.cpp:32
cont

  • Back in R, call the function:

    fun()

  • Presto, in the debugger at the break point we wanted:

R> fun()

Breakpoint 1, file11673f928501 () at file11673f928501.cpp:32
32      int theAnswer = 42;
(gdb)

  • Now it is "just" up to you to get gdb to work its magic.
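Beyond reading the verbose compile line, one way to double-check that -O0 really reached the compiler is to have the compiled code report it. This is a minimal sketch, assuming GCC or Clang (both define the __OPTIMIZE__ macro whenever optimization is enabled); optimization_state() is a hypothetical helper and not part of Rcpp or inline.

```cpp
#include <string>

// Reports how this translation unit was compiled.
// Assumption: GCC or Clang, which define __OPTIMIZE__ for -O1 and above
// and leave it undefined for -O0.
std::string optimization_state() {
#ifdef __OPTIMIZE__
    return "optimized";
#else
    return "unoptimized";
#endif
}
```

If a build meant for debugging reports "optimized", the CXXFLAGS setting was probably not picked up, and gdb may show locals such as theAnswer as optimized out.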

Now, as I said in my first attempt, all this would be easier (in my eyes) via a simple package which Rcpp.package.skeleton() can write for you as you don't have to deal with randomized directories and filenames. But each to their own...

Dirk Eddelbuettel answered Sep 28 '22 07:09