Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Encrypting R script under MS-Windows

I have a bunch of R scripts which I am running on a Windows machine and want to ensure that the code remains unread by those not intended to see it. On a Linux box, I could wrap the R code in a bash script #! and make an encrypted (and perhaps even a limited-life) executable shell script. What are my options to do something on similar lines under Windows?

like image 525
Vishal Belsare Avatar asked Jan 16 '11 18:01

Vishal Belsare


People also ask

How do you protect a code in R?

At some point the R code has to be processed by the R interpreter. If you give someone encrypted code, you have to give them the decryption key so R can run it. Maybe you can hide the key somewhere and hope they don't find it. But they must have access to it in order to generate the plain text R code somehow.


3 Answers

My answer is a bit late, but I believe this is a good question. Unfortunately, I don't believe that there is a solution, or at least an easy one, at the present time.

The difficulty is common because, for most interpreted languages, including R, it is often possible to turn on logging and inspection of all commands being run. This can negate many tricks to obfuscate the code.

For those who prefer to think of code being open == good, one should know that a common reason to obfuscate the code is if one is consulting with a client that hires multiple vendors. It is not uncommon for a client to take scripts from vendor A and ask vendor B why it doesn't work with their system. (This may be done by a low-level IT flunkie, rather than someone responsible for the NDA contracts.) If A & B are competitors, A's code has just been handed to B. When scripts == serious programs, then serious code has been given away.

The ways I've seen this addressed are:

  1. Make a call to a compiled language, and use standard protections available there.
  2. Host the executable on a different server, and use calls to the server to execute the calculations. (In R, there are multiple server-side options.)
  3. Use compiled (preprocessed / bytecode) code within the language.

Option 2 is actually easier and better when the code may be widely distributed, not just for IP reasons. A major advantage is that it lets you upgrade the code without having to go through the pain of a site-wide release process. If new libraries are needed, no problem - update the server.

Option 3 is done in Matlab with .p files, and can be done with py2exe for Python on Windows. In R, the new bytecode compilation may be analogous, but I am not familiar enough with it to address any differences between .Rc files in the R context and .p files in the Matlab context. For more info on the compiler, see: http://www.inside-r.org/r-doc/compiler/compile

Hosting computations on the server is great for working with unsophisticated users, because it is easier to iterate quickly in response to bugs or feature requests. The IP protection is simply a benefit.

like image 120
Iterator Avatar answered Oct 08 '22 08:10

Iterator


This is not a specifically R-oriented strategy. (And it's a bit unclear what your constraints or goals really are anyway.) If you want a cross-platform encryption method, you should look into the open-source program TrueCrypt. It supports creating encrypted files that can be mounted as volumes on any machine that supports the volume formatting method. I have tested this across the Mac PC divide , since the Mac can read FAT files, but have no experience with how it might work across the Linux-PC chasm.

(Their TODO list for Windows includes;"Command line options for volume creation (already implemented in Linux and Mac OS X versions)". So I don't see any clear way to use this from within R without you running the program from the OS.)

like image 5
IRTFM Avatar answered Oct 08 '22 09:10

IRTFM


I don't think this is possible because the R interpreter has to be able to decrypt and read the code in order to execute it which means that whoever is using that interpreter will also be able to decrypt and read the code.

I am by no means an expert, so I reserve the right to be 100% wrong about that statement.

I believe the best solution is to ensure value comes from the expertise and services provided by your company and it's employers---not from keeping secrets.

Failing that, you could try separating the code into a client/server model. That way the client just sends data and receives results---they never have access to the code that runs on the server.

However, the scientist in me just said "that solution sucks and I would never trust results provided under such conditions".

like image 1
Sharpie Avatar answered Oct 08 '22 07:10

Sharpie