Is unserializing not-trusted raw vector secure in R?

Question

R provides the serialize and unserialize functions to convert an arbitrary R object into a vector of bytes and back.
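For readers unfamiliar with these functions, here is a minimal round-trip sketch. The object `obj` is made up for illustration; passing `connection = NULL` makes `serialize` return a raw vector instead of writing to a connection.

```r
# Illustrative object; any R object could be used here.
obj <- list(id = 42L, label = "example")

# connection = NULL returns the bytes as a raw vector.
bytes <- serialize(obj, connection = NULL)

# unserialize() accepts a raw vector and rebuilds the object.
restored <- unserialize(bytes)

is.raw(bytes)
identical(obj, restored)
```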

I am considering unserializing parts of information that will get transmitted via network, that could (in theory) be tampered with by a malicious user.

I understand that a malicious user can inject an arbitrary R object that can be full of harmful code. But this is not what I am worried about, because I can (I think I can) prevent such code from ever executing by carefully handling the received objects.
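As a rough sketch of what "careful handling" could look like, the helper below accepts only plain data (atomic vectors and lists of them) and rejects functions, environments, language objects, and anything carrying attributes other than names. `is_plain_data` is a hypothetical name, not part of R, and this is one possible policy under those assumptions rather than a complete defence.

```r
# Hypothetical helper: TRUE only for plain data, i.e. NULL, atomic
# vectors, and (possibly nested) lists of them, with no attributes
# other than "names". Nothing in the object is evaluated or called.
is_plain_data <- function(x) {
  extra_attrs <- setdiff(names(attributes(x)), "names")
  if (length(extra_attrs) > 0) return(FALSE)
  if (is.null(x)) return(TRUE)
  if (is.atomic(x)) return(TRUE)
  if (typeof(x) == "list") {
    return(all(vapply(x, is_plain_data, logical(1))))
  }
  FALSE  # closures, environments, language objects, etc.
}

received <- unserialize(serialize(list(a = 1:3, b = "x"), NULL))
is_plain_data(received)
is_plain_data(list(f = function() 1))
```

An allow-list like this is deliberately strict: data frames and factors are rejected too, because they carry a class attribute that could trigger method dispatch.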

I worry about a buffer overflow or a similar way of executing arbitrary machine code on the R server by the mere fact of unserializing malformed bytes. Has anyone seen or performed fuzz testing on it?
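To make the distinction concrete: malformed input that R itself detects surfaces as an ordinary R error, which can be trapped as in the sketch below. `safe_unserialize` is a hypothetical wrapper name. It catches only R-level errors and does nothing about the memory-corruption scenario this question is really about.

```r
# Hypothetical wrapper: returns NULL instead of raising when the bytes
# cannot be decoded. It only traps R-level errors; it offers no
# protection against memory corruption inside the underlying C code.
safe_unserialize <- function(bytes) {
  if (!is.raw(bytes)) return(NULL)
  tryCatch(unserialize(bytes), error = function(e) NULL)
}

good <- serialize(list(a = 1:3), connection = NULL)
truncated <- good[seq_len(10)]  # cut off partway through the header

identical(safe_unserialize(good), list(a = 1:3))
is.null(safe_unserialize(truncated))
```

Note that a legitimately serialized `NULL` also comes back as `NULL`, so a real implementation would probably want a distinct failure marker.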

I am not a security expert and I have no easy means to perform the analysis myself. But maybe someone else did?

hrbrmstr · Accepted Answer

To my knowledge, there hasn't been an extensive secure code review of the R codebase. For those out there who think Adam's concern is not warranted: here's a list of Python vulnerabilities, a number of which (for even regular Python vs CPython) are indeed buffer overflows. R's list is scant, and I don't think it's scant due to the base R code being perfect. (As an aside, there is even a single vulnerability report for S-PLUS.) I point out Python since it's a widely used, C-backed interpreted environment (sound familiar?).

A quick search on R Core's Bugzilla for "overflow" shows a handful of issues, none of which seem to be vulnerability/exploit related.

I suspect the lack of CVEs or internal vuln reports for R is mostly due to no security researchers having spare cycles to dedicate to poking at it, and to no organization ever commissioning a review commercially (at least with the intent of making the issues public vs just fixing them in a private copy of the R code base, which is a definite possibility).

I will posit that, from my experience, base R is far more likely to suffer an application-level DoS in a situation like the one you are describing than to succumb to a successful code execution or process privilege escalation attack. Base R has its own memory management layer/interface on top of the core C routines, and the R Core team is almost religious in its use of valgrind. Now, valgrind isn't going to catch everything (if it did, there would be no buffer overflows in any software), but its use here combined with said memory management gives me some comfort in this regard.

Yet a very cursory search for something like strlen in the base R C code yields over 1,000 results (its non-n cousins are also in the base R code, and these non-n versions of basic string functions are the source of many buffer overflow vulnerabilities). So, is it possible? Yes. What's the likelihood and severity to you? More on that in a bit.

Base R does incorporate and depend on other open source libraries and those do have exploitable vulnerabilities. R core (and the vendors like AT&T, Microsoft, TIBCO, etc.) binaries use (from my experience) the latest versions (when statically linked) of said open source libraries in their binary releases (which are not frequent, but do happen many times in a year). R core will also release new binaries outside of R-devel promotion windows if the situation is warranted (and the vendors usually follow suit quickly). Note that you're completely on your own if you decide to compile R yourself with what you have available on your system.

There are many C/C++-backed R packages that bundle copies of open source libraries. One of my "spare time" projects (that I never have spare time for) is to track them and see just how out of date these included libraries are. httpuv is one example where the source library included in the package has vulnerabilities, but they are unlikely to be exposed to R, since the functionality in the vulnerable portion of the code is not something the package API directly exposes. You do need to be concerned about these packages, though.

Even without the reliance on incorporated library source code, C/C++-backed R packages are a definite source of potential memory issues since there are no secure coding requirements for them and they do manipulate memory. The R Core team highly recommends using valgrind to test packages (it's an option in R CMD check which is ++gd) but that also relies on the authors writing robust examples and tests which would explicitly exercise the code enough to test for this. Most package authors (including, sadly, me) rarely do that in my experience.

There is another potential source for package-related vulnerabilities if you are on Windows but I don't think you're on Windows and this answer is already long in the tooth.

So, back to your risk assessment (since that's really what you are doing). You need to take into account a few things. Are attackers targeting R right now? I work in cybersecurity at one of the best firms in the biz and haven't seen anything in the threat actor landscape that makes me believe there is a general targeting of R. Could your firm be targeted specifically? Yes. I have no idea what your firm does, but I suspect you're not important enough to be targeted by the level of attacker that would be needed to craft the attack you are positing. You really need to use something like OpenFAIR to model the scenarios and grapple with what the risk level really is for you.

If you're running on a server as you say, and said server is Linux/BSD-based, there are numerous things outside of R that you can do to help limit the impact of a buffer overflow attack. Those are too long to go into here, but those compensating controls would be good to look into if you are as concerned about this whole thing as your question indicates.

The TLDR is that it sounds like you're on a unix-ish system and are transmitting serialized R objects over the wire in a non-isolated segment of your network. Without knowing more about your firm I would, in general, state that you should feel pretty safe doing what you are doing unless you are in an industry that has nation-state actors targeting you. I also think said threat actors have thousands of other ways to get after whatever it really is they'd be after w/o focusing on exploiting your particular process.

I'd take some comfort, though, in that Microsoft bought Revolution Analytics and is making R a first-class citizen. Microsoft has an extensive and industry leading set of secure coding standards and practices and I am confident it'll be applied to the R team (they did the integration and branding very quickly and I doubt the secure code process has been fully applied yet but they depend on R for their cloud analytics and know the price they'll pay for a widespread exposure). Whatever Microsoft ends up fixing will end up back in base R (provided the vulnerabilities they find aren't in the proprietary optimized portions of Microsoft R).

This is long and I may not have fully addressed your concerns so please ask for clarification in the comments and I can do my best to address them before considering this answer "done". Major kudos to you for being concerned about this, though. You do credit to yourself and your firm.
