
The Role of Data in openCPU

Tags:

r

opencpu

I am well aware that this might not be the typical SO question, but since this is the strongest R programming community I know, and the author of opencpu explicitly encourages posting here, I'll give it a try:

What role does data play in the opencpu approach? I mean, cloud computing is nice, but you need some data to calculate on. Uploading an example .csv or .xls table might be straightforward, but what does opencpu have in mind for real-world data?

What about several hundred MBs (or even GBs) of data? How would you a) transfer it to your user folder? How would you b) share it among a group of authenticated users and c) hide it from the public?

I read the license part and, from what I understand, for safety it should be possible to run the calculations behind the scenes as long as the source code is publicly available. But still, the short document leaves open questions and a lot of guessing.

Matt Bannert asked Jan 23 '26 10:01



1 Answer

Thanks for trying OpenCPU. OpenCPU is still an evolving project at this point, so we are open to interesting suggestions or use cases.

About the data... you are asking many things at once. Some thoughts:

  • At this point, OpenCPU does not solve the 'big data' problem. It does not scale beyond what R itself scales to. It is mostly meant as an infrastructure for small- to medium-sized data, e.g. a typical research paper, project, etc.
  • OpenCPU is an API. It is not limited to browser clients. It is designed to be called from other clients as well.
  • OpenCPU has a store that you can use to store R objects on the server. E.g., you upload a CSV or whatever once, and then you store the actual dataframe. In any subsequent calls you can then include this object as an argument to function calls.
  • Another approach would be to combine it with an external database (e.g. MySQL) and dynamically pull the data in your R code (e.g. using RMySQL).
  • Afaik, the legal aspects of open data are not completely clear at this point. I don't think there is consensus on how copyright applies to data, and what a good license would be. However, a key feature in the design of OpenCPU is making sure things are easily reproducible. This can of course only be done when the data is actually public.
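
To make the "upload once, store the dataframe, reuse it" idea from the list above a bit more concrete, here is a minimal local sketch in base R. It is not the OpenCPU store API: the helper names `store_csv` and `fetch_object`, the `store_dir` location, and the key `"example"` are all hypothetical and only mimic the pattern of parsing a file once and passing the stored object to later calls.

```r
# Hypothetical local "store": a directory holding serialized R objects.
# This only imitates the pattern; it does not use any OpenCPU call.
store_dir <- file.path(tempdir(), "object_store")
dir.create(store_dir, showWarnings = FALSE, recursive = TRUE)

# Parse a CSV once and keep the resulting data frame as an R object.
store_csv <- function(csv_path, key) {
  df <- read.csv(csv_path, stringsAsFactors = FALSE)
  saveRDS(df, file.path(store_dir, paste0(key, ".rds")))
  invisible(key)
}

# Later calls look the object up by key instead of re-reading the raw file.
fetch_object <- function(key) {
  readRDS(file.path(store_dir, paste0(key, ".rds")))
}

# Example data: a small CSV standing in for an uploaded file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)), csv_path, row.names = FALSE)

# Store once, then use the stored data frame as an argument in two calls.
store_csv(csv_path, "example")
fit <- lm(y ~ x, data = fetch_object("example"))
mean_y <- mean(fetch_object("example")$y)
```

The point of the design is that the expensive or awkward step (transferring and parsing the raw file) happens once, while every later function call only needs the key of the stored object.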
Jeroen Ooms answered Jan 25 '26 23:01



