Serializing .RData file to database

Tags: database, r, rdata

I am working on a project where many analysts create statistical models in R. They usually give me the model objects (.RData files), and I automate running them against various datasets.

My problem is:

  • Can I store these .RData files in a database? Any hints on doing this? (I currently store the .RData files on disk and use a database to hold their locations.)

  • I get a lot of R scripts from other analysts who pre-process the data before they create the models. Does anyone have experience using PMML to make this process repeatable without manual intervention? PMML stores the pre-processing and modeling steps as markup tags, and would repeat the same steps on a new dataset.

Thank you for the suggestions and feedback.

-Harsh

asked Oct 17 '10 20:10 by harshsinghal



2 Answers

Yes, this is possible using, e.g., MySQL linked to R with the RMySQL and DBI packages, or via the RODBC or RJDBC package. I'm not 100% sure that they all support blobs, but in the worst case you could use the ASCII representation and put it in a text field.

The trick is to use the function serialize():

> x <- rnorm(100)
> y <- 5*x + 4 + rnorm(100, 0, 0.3)
> tt <- lm(y ~ x)
> # connection = NULL returns the serialized object as a raw vector
> obj <- serialize(tt, NULL, ascii = TRUE)

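Now you can store obj in a database and retrieve it later. It is simply a raw vector: with ascii=TRUE its bytes encode a text representation (rawToChar(obj) turns it into a string for a text field), while ascii=FALSE gives a more compact binary representation suited to a blob column. After retrieving it, use unserialize():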

> unserialize(obj)
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
      4.033        4.992  
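
To make the database step concrete, here is a minimal sketch of the round trip through a blob column. It assumes the DBI and RSQLite packages are installed and uses an in-memory SQLite database as a stand-in for MySQL; the table name `models`, its columns, and the key `cars_lm` are hypothetical names chosen for illustration.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the real server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE models (name TEXT PRIMARY KEY, obj BLOB)")

# Fit a small model on a built-in dataset and serialize it to a raw vector.
fit <- lm(dist ~ speed, data = cars)
payload <- serialize(fit, NULL)

# A blob parameter is passed as a list holding one raw vector per row.
dbExecute(
  con,
  "INSERT INTO models (name, obj) VALUES (?, ?)",
  params = list("cars_lm", list(payload))
)

# Read the blob back and rebuild the model object.
row <- dbGetQuery(
  con,
  "SELECT obj FROM models WHERE name = ?",
  params = list("cars_lm")
)
restored <- unserialize(row$obj[[1]])

dbDisconnect(con)
```

With another DBI backend the placeholder syntax and blob handling may differ, so treat the SQL above as specific to this SQLite sketch.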

Edit: regarding PMML, there is a pmml package on CRAN. Maybe that one gets you somewhere?

answered Nov 02 '22 12:11 by Joris Meys


R can serialize and deserialize any object; that is how my digest package creates so-called 'hash digests', by running a hash function over the serialized object.

So once you have the serialized object (which can be serialized to character), store it. Any relational database will support this, as will the NoSQL key/value stores -- and for either backend you could even use the 'hash digest' as a key, or some other meta-information.
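
As a rough illustration of that idea, the sketch below serializes a model to a character string and files it under a hash-derived key. It uses base R only: an environment stands in for the key/value store, and `tools::md5sum()` on a temporary file stands in for the digest package, so both are assumptions made for the example rather than the approach described above.

```r
# Assumption: an environment plays the role of the key/value store.
store <- new.env()

fit <- lm(dist ~ speed, data = cars)

# ascii = TRUE gives a raw vector of plain text bytes, which converts
# cleanly into a single character string.
payload <- rawToChar(serialize(fit, NULL, ascii = TRUE))

# tools::md5sum() hashes files, so write the payload to a temporary file
# and hash that (a stand-in for hashing with the digest package).
tmp <- tempfile()
writeBin(charToRaw(payload), tmp)
key <- unname(tools::md5sum(tmp))
unlink(tmp)

# Store the string under its hash, then fetch it and rebuild the object.
assign(key, payload, envir = store)
restored <- unserialize(charToRaw(get(key, envir = store)))

# Compare with a tolerance, since the text form stores numbers as decimals.
stopifnot(isTRUE(all.equal(coef(restored), coef(fit))))
```

The same string could go into a text column of a relational table, with the key as the primary key.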

Other alternatives include RProtoBuf, which can also serialize and deserialize very efficiently (but you would have to write the .proto files first).

answered Nov 02 '22 11:11 by Dirk Eddelbuettel