I am working on a project where I have a lot of analysts creating statistical models in R. They usually provide me with the model objects (.RData files) and I automate running them on various datasets.
My problem is:
Can I use a database to store these .RData files? Any hints on doing this? (I currently store the .RData files on disk and use a database to store their location information.)
I get a lot of R scripts from other analysts who have done some pre-processing of the data before they create the models. Does anyone have experience using PMML to make this process repeatable without manual intervention? PMML stores the pre-processing and modeling steps as markup tags, so the same steps could be repeated on a new dataset.
Thank you for the suggestions and feedback.
-Harsh
The easiest way to load the data into R is to double-click on the particular file yourfile.RData after you download it to your computer. This will open in RStudio only if you have associated .RData files with RStudio.
The RData format (usually with extension .rdata or .rda) is a format designed for use with R, a system for statistical computation and related graphics, for storing a complete R workspace or selected "objects" from a workspace in a form that can be loaded back by R.
RData files are specific to R and can store as many objects as you'd like within a single file. If you are conducting an analysis with 10 different data frames and 5 hypothesis tests, you can save all of those objects in a single file called ExperimentResults.RData.
To save data as an RData object, use the save function. To save data as an RDS object, use the saveRDS function. In each case, the first argument should be the name of the R object you wish to save. You should then include a file argument with the file name or file path you want to save the data to.
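For example (the file names here are just placeholders):

df <- data.frame(x = 1:10, y = rnorm(10))
save(df, file = "ExperimentResults.RData")  # .RData: can hold one or more named objects
saveRDS(df, file = "mydata.rds")            # .rds: holds exactly one object
load("ExperimentResults.RData")             # restores df under its original name
df2 <- readRDS("mydata.rds")                # returns the object, assign it to any name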
Yes, this is possible, e.g. using MySQL linked to R with the RMySQL and DBI packages, or via the RODBC or RJDBC packages. I'm not 100% sure whether they all support blobs, but in the worst case you could use the ASCII representation and put it in a text field.
The trick is to use the function serialize():
> x <- rnorm(100)
> y <- 5*x+4+rnorm(100,0,0.3)
> tt <- lm(y~x)
> obj <- serialize(tt,NULL,ascii=T)
Now you can store obj in a database and retrieve it later; it is actually no more than a vector of ASCII (or binary) codes. ascii=F gives you a binary representation instead. After retrieving it, you use:
> unserialize(obj)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
4.033 4.992
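To make the storage step concrete, here is a minimal sketch using the DBI and RSQLite packages; the table and column names are made up for illustration, and the ASCII serialization is stored in a plain text field as suggested above:

library(DBI)
con <- dbConnect(RSQLite::SQLite(), "models.sqlite")
dbExecute(con, "CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, obj TEXT)")

# store: the ASCII serialization is just printable text
obj_txt <- rawToChar(serialize(tt, NULL, ascii = TRUE))
dbExecute(con, "INSERT INTO models (name, obj) VALUES (?, ?)",
          params = list("toy_lm", obj_txt))

# retrieve: read the text back, convert to raw, and unserialize
row <- dbGetQuery(con, "SELECT obj FROM models WHERE name = ?",
                  params = list("toy_lm"))
model <- unserialize(charToRaw(row$obj[1]))
dbDisconnect(con)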
Edit: regarding PMML, there's a pmml package on CRAN. Maybe that one gets you somewhere?
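For instance, a minimal sketch assuming the pmml package (which uses the XML package) is installed, applied to the lm object from above:

library(pmml)
pmml_model <- pmml(tt)                         # convert the fitted lm object to PMML markup
XML::saveXML(pmml_model, file = "model.pmml")  # write the PMML document to disk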
R can serialize and deserialize any object; that is how my digest package creates so-called 'hash digests', by running a hash function over the serialized object.
So once you have the serialized object (which can be serialized to character), store it. Any relational database will support this, as will the NoSQL key/value stores -- and for either backend you could even use the 'hash digest' as a key, or some other meta-information.
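For example, a small sketch assuming the digest package; the hash of the serialized model could serve as the key under which the serialized object itself is stored:

library(digest)
key <- digest(tt)           # hash digest of the serialized lm object (md5 by default)
val <- serialize(tt, NULL)  # the serialized object itself, to store under that key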
Other alternatives include, for example, RProtoBuf, which can also serialize and de-serialize very efficiently (but you'd have to write the .proto files first).