Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing R Objects in a relational database

I frequently create nonparametric statistics (loess, kernel densities, etc) on data I pull out of a relational database. To make data management easier I would like to store R output back inside my DB. This is easy with simple data frames of numbers or text, but I have not figured out how to store R objects back in my relational database. So is there a way to store a vector of kernel densities, for example, back into a relational database?

Right now I work around this by saving the R objects to a network drive space so others can load the objects as needed.

like image 380
JD Long Avatar asked Sep 08 '09 17:09

JD Long


People also ask

How objects are stored in relational database?

You can store objects in relational tables as column objects or in object tables as row objects. Those objects that have meaning outside of the relational database they reside in, should be made referenceable as row objects in an object table. Otherwise, they should be stored as column objects in a relational table.

Can you connect R to a database?

R can connect to almost any existing database type. Most common database types have R packages that allow you to connect to them (e.g., RSQLite , RMySQL, etc).

Is Rstudio a relational database?

This is also called a relational database since the objects stored in it can be linked or related to each other. You can download data and code for this module as an Rstudio project here.


2 Answers

An example R variable, that's fairly complex:

library(nlme)
model <- lme(uptake ~ conc + Treatment, CO2, random = ~ 1 | Plant / Type)

The best storage database method for R variables depends upon how you want to use it.

I need to do in-database analytics on the values

In this case, you need to break the object down into values that the database can handle natively. This usually means converting it into one or more data frames. The easiest way to do this is to use the broom package.

library(broom)
coefficients_etc <- tidy(model)
model_level_stats <- glance(model)
row_level_stats <- augment(model)

I just want storage

In this case you want to serialize your R variables. That is, converting them to be a string or a binary blob. There are several methods for this.


My data has to be accessible by programs other than R, and needs to be human-readable

You should store your data in a cross-platform text format; probably JSON or YAML. JSON doesn't support some important concepts like Inf; YAML is more general but the support in R isn't as mature. XML is also possible, but is too verbose to be useful for storing large arrays.

library(RJSONIO)
model_as_json <- toJSON(model)
nchar(model_as_json) # 17916

library(yaml)
# yaml package doesn't yet support conversion of language objects,
# so preprocessing is needed
model2 <- within(
  model,
  {
     call <- as.character(call)
     terms <- as.character(terms)
  }
)
model_as_yaml <- as.yaml(model2) 
nchar(model_as_yaml) # 14493

My data has to be accessible by programs other than R, and doesn't need to be human-readable

You could write your data to an open, cross-platform binary format like HFD5. Currently support for HFD5 files (via rhdf5) is limited, so complex objects are not supported. (You'll probably need to unclass everything.)

library(rhdf5)
h5save(rapply(model2, unclass, how = "replace"), file = "model.h5")
bin_h5 <- readBin("model.h5", "raw", 1e6)
length(bin_h5) # 88291 not very efficient in this case

The feather package let's you save data frames in a format readable by both R and Python. To use this, you would first have to convert the model object into data frames, as described in the broom section earlier in the answer.

library(feather)
library(broom)
write_feather(augment(model), "co2_row.feather")  # 5474 bytes
write_feather(tidy(model), "co2_coeff.feather")   # 2093 bytes
write_feather(glance(model), "co2_model.feather") #  562 bytes

Another alternative is to save a text version of the variable (see previous section) to a zipped file and store its bytes in the database.

writeLines(model_as_json)
tar("model.tar.bz", "model.txt", compression = "bzip2")
bin_bzip <- readBin("model.tar.bz", "raw", 1e6)
length(bin_bzip) # only 42 bytes!

My data only needs to be accessible by R, and needs to be human-readable

There are two options for turning a variable into a string: serialize and deparse.

p <- function(x)
{
  paste0(x, collapse = "\n")
}

serialize needs to be sent to a text connection, and rather than writing to file, you can write to the console and capture it.

 model_serialized <- p(capture.output(serialize(model, stdout())))
 nchar(model_serialized) # 23830

Use deparse with control = "all" to maximise the reversibility when re-parsing later.

model_deparsed <- p(deparse(model, control = "all"))
nchar(model_deparsed) # 22036

My data only needs to be accessible by R, and doesn't need to be human-readable

The same sorts of techniques shown in the previous sections can be applied here. You can zip a serialized or deparsed variable and re-read it as a raw vector.

serialize can also write variables in a binary format. In this case, it is most easily used with its wrapper saveRDS.

saveRDS(model, "model.rds")
bin_rds <- readBin("model.rds", "raw", 1e6)
length(bin_rds) # 6350
like image 80
Richie Cotton Avatar answered Oct 28 '22 07:10

Richie Cotton


Use the serialization feature to turn any R object into a (raw or character) string, then store that string. See help(serialize).

Reverse this for retrieval: get the string, then unserialize() into a R object.

like image 44
Dirk Eddelbuettel Avatar answered Oct 28 '22 07:10

Dirk Eddelbuettel