 

When running PL/R on PostgreSQL, can R handle data bigger than RAM?

With open source R, unless you use a specific package, it is not possible to handle data sets bigger than RAM. So I would like to know whether it is possible to handle big data sets by applying PL/R functions inside PostgreSQL.

I couldn't find any documentation about this.

Flavio Barros asked May 17 '13

People also ask

How much data can PostgreSQL handle?

There is no PostgreSQL-imposed limit on the number of indexes you can create on a table. Of course, performance may degrade if you choose to create more and more indexes on a table with more and more columns. PostgreSQL has a limit of 1GB for the size of any one field in a table.

How much RAM is needed for PostgreSQL?

Memory. The 2GB of memory is a recommendation for memory you can allocate to PostgreSQL outside of the operating system. If you have a small data set, you are still going to want enough memory to cache the majority of your hot data (you can use pg_buffercache to determine your hot data).

Why use Redis with Postgres?

A Redis in-memory database cache allows you to access frequently used data from your server's RAM. This minimizes unnecessary load and round trips to your PostgreSQL database server.

What is shared_buffers PostgreSQL?

shared_buffers (integer) Sets the amount of memory the database server uses for shared memory buffers. The default is typically 32 megabytes (32MB), but might be less if your kernel settings will not support it (as determined during initdb). This setting must be at least 128 kilobytes.
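For instance, you can check the current value from psql and, assuming superuser rights on PostgreSQL 9.4 or later, change it persistently with ALTER SYSTEM; shared_buffers only takes effect after a server restart:

    SHOW shared_buffers;                      -- current value
    ALTER SYSTEM SET shared_buffers = '2GB';  -- written to postgresql.auto.conf; requires restart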


1 Answer

As mentioned by Hong Ooi, PL/R loads an R interpreter into the PostgreSQL backend process. So your R code is running "in database".

There is no universal way to deal with memory limitations, but there are at least two possible options:

  1. Define a custom PostgreSQL aggregate, and use your PL/R function as the "final" function. That way you process the data group by group, and are much less likely to have problems with memory. See the online PostgreSQL documentation and PL/R documentation for more detail (I don't post to Stack Overflow often, so unfortunately it will not allow me to post the actual URLs for you); a sketch follows this list.
  2. Use the pg.spi.cursor_open and pg.spi.cursor_fetch functions that PL/R installs into the R interpreter to page data into your R function in chunks; a sketch follows the docs link below.
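A minimal sketch of option 1, assuming a hypothetical table measurements(grp int, val float8); the function and aggregate names are placeholders. PostgreSQL's built-in array_append collects each group's values, and the PL/R final function only ever sees one group at a time, so each group must fit in RAM but the whole table never does:

    -- PL/R final function: receives one group's values as float8[]
    -- (available in R as the numeric vector arg1) and runs R's median().
    CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS '
        median(arg1)
    ' LANGUAGE plr;

    -- Custom aggregate: array_append accumulates values per group,
    -- then r_median is applied once to each group's array.
    CREATE AGGREGATE r_median_agg(float8) (
        sfunc     = array_append,
        stype     = float8[],
        finalfunc = r_median
    );

    -- Only one group's values are materialized per r_median call.
    SELECT grp, r_median_agg(val)
    FROM measurements
    GROUP BY grp;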

See PL/R docs here: http://www.joeconway.com/plr/doc/index.html
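And a minimal sketch of option 2, following the cursor interface described in those docs; the table big_table, its column val, and the function name are placeholders:

    -- Page through a large result set in fixed-size chunks instead of
    -- pulling it all into R at once.
    CREATE OR REPLACE FUNCTION chunked_sum() RETURNS float8 AS '
        plan <- pg.spi.prepare("SELECT val FROM big_table")
        cursor_obj <- pg.spi.cursor_open("my_cursor", plan)
        total <- 0
        repeat {
            # fetch the next 10000 rows as a data.frame
            chunk <- pg.spi.cursor_fetch(cursor_obj, TRUE, as.integer(10000))
            if (is.null(chunk) || nrow(chunk) == 0) break
            total <- total + sum(chunk$val)
        }
        pg.spi.cursor_close(cursor_obj)
        return(total)
    ' LANGUAGE plr;

Only one chunk is resident in R at a time, so peak R memory use is bounded by the chunk size rather than by the table size.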

I am guessing that what you would really like is a data.frame whose data is paged to and from an underlying database cursor transparently to your R code. This is on my long-term TODO list, but unfortunately I have not been able to find the time to work it out. I have been told that Oracle's R connector has this feature, so it seems it can be done. Patches welcome ;-)

Joseph Conway answered Oct 04 '22