 

Store large dataframes in redis through R

Tags: dataframe, r, redis

I have a number of large dataframes in R which I was planning to store using Redis. I am totally new to Redis, but have been reading about it today and have been using the R package rredis.

I have been playing around with small data and have saved and retrieved small dataframes using the redisSet() and redisGet() functions. However, when it came to saving my larger dataframes (the largest of which is 4.3 million rows and 365 MB when saved as an .RData file) using redisSet('bigDF', bigDF), I get the following error message:

Error in doTryCatch(return(expr), name, parentenv, handler) : 
  ERR Protocol error: invalid bulk length
In addition: Warning messages:
1: In writeBin(v, con) : problem writing to connection
2: In writeBin(.raw("\r\n"), con) : problem writing to connection

Presumably this is because the dataframe is too large to save. I know that redisSet writes the dataframe as a serialized string, which is perhaps not the best approach for large dataframes. Does anyone know of the best way to do this?

EDIT: I have recreated the error by creating a very large dummy dataframe:

bigDF <- data.frame(
  'lots' = rep('lots', 40000000),
  'of'   = rep('of',   40000000),
  'data' = rep('data', 40000000),
  'here' = rep('here', 40000000)
)

Running redisSet('bigDF', bigDF) gives me the error:

 Error in .redisError("Invalid agrument") : Invalid agrument

the first time. Running it again immediately afterwards, I get the error:

Error in doTryCatch(return(expr), name, parentenv, handler) : 
  ERR Protocol error: invalid bulk length
In addition: Warning messages:
1: In writeBin(v, con) : problem writing to connection
2: In writeBin(.raw("\r\n"), con) : problem writing to connection

Thanks

asked Apr 19 '13 by user1165199

1 Answer

In short: you cannot. Redis can store a maximum of 512 MB in a single string value, and your serialized demo dataframe is bigger than that:

> length(serialize(bigDF, connection = NULL)) / 1024 / 1024
[1] 610.352

Technical background:

serialize is called inside the package's internal .cerealize function, which redisSet reaches via rredis:::.redisCmd:

> rredis:::.cerealize
function (value) 
{
    if (!is.raw(value)) 
        serialize(value, ascii = FALSE, connection = NULL)
    else value
}
<environment: namespace:rredis>
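
Since the 512 MB limit applies per string value, one conceivable workaround is to split the serialized raw vector across several keys and reassemble it on read. Below is a minimal, untested sketch; the chunkedRedisSet/chunkedRedisGet helpers and the key-naming scheme are my own illustration, not part of rredis:

library(rredis)
redisConnect()

chunkedRedisSet <- function(key, value, chunkSize = 100 * 1024^2) {
    raw <- serialize(value, connection = NULL)
    starts <- seq(1, length(raw), by = chunkSize)
    for (i in seq_along(starts)) {
        # raw vectors pass through .cerealize untouched, so each key
        # stores a plain slice of the serialized object
        chunk <- raw[starts[i]:min(starts[i] + chunkSize - 1, length(raw))]
        redisSet(paste(key, i, sep = ":"), chunk)
    }
    redisSet(paste(key, "n", sep = ":"), length(starts))
}

chunkedRedisGet <- function(key) {
    n <- as.integer(redisGet(paste(key, "n", sep = ":")))
    parts <- lapply(seq_len(n), function(i) redisGet(paste(key, i, sep = ":")))
    # redisGet unserializes when it can, so a single complete chunk may
    # already come back as the original object rather than as raw bytes
    if (n == 1 && !is.raw(parts[[1]])) return(parts[[1]])
    unserialize(do.call(c, parts))
}

Each chunk stays well under the limit, at the cost of extra round trips and no atomicity across the keys.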

Off-topic: why would you store such a big dataset in Redis anyway? Redis is designed for small key-value pairs. On the other hand, I have had some success storing big R datasets in CouchDB and in MongoDB (with GridFS) by adding the compressed RData file as an attachment.
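
For the GridFS route, here is a hedged sketch assuming the mongolite package (which did not exist when this answer was written) and a MongoDB server on localhost; the database and file names are illustrative:

library(mongolite)

fs <- gridfs(db = "rdata", url = "mongodb://localhost")

# save the compressed .RData to a temp file and upload it to GridFS
path <- tempfile(fileext = ".RData")
save(bigDF, file = path, compress = "xz")
fs$upload(path, name = "bigDF.RData")

# later: download the file and load the dataframe back
out <- tempfile(fileext = ".RData")
fs$download("bigDF.RData", out)
load(out)  # restores bigDF

GridFS splits the file into chunks internally, so a single-value size limit like Redis's does not apply.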

answered Oct 05 '22 by daroczig