I have a number of large data frames in R that I was planning to store using Redis. I am totally new to Redis, but I have been reading about it today and have been using the R package rredis.
I have been playing around with small data and have saved and retrieved small data frames using the redisSet() and redisGet() functions. However, when it came to saving my larger data frames (the largest of which is 4.3 million rows and 365 MB when saved as an .RData file) using redisSet('bigDF', bigDF), I got the following error message:
Error in doTryCatch(return(expr), name, parentenv, handler) :
ERR Protocol error: invalid bulk length
In addition: Warning messages:
1: In writeBin(v, con) : problem writing to connection
2: In writeBin(.raw("\r\n"), con) : problem writing to connection
Presumably this is because the data frame is too large to save. I know that redisSet writes the data frame as a string, which is perhaps not the best approach for large data frames. Does anyone know of a better way to do this?
EDIT: I have recreated the error by creating a very large dummy data frame:
bigDF <- data.frame(
  'lots' = rep('lots', 40000000),
  'of'   = rep('of',   40000000),
  'data' = rep('data', 40000000),
  'here' = rep('here', 40000000)
)
Running redisSet('bigDF', bigDF) gives me the error:
Error in .redisError("Invalid agrument") : Invalid agrument
the first time; then, running it again immediately afterwards, I get the error:
Error in doTryCatch(return(expr), name, parentenv, handler) :
ERR Protocol error: invalid bulk length
In addition: Warning messages:
1: In writeBin(v, con) : problem writing to connection
2: In writeBin(.raw("\r\n"), con) : problem writing to connection
Thanks
In short: you cannot. Redis can store a maximum of 512 MB in a String value, and your serialized demo data frame is bigger than that:
> length(serialize(bigDF, connection = NULL)) / 1024 / 1024
[1] 610.352
Technical background: serialize is called in the .cerealize function of the package via redisSet and rredis:::.redisCmd:
> rredis:::.cerealize
function (value)
{
if (!is.raw(value))
serialize(value, ascii = FALSE, connection = NULL)
else value
}
<environment: namespace:rredis>
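Since .cerealize passes raw vectors through untouched, one possible workaround is to serialize the object yourself, split the resulting raw vector into chunks below the 512 MB limit, and store each chunk under its own key. A sketch follows; the helper names redisSetBig/redisGetBig and splitRaw are my own invention, not part of rredis, and whether redisGet() hands back the raw bytes of a partial serialization stream unchanged is worth verifying on your setup:

```r
library(rredis)

# Split a raw vector into pieces of at most chunkSize bytes.
splitRaw <- function(raw, chunkSize) {
  starts <- seq(1, length(raw), by = chunkSize)
  lapply(starts, function(s) raw[s:min(s + chunkSize - 1, length(raw))])
}

# Hypothetical helpers (not part of rredis): store one object across
# several keys, each chunk safely below Redis' 512 MB String limit.
redisSetBig <- function(key, value, chunkSize = 100 * 1024^2) {
  chunks <- splitRaw(serialize(value, connection = NULL), chunkSize)
  for (i in seq_along(chunks))
    redisSet(sprintf("%s:%d", key, i), chunks[[i]])  # raw input skips .cerealize
  redisSet(sprintf("%s:n", key), length(chunks))
}

redisGetBig <- function(key) {
  n <- redisGet(sprintf("%s:n", key))
  parts <- lapply(seq_len(n), function(i) redisGet(sprintf("%s:%d", key, i)))
  unserialize(do.call(c, parts))
}
```

The splitting itself is pure R, so it can be sanity-checked without a Redis server before trying it against a live connection.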
Off-topic: why would you store such a big dataset in Redis anyway? Redis is best suited to small key-value pairs. On the other hand, I have had some success storing big R datasets in CouchDB and MongoDB (with GridFS) by adding the compressed RData file there as an attachment.
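If compression alone brings the serialized object under the 512 MB limit (highly repetitive data like the demo data frame compresses very well, though this is not guaranteed in general), a single key may still suffice. A sketch, again relying on the fact that .cerealize passes raw vectors through; the assumption that redisGet() returns raw bytes it cannot unserialize should be verified:

```r
library(rredis)
redisConnect()

# Serialize and gzip-compress the object ourselves before storing it.
rawData <- memCompress(serialize(bigDF, connection = NULL), type = "gzip")
stopifnot(length(rawData) < 512 * 1024^2)  # still too big? chunk across keys instead
redisSet('bigDF', rawData)                 # raw input skips .cerealize

# Retrieval: decompress and unserialize manually.
bigDF2 <- unserialize(memDecompress(redisGet('bigDF'), type = "gzip"))
```

The compress/decompress round trip uses only base R (memCompress/memDecompress), so it can be tested independently of Redis.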