Is there a reason I can't read an RDS file from within a zip file directly, without first unzipping it to a temporary file on disk?
Let's say the zip file is created like this:
saveRDS(cars, "cars.rds")
saveRDS(iris, "iris.rds")
write.csv(iris, "iris.csv")
zip("datasets.zip", c("cars.rds", "iris.rds", "iris.csv"))
file.remove("cars.rds", "iris.rds", "iris.csv")
For the CSV file, I can read it directly like this:
iris2 <- read.csv(unz("datasets.zip", "iris.csv"))
However, I don't understand why I can't use unz() directly with readRDS():
iris3 <- readRDS(unz("datasets.zip", "iris.rds"))
This gives me the error:
Error: unknown input format
I'd also like to understand why this happens. I'm aware that I could do the following, as in this question:
path <- unzip("datasets.zip", "iris.rds")
iris4 <- readRDS(path)
file.remove(path)
This doesn't seem as efficient, though, and I need to do it frequently for a very large number of files, so I/O inefficiencies matter. Is there any workaround to read the .rds file without extracting it to disk?
This was a little tricky to track down until I read the body of readRDS(). What it seems you need to do is:

1. open a connection to the .zip archive and the file inside it with unz(),
2. apply GZIP decompression to that connection with gzcon(), and
3. pass the decompressed connection to readRDS().

Here's an example to illustrate, using the following serialised matrix mat inside a zip archive, matrix.zip:
mat <- matrix(1:9, ncol = 3)
saveRDS(mat, "matrix.rds")
zip("matrix.zip", "matrix.rds")
Open a connection to matrix.rds inside matrix.zip:
con <- unz("matrix.zip", filename = "matrix.rds")
Now, using gzcon(), apply GZIP decompression to this connection:
con2 <- gzcon(con)
Finally, read from the connection:
mat2 <- readRDS(con2)
In full, we have:
con <- unz("matrix.zip", filename = "matrix.rds")
con2 <- gzcon(con)
mat2 <- readRDS(con2)
close(con2)
This gives:
> con <- unz("matrix.zip", filename = "matrix.rds")
> con2 <- gzcon(con)
> mat2 <- readRDS(con2)
> close(con2)
> mat2
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> all.equal(mat, mat2)
[1] TRUE
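Since you need to do this for a large number of files, the three steps can be wrapped in a small function. This is a minimal sketch, not part of base R: read_rds_from_zip() and read_all_rds_from_zip() are hypothetical names, it assumes every .rds was written with saveRDS()'s default gzip compression (gzcon() cannot decode files saved with compress = "bzip2" or "xz"), and, like the zip() call in the question, building the demo archive needs a zip program on the system.

```r
# Hypothetical helper (not a base R function): read one .rds member of a
# zip archive without extracting it to disk. Assumes gzip compression,
# which is saveRDS()'s default.
read_rds_from_zip <- function(zipfile, member) {
  con <- gzcon(unz(zipfile, filename = member))  # unz() streams the member, gzcon() un-gzips it
  on.exit(close(con))                            # closing the gzcon wrapper also closes the unz() connection
  readRDS(con)
}

# Hypothetical helper: read every .rds member of an archive into a named list.
read_all_rds_from_zip <- function(zipfile) {
  members <- unzip(zipfile, list = TRUE)$Name    # list = TRUE lists contents, extracts nothing
  rds <- members[grepl("\\.rds$", members, ignore.case = TRUE)]
  setNames(lapply(rds, read_rds_from_zip, zipfile = zipfile), rds)
}

# Demo: rebuild the question's archive in a fresh temporary directory
demo_dir <- tempfile("rds_zip_demo")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)
saveRDS(cars, "cars.rds")
saveRDS(iris, "iris.rds")
write.csv(iris, "iris.csv")
zip("datasets.zip", c("cars.rds", "iris.rds", "iris.csv"))
invisible(file.remove("cars.rds", "iris.rds", "iris.csv"))

iris3 <- read_rds_from_zip("datasets.zip", "iris.rds")
all_rds <- read_all_rds_from_zip("datasets.zip")  # the CSV member is skipped
print(names(all_rds))
setwd(old_wd)
```

Each member is streamed through memory rather than written to disk; whether that is measurably faster than unzipping to a temporary file depends on your data, so it is worth timing both on a representative archive.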
Why you have to go through this convoluted extra step is (I think) described in ?readRDS:

"Compression is handled by the connection opened when file is a file name, so is only possible when file is a connection if handled by the connection. So e.g. url connections will need to be wrapped in a call to gzcon."
And if you look at the internals of readRDS(), we see:
> readRDS
function (file, refhook = NULL)
{
    if (is.character(file)) {
        con <- gzfile(file, "rb")
        on.exit(close(con))
    }
    else if (inherits(file, "connection"))
        con <- file
    else stop("bad 'file' argument")
    .Internal(unserializeFromConn(con, refhook))
}
<bytecode: 0x2841998>
<environment: namespace:base>
If file is a character string (a file name), readRDS() opens it with gzfile(), and it is that gzfile() connection that decompresses the .rds we want to read. Notice that if you pass a connection as file, as you want to do, at no point does R decompress it: file is simply assigned to con and then passed to the internal function unserializeFromConn. Hence wrapping gzcon() around the connection created by unz() works.
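The same point can be checked with an ordinary .rds file on disk. In this minimal sketch (assuming saveRDS()'s default gzip compression; path, a, b and d are just illustrative names), passing a file name lets readRDS() open its own gzfile() connection, passing your own gzfile() connection does the same thing explicitly, and a plain binary file() connection gives the same result once gzcon() is wrapped around it to supply the decompression.

```r
# Write a gzip-compressed .rds (saveRDS()'s default) to a temporary file
path <- tempfile(fileext = ".rds")
mat <- matrix(1:9, ncol = 3)
saveRDS(mat, path)

# 1. Pass a file name: readRDS() opens gzfile(path, "rb") itself
a <- readRDS(path)

# 2. Pass a gzfile() connection: mirrors what readRDS() does internally
con_gz <- gzfile(path, "rb")
b <- readRDS(con_gz)
close(con_gz)

# 3. Pass a binary file() connection wrapped in gzcon(): gzcon() supplies
#    the decompression that readRDS() does not add for connections
con_gzcon <- gzcon(file(path, "rb"))
d <- readRDS(con_gzcon)
close(con_gzcon)

unlink(path)
```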
Basically, unserializeFromConn expects the connection it reads from to deliver already-decompressed data, but that decompression only happens automagically when you pass readRDS() a file name, not a connection.
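You can see what unserializeFromConn is handed by peeking at the first two bytes of the member. This sketch rebuilds the question's iris.rds inside a zip in a temporary directory (it needs a zip program, as in the question). Read straight through unz(), the stream starts with the gzip magic number 1f 8b instead of a serialization header, which is why readRDS() stops with "unknown input format"; read through gzcon(), it starts with "X\n", the header R writes for its default XDR binary serialization format.

```r
demo_dir <- tempfile("rds_bytes_demo")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)
saveRDS(iris, "iris.rds")          # compress = TRUE (gzip) is the default
zip("datasets.zip", "iris.rds")
invisible(file.remove("iris.rds"))

# Bytes exactly as unz() delivers them: still gzip-compressed
con <- unz("datasets.zip", "iris.rds")
raw_head <- readBin(con, what = "raw", n = 2)
close(con)
print(raw_head)                    # gzip magic number

# Bytes after gzcon() has decompressed the stream
con2 <- gzcon(unz("datasets.zip", "iris.rds"))
decomp_head <- readBin(con2, what = "raw", n = 2)
close(con2)
print(rawToChar(decomp_head))      # serialization header ("X\n" for XDR)

setwd(old_wd)
```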