
R, GET and GZ compression

Tags: rest, r, gzip, get, rcurl

I am building clients for RESTful APIs. Some links let me download attachments (files) from the server, and in the best case these are .txt files. I only mention the RESTful part because it means I have to send some headers, and potentially a body, with each request, so the standard R 'filename'=URL logic won't work.

Sometimes people bundle many .txt files into a single zip archive. These are awkward because I can't tell what an archive contains until I have downloaded it.

For the moment, I am unpacking these archives, gzipping the individual files (which adds the .gz extension) and re-uploading them. They can then be indexed and downloaded individually.

I'm using Hadley's cute httr package, but I can't see an elegant way to decompress the gz files.

When using read.csv or similar, any file with a .gz ending is decompressed automatically (convenient!). What's the equivalent when using httr or curl?

content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz"))
[1] 1f 8b 08 08 4e 9e 9b 51 00 03 73 ...

That looks right: a compressed byte stream with the correct gzip header (1f 8b). Now I need the text contents, so I tried memDecompress, whose documentation says it should do this:

memDecompress(content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz")),type="gzip")
Error in memDecompress(content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz")),  : 
  internal error -3 in memDecompress(2)

What's the proper solution here?

Also, is there a way to get R to fetch the index (file listing) of a remote .zip file without downloading the whole archive?

Alex Brown asked May 21 '13 16:05


1 Answer

The following works, but seems a little convoluted:

> scan(gzcon(rawConnection(content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz")))), what = "", sep = "\n")
Read 1 item
[1] "These are not the droids you are looking for"
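
The same idea can be split into named steps, which may be easier to read. This is only a sketch: `gunzip_raw` is a hypothetical helper name (not part of httr or base R), and the locally written temp file is a stand-in for the downloaded bytes so that the example runs without network access.

```r
# Decompress a gzip-format raw vector (for example, an HTTP response body)
# into a character vector with one element per line.
# gunzip_raw is a hypothetical helper name chosen for this sketch.
gunzip_raw <- function(bytes) {
  # rawConnection exposes the in-memory bytes as a connection;
  # gzcon wraps that connection so reads are gunzipped on the fly.
  con <- gzcon(rawConnection(bytes))
  on.exit(close(con))  # closing the wrapper also closes the raw connection
  readLines(con)
}

# Stand-in for the downloaded bytes: write a small gzip file locally,
# then read it back as a raw vector.
tmp <- tempfile(fileext = ".txt.gz")
gz <- gzfile(tmp, "w")
writeLines("These are not the droids you are looking for", gz)
close(gz)
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))

lines <- gunzip_raw(bytes)
print(lines)
```

With httr, the bytes would presumably come from `content(GET(url), as = "raw")`, which asks for the unparsed response body, so the call becomes `gunzip_raw(content(GET(url), as = "raw"))`.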
Alex Brown answered Sep 22 '22 23:09