I am building clients for RESTful APIs. Some links let me download attachments (files) from the server, and in the best case these are .txt files. I only mention the RESTful part because it means I have to send some headers, and potentially a body, with each request, so the standard R 'filename'=URL logic won't work.
Sometimes people bundle many .txt files into a zip. These are awkward since I don't know what they contain until I download them.
For the moment, I am unpacking these, gzipping the individual files (which adds the .gz extension) and re-uploading them. They can then be indexed and downloaded.
I'm using Hadley's cute httr package, but I can't see an elegant way to decompress the gz files.
When using read.csv or similar any files with a gz ending are automatically decompressed (convenient!). What's the equivalent when using httr or curl?
content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz"))
[1] 1f 8b 08 08 4e 9e 9b 51 00 03 73 ...
That looks nice, a compressed byte stream with the correct header (1f 8b). Now I need the text contents, so I tried using memDecompress, which says it should do this:
memDecompress(content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz")),type="gzip")
Error in memDecompress(content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz")), :
internal error -3 in memDecompress(2)
What's the proper solution here?
Also, is there a way to get R to pull the INDEX of a remote .zip file without downloading all of it?
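If your files are compressed with bzip2, xz, or gzip, they can be read into R as if they were plain text files. Giving them the conventional filename extensions (.bz2, .xz, .gz) keeps things clear, although R recognises the compression from the file contents when reading.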
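As a minimal sketch of the point above, the following writes a small gzip-compressed CSV to a temporary file and reads it back with plain read.csv. The data frame and the temporary file name are made up for illustration; only base R functions are used.

```r
# Write a small CSV through a gzip-compressing connection.
# The file name and data are placeholders for illustration only.
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "w")
write.csv(data.frame(id = 1:3, name = c("a", "b", "c")), con, row.names = FALSE)
close(con)

# read.csv is given the compressed file directly; no explicit
# decompression step is needed.
df <- read.csv(tmp)
```

This only covers files on disk (or plain URLs); bytes returned by an httr request still have to be decompressed by hand.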
To decompress a file on disk, you can use `gunzip` at the command line, or gunzip("/home/file.gz") from the R.utils package. To read the data directly, as already mentioned, R does a good job of reading gzip'ed files "as is".
To zip files in R, use the zip() function. The archive is written to the current directory unless the zipfile argument gives a different path. zip() relies on an external zip program; with the common Info-ZIP tool, an existing archive of the same name is updated rather than recreated, so delete it first if you want a fresh archive.
The most commonly used programs to open .gz files are WinZip and, for Unix and Linux users, the native gzip utility.
The following works, but seems a little convoluted:
> scan(gzcon(rawConnection(content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz")))),"",,,"\n")
Read 1 item
[1] "These are not the droids you are looking for"
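The same idea can be written a little more readably with readLines and a small helper. This is only a sketch: read_gz_raw is a hypothetical name, and the bytes are built locally with gzfile so the example runs without network access. In practice they would come from content(GET(...)) as in the question.

```r
# Hypothetical helper: turn gzip-compressed raw bytes into lines of text.
read_gz_raw <- function(bytes) {
  # rawConnection exposes the bytes as a connection; gzcon decompresses it.
  con <- gzcon(rawConnection(bytes))
  on.exit(close(con))  # closing the gzcon also closes the raw connection
  readLines(con)
}

# Build a small gzip payload locally as a stand-in for the HTTP response body.
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "w")
writeLines("These are not the droids you are looking for", gz)
close(gz)
bytes <- readBin(tmp, what = "raw", n = file.info(tmp)$size)

lines <- read_gz_raw(bytes)
```

Wrapping the connection in a function with on.exit keeps it from being left open if reading fails part way through.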