RCurl defines a function and a class, CFILE, for working with C-level file handles. From the manual:
The intent is to be able to pass these to libcurl as options so that it can read or write from or to the file. We can also do this with R connections and specify callback functions that manipulate these connections. But using the C-level FILE handle is likely to be significantly faster for large files.
There are no examples related to downloads so I tried:
library(RCurl)
u = "http://cran.r-project.org/web/packages/RCurl/RCurl.pdf"
f = CFILE("RCurl.pdf", mode="wb")
ret= getURL(u, write = getNativeSymbolInfo("R_curl_write_binary_data")$address,
file = f@ref)
I also tried replacing the file option with writedata = f@ref.
The file is downloaded but it is corrupted.
Writing a custom callback for the write argument works only for non-binary data.
Any idea how to download a binary file straight to disk (without loading it into memory) with RCurl?
I think you want to use writedata, and remember to close the file:
library(RCurl)
filename <- tempfile()
f <- CFILE(filename, "wb")
url <- "http://cran.fhcrc.org/Rlogo.jpg"
curlPerform(url = url, writedata = f@ref)
close(f)
For more elaborate writing, I'm not sure if this is the best way, but Linux tells me, from
man curl_easy_setopt
that there's a curl option CURLOPT_WRITEFUNCTION that is a pointer to a C function with prototype
size_t function(void *ptr, size_t size, size_t nmemb, void *stream);
and in R at the end of ?curlPerform there's an example of calling a C function as the 'writefunction' option. So I created a file curl_writer.c
#include <stdio.h>
size_t
writer(void *buffer, size_t size, size_t nmemb, void *stream)
{
fprintf(stderr, "<writer> size = %d, nmemb = %d\n",
(int) size, (int) nmemb);
return size * nmemb;
}
Compiled it
R CMD SHLIB curl_writer.c
which on Linux produces a file curl_writer.so, and then in R
dyn.load("curl_writer.so")
writer <- getNativeSymbolInfo("writer", PACKAGE="curl_writer")$address
curlPerform(URL=url, writefunction=writer)
and get on stderr
<writer> size = 1, nmemb = 2653
<writer> size = 1, nmemb = 520
OK
These two ideas can be integrated, i.e., writing to an arbitrary file using an arbitrary function, by modifying the C function to use the FILE * we pass in, as
#include <stdio.h>
size_t
writer(void *buffer, size_t size, size_t nmemb, void *stream)
{
FILE *fout = (FILE *) stream;
fprintf(fout, "<writer> size = %d, nmemb = %d\n",
(int) size, (int) nmemb);
fflush(fout);
return size * nmemb;
}
and then back in R after compiling
dyn.load("curl_writer.so")
writer <- getNativeSymbolInfo("writer", PACKAGE="curl_writer")$address
f <- CFILE(filename <- tempfile(), "wb")
curlPerform(URL=url, writedata=f@ref, writefunction=writer)
close(f)
getURL can be used here, too, provided writedata=f@ref, write=writer; I think the problem in the original question is that R_curl_write_binary_data is really an internal function, writing to a buffer managed by RCurl, rather than a file handle like that created by CFILE. Likewise, specifying writedata without write (which seems from the source code to getURL to be an alias for writefunction) sends a pointer to a file to a function expecting a pointer to something else; for getURL both writedata and write need to be provided.
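Note that the writer shown above writes only a diagnostic line to the file, not the downloaded bytes. A minimal sketch of a callback that copies the payload itself, assuming the same CURLOPT_WRITEFUNCTION prototype and a FILE * passed through writedata (the name file_writer is hypothetical, not part of RCurl):

```c
#include <stdio.h>

/*
 * Write callback with the libcurl prototype: copies each chunk, byte for
 * byte, to the FILE * received through the last argument (for example the
 * f@ref of a CFILE opened with mode "wb").
 * file_writer is a hypothetical name chosen for this sketch.
 */
size_t
file_writer(void *buffer, size_t size, size_t nmemb, void *stream)
{
    FILE *fout = (FILE *) stream;

    /* fwrite returns the number of complete items (of `size` bytes) written */
    size_t written = fwrite(buffer, size, nmemb, fout);

    /* Report bytes handled; libcurl aborts the transfer if this
       differs from size * nmemb */
    return written * size;
}
```

Compiled with R CMD SHLIB and loaded with dyn.load as above, this should be usable as the writefunction together with writedata=f@ref. Because fwrite takes an explicit length, NUL bytes in the data are written like any other byte.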
I am working on this problem as well and don't have an answer, yet.
However, I did find this:
http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTWRITEDATA
Are you working on R under Windows? I am.
This documentation for the writedata option indicates that on Windows you must use writefunction along with writedata.
Reading here: http://www.omegahat.org/RCurl/RCurlJSS.pdf I found that RCurl expects the writefunction to be an R function, so we can implement that ourselves on Windows. It is going to be slower than using a C function to write the data, but I bet that the speed of the network link will be the bottleneck.
getURI(url="sftp://hostname/home/me/onegeebee", curl=con, write=function(x) writeChar(x, f, eos=NULL))
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : embedded nul in string: ' <`á\017_\021
(This is after creating a 1GB file on the server to test transfer speed)
I haven't yet found an answer that doesn't choke on NUL bytes in the data. It seems that somewhere in the bowels of the RCurl package, when it passes data up into R to execute the writefunction you supply, it tries to convert the data into a character string. It must not do that if you use a C function. Notably, using the recommended R_curl_write_binary_data callback along with CFILE kills rsession.exe on win32 every time for me.
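The "embedded nul" error is consistent with binary data being treated as a string somewhere along the way. As a plain C illustration, independent of RCurl (both function names are hypothetical), compare what string-based code sees with what a length-based write handles:

```c
#include <stdio.h>
#include <string.h>

/* Length as seen by string-based code: counting stops at the first NUL. */
size_t
length_as_string(const char *buffer)
{
    return strlen(buffer);
}

/* Length-based write: the caller states the byte count, so NUL bytes
   are copied like any other byte. Returns the number of bytes written. */
size_t
write_as_bytes(const void *buffer, size_t nbytes, FILE *fout)
{
    return fwrite(buffer, 1, nbytes, fout);
}
```

For a six-byte chunk such as 'J', 'F', 0, 'I', 'F', 0, the first function reports 2 while the second writes all 6 bytes. R character strings cannot contain NUL bytes at all, which is why the R-level callback raises an error instead of silently truncating.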