I'm using R in a commercial environment where external connectivity all goes via a web proxy, so we need to specify the proxy server address and ensure we connect to it with Windows authentication.
I already have code that will configure the RCurl and httr packages to use those settings by default - i.e.
httr::set_config(httr::config(
proxy = "my.proxy.address",
proxyuserpwd = ":",  # blank user and password, so Windows credentials are used
proxyauth = 4
))
or
opts <- list(
proxy = "my.proxy.address",
proxyuserpwd = ":",
proxyauth = 4
)
options(RCurlOptions = opts)
However, in a couple of cases recently, I've found packages that depend on the curl package to make web requests (for instance xml2::read_xml), and I can't find any way to set the same proxy options so they're picked up by default and used by curl.
If I use curl directly myself, I can set the options on a new handle and the following code is sufficient to work successfully:
h <- curl::new_handle(proxy = "my.proxy.address",
                      proxyuserpwd = ":")
con <- curl::curl(url, handle = h)
page <- xml2::read_xml(con)
... but this isn't any help when the use of curl is buried within someone else's function!
Alternatively, I know I can set up an environment variable for the proxy address, like this:
Sys.setenv(https_proxy = "https://my.proxy.address")
... and libcurl picks it up. But if I do just this, then I end up with an HTTP 407 proxy authentication error. Is there a way to specify blank username / password (as the proxyuserpwd setting does), so we authenticate with Windows credentials? It also doesn't seem possible to specify the proxyauth option as an environment variable.
Can anyone offer a solution or any suggestions, please?
To use a proxy with Curl, you must pass the required proxy address using the -x (or --proxy) command-line option and proxy credentials using the -U (or --proxy-user) command-line switch. Proxy credentials may also be passed in the proxy string and will be URL decoded by Curl.
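As a rough illustration of passing credentials inside the proxy string, the sketch below builds such a string in base R and percent-encodes the user name and password so that characters like "@" or ":" do not break the URL. The helper name build_proxy_url, the port 8080, and the example credentials are all made up for illustration. This covers explicit credentials only, not the blank-credential Windows authentication case, and keeping a password in an environment variable makes it visible to anything that can read the session's environment.

```r
# Hypothetical helper (not part of any package): build a proxy URL with
# percent-encoded credentials using base R only.
build_proxy_url <- function(host, port, user, password, scheme = "http") {
  # reserved = TRUE also encodes characters such as "@", ":" and "\"
  enc <- function(x) utils::URLencode(x, reserved = TRUE)
  sprintf("%s://%s:%s@%s:%d", scheme, enc(user), enc(password),
          host, as.integer(port))
}

# Example values are placeholders, not real settings.
proxy_url <- build_proxy_url("my.proxy.address", 8080,
                             user = "DOMAIN\\user", password = "p@ss:word")

# Make the string available to libcurl-based code in this session.
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)
```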
libcurl respects the proxy environment variables named http_proxy, ftp_proxy, sftp_proxy etc. If set, libcurl will use the specified proxy for that URL scheme. So for a "FTP://" URL, the ftp_proxy is considered. all_proxy is used if no protocol specific proxy was set.
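The lookup order described above can be sketched in base R as follows. This is only an illustration of the rule (scheme-specific variable first, then all_proxy), not libcurl's actual implementation; the function name proxy_for_url and the example proxy addresses are assumptions, and only the lowercase variable names are checked.

```r
# Illustrative only: mimics the lookup order described above.
proxy_for_url <- function(url) {
  # Extract the URL scheme ("FTP://host/..." -> "ftp")
  scheme <- tolower(sub("^([A-Za-z][A-Za-z0-9+.-]*)://.*$", "\\1", url))
  specific <- Sys.getenv(paste0(scheme, "_proxy"), unset = "")
  if (nzchar(specific)) {
    return(specific)
  }
  # Fall back to all_proxy when no scheme-specific proxy is set
  Sys.getenv("all_proxy", unset = "")
}

# Placeholder proxy addresses for the demonstration
Sys.setenv(ftp_proxy = "ftp.proxy.example:2121",
           all_proxy = "fallback.proxy.example:8080")
Sys.unsetenv("https_proxy")

ftp_choice <- proxy_for_url("FTP://files.example.com/data.csv")
https_choice <- proxy_for_url("https://example.com/")
```

With these settings, the FTP URL resolves to the ftp_proxy value, while the HTTPS URL falls back to all_proxy because https_proxy is unset.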
The curl package provides bindings to the libcurl C library for R. The package supports retrieving data in-memory, downloading to disk, or streaming using the R “connection” interface. Some knowledge of curl is recommended to use this package.
I was having similar issues. Here are the steps that worked for me:
In a new R session, test these proxy settings by temporarily setting them with a command similar to the following, substituting your values from your PAC file:
Sys.setenv(http_proxy = "auth-proxy.xxxxxxx.com:9999")
Sys.setenv(https_proxy = "auth-proxy.xxxxxxx.com:9999")
Rerun your code in the same session to see if these new settings solve the issue. This is the test I used.
xml2::read_html(curl::curl('http://google.com', handle = curl::new_handle("useragent" = "Mozilla/5.0")))
Setting the proxy using Sys.setenv will only persist until the end of your current session. To make a more permanent change, you may consider adding this to your R_PROFILE as explained here.
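One possible way to persist the settings is to append the Sys.setenv calls to a profile file that R sources at startup. The sketch below writes to a temporary file as a stand-in, so it does not touch a real profile; swapping in the path of your actual profile file (for example ~/.Rprofile) is an assumption about your setup, and the proxy address is the placeholder from the steps above.

```r
# Stand-in for the real profile file; replace with your own profile path
# once you are happy with the result.
profile_path <- tempfile(fileext = ".Rprofile")

# Lines to be run at the start of every session
proxy_lines <- c(
  'Sys.setenv(http_proxy = "auth-proxy.xxxxxxx.com:9999")',
  'Sys.setenv(https_proxy = "auth-proxy.xxxxxxx.com:9999")'
)

# Append rather than overwrite, in case the file already has content
cat(proxy_lines, file = profile_path, sep = "\n", append = TRUE)

# Simulate what R does at startup, then check the result
source(profile_path)
Sys.getenv(c("http_proxy", "https_proxy"))
```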