
How to configure the curl package in R with default web proxy settings?

I'm using R in a commercial environment where external connectivity all goes via a web proxy, so we need to specify the proxy server address and ensure we connect to it with Windows authentication.

I already have code that will configure the RCurl and httr packages to use those settings by default - i.e.

httr::set_config(httr::config(
  proxy = "my.proxy.address",
  proxyuserpwd = ":",
  proxyauth = 4
))

or

opts <- list(
  proxy = "my.proxy.address",
  proxyuserpwd = ":", 
  proxyauth = 4
)
# options() is base R; RCurl reads the "RCurlOptions" entry
options(RCurlOptions = opts)

However, in a couple of cases recently, I've found packages that depend on the curl package to make web requests - for instance xml2::read_xml - and I can't find any way to set the same proxy options so they're picked up by default and used by curl.

If I use curl directly myself, I can set the options on a new handle and the following code is sufficient to work successfully:

  # url holds the address of the XML document to fetch
  h <- curl::new_handle(proxy = "my.proxy.address",
                        proxyuserpwd = ":")
  con <- curl::curl(url, handle = h)
  page <- xml2::read_xml(con)

... but this isn't any help when the use of curl is buried within someone else's function!

Alternatively, I know I can set up an environment variable for the proxy address, like this:

Sys.setenv(https_proxy = "https://my.proxy.address")

... and libcurl picks it up. But if I do just this, then I end up with an HTTP 407 proxy authentication error. Is there a way to specify blank username / password (as the proxyuserpwd setting does), so we authenticate with Windows credentials? It also doesn't seem possible to specify the proxyauth option as an environment variable.

Can anyone offer a solution or any suggestions, please?

asked Oct 26 '18 at 15:10 by djb72



1 Answer

I was having similar issues. Here are the steps that worked for me:

  1. Download my company's proxy auto-config (PAC) file. In Internet Explorer: click the gear icon, then Internet Options > Connections > LAN Settings, and copy the HTTP address shown there into a new browser window to download the text file.
  2. Locate the line in the PAC file that specifies the proxy (e.g. "auth-proxy.xxxxxxx.com:9999").
  3. In a new R session, test these proxy settings by temporarily setting them with a command similar to the following, substituting your values from your PAC file:

    Sys.setenv(http_proxy = "auth-proxy.xxxxxxx.com:9999")
    Sys.setenv(https_proxy = "auth-proxy.xxxxxxx.com:9999")
    
  4. Rerun your code in the same session to see if these new settings solve the issue. This is the test I used.

    xml2::read_html(curl::curl("http://google.com", handle = curl::new_handle(useragent = "Mozilla/5.0")))
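
If you would rather not leave the variables set while you experiment, steps 3 and 4 can be wrapped so that the previous values are restored afterwards. This is only a sketch in base R: `with_proxy_env` is a hypothetical helper name, and "auth-proxy.example.com:9999" is a placeholder for the value from your own PAC file.

```r
# Hypothetical helper: set the proxy variables, evaluate an expression,
# then restore whatever was there before (including "not set at all").
with_proxy_env <- function(proxy, expr) {
  vars <- c("http_proxy", "https_proxy")
  old <- Sys.getenv(vars, unset = NA)   # NA marks a variable that was unset

  on.exit({
    was_set <- !is.na(old)
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
    if (any(!was_set)) Sys.unsetenv(vars[!was_set])
  }, add = TRUE)

  new <- rep(proxy, length(vars))
  names(new) <- vars
  do.call(Sys.setenv, as.list(new))

  # expr is a lazy argument, so it is only evaluated here,
  # after the variables have been set
  force(expr)
}

# Placeholder proxy address (an assumption); any code can go in the second argument
seen <- with_proxy_env("auth-proxy.example.com:9999", Sys.getenv("https_proxy"))
seen
```

In practice the second argument would be the request you are testing, such as the `read_html` call from step 4.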
    

Settings made with Sys.setenv persist only until the end of your current session. To make the change more permanent, consider adding these lines to your R_PROFILE as explained here.
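
As a rough illustration of what that could look like, the sketch below writes the two `Sys.setenv` calls to a profile-style file and sources it by hand. The proxy address is a placeholder, and a temporary file stands in for the real profile so that running the sketch does not modify your own setup.

```r
# Placeholder proxy address (an assumption); substitute the value from your PAC file
proxy <- "auth-proxy.example.com:9999"

# Lines that would go into the R profile file
profile_lines <- c(
  sprintf('Sys.setenv(http_proxy = "%s")', proxy),
  sprintf('Sys.setenv(https_proxy = "%s")', proxy)
)

# A temporary file stands in for the real profile in this sketch
profile_path <- tempfile(fileext = ".Rprofile")
writeLines(profile_lines, profile_path)

# R runs the profile at startup; sourcing it manually has the same effect here
source(profile_path)
Sys.getenv(c("http_proxy", "https_proxy"))
```

Once the same two lines are in the profile file that R actually reads at startup, every new session should begin with the proxy variables already set.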

answered Oct 19 '22 at 15:10 by Stan