
How to use Tor socks5 in R getURL

Tags: curl, r, proxy, tor, socks

I want to use Tor with the getURL function in R. Tor is running (verified in Firefox) and provides a SOCKS5 proxy on port 9050. But when I set this in R, I get the following error:

html <- getURL("http://www.google.com", followlocation = TRUE, .encoding = "UTF-8", .opts = list(proxy = "127.0.0.1:9050", timeout = 15))

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  Tor is not an HTTP Proxy

  It appears you have configured your web browser to use Tor as an HTTP proxy.
  This is not correct: Tor is a SOCKS proxy, not an HTTP proxy.
  Please configure your client accordingly.

I've tried replacing proxy with socks and socks5, but that didn't work.

bartektartanus asked Jul 29 '13 13:07


3 Answers

R has curl bindings, so you can use curl from R to talk to the Tor SOCKS5 proxy server.

The call from the shell (which you can translate to the R binding) is:

curl --socks5-hostname 127.0.0.1:9050 google.com

With --socks5-hostname, Tor also does the DNS resolution (A records), so the hostname is looked up through the proxy rather than locally.
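
One way the shell call might translate to R, as a minimal sketch. It assumes the curl package (one of the curl bindings for R) is installed and that Tor listens on 127.0.0.1:9050. The helper names socks5h_proxy and fetch_via_tor are made up for illustration; the socks5h:// scheme is libcurl's proxy-URL form of --socks5-hostname.

```r
# Build a proxy URL whose "socks5h" scheme asks the proxy to resolve hostnames,
# which is the behaviour of curl's --socks5-hostname flag.
socks5h_proxy <- function(host = "127.0.0.1", port = 9050) {
  sprintf("socks5h://%s:%d", host, as.integer(port))
}

# Hypothetical helper: fetch a URL through the Tor SOCKS5 proxy
# and return the response body as a character string.
fetch_via_tor <- function(url, proxy = socks5h_proxy()) {
  handle <- curl::new_handle(proxy = proxy, followlocation = TRUE)
  response <- curl::curl_fetch_memory(url, handle = handle)
  rawToChar(response$content)
}

# Usage (needs a running Tor client on the assumed host and port):
# html <- fetch_via_tor("http://google.com")
```

Keeping the proxy URL in its own function makes it easy to switch ports if Tor is configured differently.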

zkilnbqi answered Oct 31 '22 15:10


RCurl defaults to an HTTP proxy, but Tor provides a SOCKS proxy. Tor is clever enough to notice that the proxy client (RCurl) is trying to use it as an HTTP proxy, hence the HTML error message returned by Tor.

To get RCurl, and curl, to use a SOCKS proxy, you can add a protocol prefix to the proxy address. There are two protocol prefixes for SOCKS5: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).

Here is a pure R solution that uses Tor for DNS queries as well.

library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)

If you want to specify additional parameters, put them in the same options list:

library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                            useragent = "Mozilla",
                            followlocation = TRUE,
                            referer = "",
                            cookiejar = "my.cookies.txt"))
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)
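
The proxy can presumably also be set per call through .opts, as the question tried to do, instead of through the global RCurlOptions. A minimal sketch, assuming Tor listens on 127.0.0.1:9050; tor_opts and get_url_via_tor are hypothetical helper names, not part of RCurl:

```r
# Options for a single request: the socks5h:// prefix makes curl treat the
# proxy as SOCKS5 and lets Tor resolve the hostname.
tor_opts <- function(timeout = 15) {
  list(proxy = "socks5h://127.0.0.1:9050",
       followlocation = TRUE,
       timeout = timeout)
}

# Hypothetical wrapper around RCurl::getURL that applies the options above.
get_url_via_tor <- function(url, opts = tor_opts()) {
  RCurl::getURL(url, .opts = opts, .encoding = "UTF-8")
}

# Usage (needs RCurl and a running Tor client):
# html <- get_url_via_tor("http://www.google.com")
```

This leaves other RCurl calls in the same session unaffected, which may be preferable when only some requests should go through Tor.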

Hans Ekbrand answered Oct 31 '22 13:10


Hi Naparst, I would really appreciate a hint on how to implement the solution you propose. The option should be something like opts <- list(socks5.hostname = "127.0.0.1:9050"), but this doesn't work, since socks5.hostname is not an option.

Cossutta answered Oct 31 '22 13:10