You are trying to connect directly to a SOCKS port - Tor rejects any non-SOCKS traffic. You can instead connect through a middleman - Privoxy - listening on port 8118.
Example:
import urllib2

proxy_support = urllib2.ProxyHandler({"http": "127.0.0.1:8118"})
opener = urllib2.build_opener(proxy_support)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
print opener.open('http://www.google.com').read()
Also note the value passed to ProxyHandler: there is no http:// prefix before the ip:port.
First, install PySocks:
pip install PySocks
Then:
import socket
import socks
import urllib2

ipcheck_url = 'http://checkip.amazonaws.com/'

# Actual IP.
print(urllib2.urlopen(ipcheck_url).read())

# Tor IP: route all new sockets through Tor's SOCKS5 port.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9050)
socket.socket = socks.socksocket
print(urllib2.urlopen(ipcheck_url).read())
Using just urllib2.ProxyHandler as in https://stackoverflow.com/a/2015649/895245 fails with:
Tor is not an HTTP Proxy
This is mentioned at: How can I use a SOCKS 4/5 proxy with urllib2?
Tested on Ubuntu 15.10, Tor 0.2.6.10, Python 2.7.10.
The following code works on Python 3.4 (you need to keep the Tor Browser open while using it).
This script connects to Tor through SOCKS5, gets the IP from checkip.dyn.com, changes identity through the control port, and resends the request to get the new IP (it loops 10 times).
You need to install the appropriate libraries (PySocks, stem, requests, beautifulsoup4) to get this working. (Enjoy and don't abuse.)
import socks
import socket
import time
from stem.control import Controller
from stem import Signal
import requests
from bs4 import BeautifulSoup

err = 0
counter = 0
url = "http://checkip.dyn.com"

with Controller.from_port(port=9151) as controller:
    try:
        controller.authenticate()
        # Route all new sockets through the Tor Browser's SOCKS5 port.
        socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9150)
        socket.socket = socks.socksocket
        while counter < 10:
            r = requests.get(url)
            soup = BeautifulSoup(r.content, "html.parser")
            print(soup.find("body").text)
            counter = counter + 1
            # Request a new identity and wait until it is available.
            controller.signal(Signal.NEWNYM)
            time.sleep(controller.get_newnym_wait())
    except requests.HTTPError:
        print("Could not reach URL")
        err = err + 1

print("Used " + str(counter) + " IPs and got " + str(err) + " errors")
Using Privoxy as an HTTP proxy in front of Tor works for me - here's a crawler template:
import urllib2
import httplib
from BeautifulSoup import BeautifulSoup
from time import sleep

class Scraper(object):
    def __init__(self, proxy=None):
        if proxy is None:
            proxy = "http://localhost:8118/"  # Privoxy in front of Tor.
        self._open = self._get_opener(proxy)

    def _get_opener(self, proxy):
        proxy_handler = urllib2.ProxyHandler({'http': proxy})
        opener = urllib2.build_opener(proxy_handler)
        return opener.open

    def get_soup(self, url):
        # Retry until the page comes back parseable.
        soup = None
        while soup is None:
            try:
                request = urllib2.Request(url)
                request.add_header('User-Agent', 'foo bar useragent')
                soup = BeautifulSoup(self._open(request))
            except (httplib.IncompleteRead, httplib.BadStatusLine,
                    urllib2.HTTPError, ValueError, urllib2.URLError) as err:
                sleep(1)
        return soup

class PageType(Scraper):
    _URL_TEMPL = "http://foobar.com/baz/%s"

    def items_from_page(self, url):
        nextpage = None
        soup = self.get_soup(url)
        items = []
        for item in soup.findAll("foo"):
            items.append(item["bar"])
            nextpage = item["href"]
        return nextpage, items

    def get_items(self):
        nextpage, items = self.items_from_page(self._URL_TEMPL % "start.html")
        while nextpage is not None:
            nextpage, newitems = self.items_from_page(self._URL_TEMPL % nextpage)
            items.extend(newitems)
        return items

pt = PageType()
print pt.get_items()
Here is code for downloading files through the Tor proxy in Python (update the URL as needed):
import urllib2

url = "http://www.disneypicture.net/data/media/17/Donald_Duck2.gif"

proxy = urllib2.ProxyHandler({'http': '127.0.0.1:8118'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break
    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8) * (len(status) + 1)
    print status,

f.close()
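If you are on Python 3, a minimal port of the same downloader might look like this - a sketch, assuming Privoxy is still listening on port 8118 (urllib.request is where Python 3 keeps urllib2's ProxyHandler and build_opener):
import urllib.request

url = "http://www.disneypicture.net/data/media/17/Donald_Duck2.gif"

# Same Privoxy-in-front-of-Tor setup as above (assumption: Privoxy on 8118).
proxy = urllib.request.ProxyHandler({'http': '127.0.0.1:8118'})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)

file_name = url.split('/')[-1]
with urllib.request.urlopen(url) as u, open(file_name, 'wb') as f:
    file_size = int(u.headers['Content-Length'])
    print("Downloading: %s Bytes: %s" % (file_name, file_size))
    while True:
        buffer = u.read(8192)
        if not buffer:
            break
        f.write(buffer)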
The following solutions work for me in Python 3. Adapted from CiroSantilli's answer:
With urllib (the Python 3 name for urllib2):
import socks
import socket
from urllib.request import urlopen
url = 'http://icanhazip.com/'
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
response = urlopen(url)
print(response.read())
With requests:
import socks
import socket
import requests
url = 'http://icanhazip.com/'
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
response = requests.get(url)
print(response.text)
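If you would rather not monkey-patch the socket module globally, requests can also talk to Tor's SOCKS port directly through its proxies argument. This needs the SOCKS extra (pip install requests[socks]), and the socks5h scheme makes DNS resolve through Tor as well - a minimal sketch:
import requests

url = 'http://icanhazip.com/'
# Assumption: Tor Browser's SOCKS port on 9150 (use 9050 for a system Tor).
proxies = {
    'http': 'socks5h://127.0.0.1:9150',
    'https': 'socks5h://127.0.0.1:9150',
}
response = requests.get(url, proxies=proxies)
print(response.text)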
With Selenium + PhantomJS:
from selenium import webdriver

url = 'http://icanhazip.com/'
service_args = ['--proxy=localhost:9150', '--proxy-type=socks5']
phantomjs_path = '/your/path/to/phantomjs'

driver = webdriver.PhantomJS(executable_path=phantomjs_path,
                             service_args=service_args)
driver.get(url)
print(driver.page_source)
driver.close()
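PhantomJS is no longer maintained, so if the above stops working, a similar setup with headless Firefox should behave the same way - a sketch, assuming geckodriver is installed and the Tor Browser's SOCKS port 9150 is open (the preference names are standard Firefox proxy settings):
from selenium import webdriver

url = 'http://icanhazip.com/'

options = webdriver.FirefoxOptions()
options.add_argument('-headless')
# Route traffic through Tor's SOCKS5 port, including DNS lookups.
options.set_preference('network.proxy.type', 1)  # 1 = manual proxy config
options.set_preference('network.proxy.socks', '127.0.0.1')
options.set_preference('network.proxy.socks_port', 9150)
options.set_preference('network.proxy.socks_remote_dns', True)

driver = webdriver.Firefox(options=options)
driver.get(url)
print(driver.page_source)
driver.quit()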
Note: If you are planning to use Tor often, consider making a donation to support their awesome work!