 

How can I open a website with urllib via proxy in Python?

Tags:

python

proxy

I have a program that checks a website, and I want to know how I can check it through a proxy in Python.

Here is the code, just as an example:

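```python
import time
import urllib

website = 'http://www.example.com'  # placeholder: the site being checked

while True:
    try:
        h = urllib.urlopen(website)
        break
    except IOError:
        # Catch only network errors, so Ctrl-C can still stop the loop
        print '[' + time.strftime('%Y/%m/%d %H:%M:%S') + '] ' + 'ERROR. Trying again in a few seconds...'
        time.sleep(5)
```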
asked Jul 02 '10 18:07 by Bruno 'Shady'





2 Answers

By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:

```shell
$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py  # Using http://myproxy.example.com:1234 as a proxy
```
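For comparison, Python 3's urllib.request honours the same variable. This is a minimal sketch, assuming Python 3: it sets os.environ inside the script only to simulate the shell export above, then shows what urllib.request.getproxies() and a default ProxyHandler pick up.

```python
import os
import urllib.request

# Simulate `export http_proxy=...` from inside the script (illustration only;
# normally the variable is set in the shell before Python starts).
os.environ["http_proxy"] = "http://myproxy.example.com:1234"

# getproxies() returns the scheme -> proxy URL mapping urllib.request uses by
# default; when proxy variables are set, they are read from the environment.
detected = urllib.request.getproxies()
print("Detected proxies:", detected)

# A ProxyHandler created without arguments calls getproxies() itself, so the
# default opener routes plain-HTTP requests through this proxy.
handler = urllib.request.ProxyHandler()
print("HTTP proxy used by default:", handler.proxies.get("http"))
```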

If you would rather specify the proxy inside your application, you can pass a proxies argument to urlopen:

```python
import urllib

proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)
```
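The proxies argument only exists on Python 2's urllib.urlopen; Python 3's urllib.request.urlopen() has no such parameter. A rough Python 3 equivalent, sketched here with the same placeholder proxy, is a dedicated opener that carries the proxy setting for the requests made through it, without touching the global urlopen().

```python
import urllib.request

PROXY = "http://myproxy.example.com:1234"  # placeholder proxy from the example above

# The opener carries the proxy setting; the global urlopen() is left unchanged.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": PROXY}))
print("Using HTTP proxy %s" % PROXY)

# With a reachable proxy, fetch through the opener instead of urlopen():
# response = opener.open("http://www.google.com", timeout=10)
```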

Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?

```python
import time
import urllib

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except IOError:
        # Network errors only; move on to the next proxy
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
```
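The same "try each proxy until one works" idea can be sketched for Python 3 as follows. fetch_via_first_working_proxy is a hypothetical helper name, and the sketch assumes plain http:// target URLs (only the "http" scheme is mapped to the proxy). It also adds a timeout so a dead proxy cannot hang the loop.

```python
import time
import urllib.request


def fetch_via_first_working_proxy(url, candidate_proxies, delay=5, timeout=10):
    """Try each HTTP proxy in turn; return (proxy, body) for the first that works.

    Hypothetical helper; only the "http" scheme is proxied here.
    """
    for proxy in candidate_proxies:
        print("Trying HTTP proxy %s" % proxy)
        # A separate opener per proxy, so nothing is installed globally.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy}))
        try:
            with opener.open(url, timeout=timeout) as response:
                body = response.read()
        except OSError as exc:
            # URLError, connection refused and socket timeouts all derive from OSError.
            print("Proxy %s failed (%s); trying next in %s seconds" % (proxy, exc, delay))
            time.sleep(delay)
        else:
            print("Got URL using proxy %s" % proxy)
            return proxy, body
    raise RuntimeError("None of the candidate proxies worked")
```

Called as fetch_via_first_working_proxy("http://www.google.com", candidate_proxies), it mirrors the loop above, but raises an error instead of silently finishing when every proxy fails.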
answered Sep 21 '22 13:09 by Pär Wieslander



In Python 3 the API is slightly different. urllib.request tries to detect proxy settings automatically (for example, from the http_proxy environment variable), but if you need to set the proxies explicitly, consider code along these lines:

```python
#!/usr/bin/env python3
import urllib.request

url = 'http://www.example.com/'  # placeholder: the page to fetch

proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port',
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)  # every later urlopen() call uses these proxies

with urllib.request.urlopen(url) as response:
    html = response.read()  # raw bytes of the page; process as needed
```

See also the relevant section of the Python 3 docs.
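To see what the user:pass@ part of the proxy URL does, here is a minimal offline sketch, assuming Python 3. EchoProxy is a hypothetical local stand-in that echoes the request back instead of forwarding it, and user/pass are placeholder credentials. It shows that urllib sends the full target URL to the proxy and adds a Basic Proxy-Authorization header built from the credentials. The sketch calls opener.open() rather than install_opener() so the demo does not change the global urlopen().

```python
import http.server
import threading
import urllib.request


class EchoProxy(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in 'proxy' that echoes the request instead of forwarding it."""

    def do_GET(self):
        # With a proxy configured, urllib puts the absolute URL in the request
        # line, so self.path is e.g. "http://www.example.com/".
        auth = self.headers.get("Proxy-Authorization", "<none>")
        body = ("%s\n%s" % (self.path, auth)).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = http.server.HTTPServer(("127.0.0.1", 0), EchoProxy)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_port

# Same shape as 'http://user:pass@server:port' above (placeholder credentials).
proxy_support = urllib.request.ProxyHandler(
    {"http": "http://user:pass@127.0.0.1:%d" % port})
opener = urllib.request.build_opener(proxy_support)

with opener.open("http://www.example.com/", timeout=5) as response:
    request_line, auth_header = response.read().decode().split("\n")

server.shutdown()
print(request_line)  # the full target URL, as the proxy received it
print(auth_header)   # "Basic " followed by base64 of "user:pass"
```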

answered Sep 23 '22 13:09 by DomTomCat
