I'm familiar with the fact that I should set the HTTP_PROXY environment variable to the proxy address.
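I am setting it along these lines (the address below is just a placeholder, not my real proxy):

import os

# Placeholder proxy address
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"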
Generally urllib works fine; the problem is dealing with urllib2.
>>> urllib2.urlopen("http://www.google.com").read()
raises
urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
or
urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
I tried @Fenikso's answer, but now I'm getting this error:
URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>
Any ideas?
You can do it even without the HTTP_PROXY environment variable. Try this sample:
import urllib2

proxy_support = urllib2.ProxyHandler({"http": "http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.google.com").read()
print html
In your case it really seems that the proxy server is refusing the connection.
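If you want to rule out urllib2 entirely, a quick sanity check is to open a plain socket to the proxy (using the same sample address as above):

import socket

# Errno 10061 here as well means the proxy itself is refusing
# connections, independent of urllib2.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
try:
    s.connect(("61.233.25.166", 80))
    print "proxy is reachable"
except socket.error as e:
    print "cannot reach proxy:", e
finally:
    s.close()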
Something more to try:
import urllib2

# proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"
proxies = {"http": "http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers = {'User-agent': 'Mozilla/5.0'}

proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)

req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
Edit 2014: This seems to be a popular question / answer. However, today I would use the third-party requests module instead.
For one request just do:
import requests

r = requests.get("http://www.google.com", proxies={"http": "http://61.233.25.166:80"})
print(r.text)
For multiple requests, use a Session object so that you do not have to pass the proxies parameter in every request:
import requests

s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}
r = s.get("http://www.google.com")
print(r.text)
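If your proxy requires authentication, requests also accepts credentials embedded in the proxy URL; the user, password, and address below are placeholders:

import requests

s = requests.Session()
# Placeholder credentials and proxy address
s.proxies = {
    "http": "http://user:password@61.233.25.166:80",
    "https": "http://user:password@61.233.25.166:80",
}
r = s.get("http://www.google.com")
print(r.text)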