I am running a Python script that uses urllib2 to grab data from a weather API and display it on screen. When I query the server, I get a "no address associated with hostname" error. I can view the output of the API in a web browser, and I can download the file with wget, but I have to force IPv4 to get it to work. Is it possible to force IPv4 in urllib2 when using urllib2.urlopen?
urllib2 does not exist in Python 3.x; its functionality moved to urllib.request and urllib.error, so use those instead.
Despite the similar names, Python 2's urllib and urllib2 are unrelated: they have a different design and a different implementation. urllib was the original Python HTTP client, added to the standard library in Python 1.2.
NOTE: urllib2 is no longer available in Python 3. See the urllib documentation for details.
The urllib package is Python's URL handling module. It is used to fetch URLs (Uniform Resource Locators) through its urlopen function, and it can fetch URLs over a variety of different protocols.
Not directly, no.
So, what can you do?
One possibility is to explicitly resolve the hostname to IPv4 yourself, and then use the IPv4 address instead of the name as the host. For example:
    import socket
    import urllib2
    host = socket.gethostbyname('example.com')  # gethostbyname does IPv4 lookups only
    page = urllib2.urlopen('http://{}/path'.format(host))
However, some virtual-server sites may require a Host: example.com header, and they will instead get a Host: 93.184.216.119. You can work around that by overriding the header:
    host = socket.gethostbyname('example.com')
    request = urllib2.Request('http://{}/path'.format(host),
                              headers={'Host': 'example.com'})
    page = urllib2.urlopen(request)
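Since urllib2 is gone in Python 3, here is a rough sketch of the same idea using urllib.request. The helper name build_ipv4_request and its resolve parameter are my own inventions for illustration, not part of any library. It assumes a plain http:// URL; with https:// the certificate would be checked against the IP address and would normally fail.

```python
import socket
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def build_ipv4_request(url, resolve=socket.gethostbyname):
    """Build a Request aimed at the host's IPv4 address, keeping the original Host header.

    `resolve` is a hypothetical hook so the lookup can be swapped out;
    socket.gethostbyname only ever returns IPv4 addresses.
    """
    parts = urlsplit(url)
    hostname = parts.hostname
    ipv4 = resolve(hostname)

    # Preserve an explicit port, if the URL had one.
    if parts.port is None:
        netloc, host_header = ipv4, hostname
    else:
        netloc = "{}:{}".format(ipv4, parts.port)
        host_header = "{}:{}".format(hostname, parts.port)

    ip_url = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    # An explicit Host header stops urllib from sending the IP address as the host.
    return urllib.request.Request(ip_url, headers={"Host": host_header})
```

You would then pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_ipv4_request('http://example.com/path')).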
Alternatively, you can provide your own handlers in place of the standard ones. But the standard handler is mostly just a wrapper around httplib.HTTPConnection, and the real problem is in HTTPConnection.connect.
So, the clean way to do this is to create your own subclass of httplib.HTTPConnection, which overrides connect like this:
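    class MyHTTPConnection(httplib.HTTPConnection):
        def connect(self):
            # Resolve to an IPv4 address first; gethostbyname never returns IPv6.
            host = socket.gethostbyname(self.host)
            self.sock = socket.create_connection((host, self.port),
                                                 self.timeout, self.source_address)
            # Same tunnel handling as the stock connect().
            if self._tunnel_host:
                self._tunnel()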
Then create your own subclass of urllib2.HTTPHandler that overrides http_open to use your subclass:
    def http_open(self, req):
        # Hand do_open your connection class instead of httplib.HTTPConnection.
        return self.do_open(MyHTTPConnection, req)
… and similarly for HTTPSHandler, and then hook up all the stuff properly as shown in the urllib2 docs.
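For Python 3, the pieces above might be wired together roughly as follows. This is a sketch, not a tested drop-in: the class names IPv4HTTPConnection and IPv4HTTPHandler are made up for illustration, it covers plain HTTP only, and it leans on the private attributes _tunnel_host and _tunnel of http.client, which could change between Python versions.

```python
import http.client
import socket
import urllib.request


class IPv4HTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that resolves its host to IPv4 before connecting (illustrative name)."""

    def connect(self):
        # gethostbyname performs an IPv4-only lookup.
        ipv4 = socket.gethostbyname(self.host)
        self.sock = socket.create_connection(
            (ipv4, self.port), self.timeout, self.source_address)
        # Private http.client details, mirrored from the stock connect();
        # assumed to exist in the Python version in use.
        if self._tunnel_host:
            self._tunnel()


class IPv4HTTPHandler(urllib.request.HTTPHandler):
    """Handler that opens http:// URLs through IPv4HTTPConnection (illustrative name)."""

    def http_open(self, req):
        return self.do_open(IPv4HTTPConnection, req)


# build_opener drops the default HTTPHandler when given a subclass of it.
opener = urllib.request.build_opener(IPv4HTTPHandler)
```

Calling opener.open('http://example.com/path') then goes through the IPv4-only connection, and urllib.request.install_opener(opener) would make urlopen use it globally. Because self.host is left unchanged, the Host header still carries the name rather than the address.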
The quick & dirty way to do the same thing is to just monkeypatch httplib.HTTPConnection.connect to the above function.
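A related quick & dirty variant patches a different spot: socket.getaddrinfo, which socket.create_connection uses for name lookups. To be clear, this is not the connect patch described above, just a commonly used alternative, sketched here for Python 3. The name force_ipv4 is hypothetical, and the patch affects every socket lookup in the process while it is active, so it is not thread-safe.

```python
import contextlib
import socket


@contextlib.contextmanager
def force_ipv4():
    """Temporarily make socket.getaddrinfo return IPv4 results only (illustrative helper)."""
    original = socket.getaddrinfo

    def ipv4_only(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested family and always ask for AF_INET.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = ipv4_only
    try:
        yield
    finally:
        # Always restore the real function, even if the body raised.
        socket.getaddrinfo = original
```

Inside a with force_ipv4(): block, a call such as urllib.request.urlopen('http://example.com/path') should only ever try IPv4 addresses, and since the hostname itself is untouched, the Host header and HTTPS certificate checks keep working.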
Finally, you could use a different library instead of urllib2. From what I remember, requests doesn't make this any easier (ultimately, you have to override or monkeypatch slightly different methods, but it's effectively the same). However, any libcurl wrapper will allow you to do the equivalent of curl_easy_setopt(h, CURLOPT_IPRESOLVE, CURLOPT_IPRESOLVE_V4).