How do I set the source IP/interface with Python and urllib2?
Simple urllib2 script:

    import urllib2

    response = urllib2.urlopen('http://python.org/')
    print "Response:", response

    # Get the URL. This gets the real URL.
    print "The URL is: ", response.geturl()

    # Getting the code
    print "This gets the code: ", response.
True, if you want to avoid adding any dependencies, urllib is available. But note that even the Python official documentation recommends the requests library: "The Requests package is recommended for a higher-level HTTP client interface."
The urllib module in Python 3 allows you to access websites from your program. This opens up as many doors for your programs as the internet opens up for you. urllib in Python 3 is slightly different from urllib2 in Python 2, but they are mostly the same.
Unfortunately the stack of standard library modules in use (urllib2, httplib, socket) is somewhat badly designed for the purpose -- at the key point in the operation, HTTPConnection.connect (in httplib) delegates to socket.create_connection, which in turn gives you no "hook" whatsoever between the creation of the socket instance sock and the sock.connect call, for you to insert the sock.bind just before sock.connect, which is what you need to set the source IP (I'm evangelizing widely for NOT designing abstractions in such an airtight, excessively-encapsulated way -- I'll be speaking about that at OSCON this Thursday under the title "Zen and the Art of Abstraction Maintenance" -- but here your problem is how to deal with a stack of abstractions that WERE designed this way, sigh).
When you're facing such problems you only have two not-so-good solutions: either copy, paste and edit the misdesigned code into which you need to place a "hook" that the original designer didn't cater for; or, "monkey-patch" that code. Neither is GOOD, but both can work, so at least let's be thankful that we have such options (by using an open-source and dynamic language). In this case, I think I'd go for monkey-patching (which is bad, but copy and paste coding is even worse) -- a code fragment such as:
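    import socket

    # Keep a reference to the real constructor so it can be restored later.
    true_socket = socket.socket

    def bound_socket(*a, **k):
        sock = true_socket(*a, **k)
        # sourceIP must already be set to the local address to bind to;
        # port 0 lets the OS pick the local port.
        sock.bind((sourceIP, 0))
        return sock

    socket.socket = bound_socket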
Depending on your exact needs (do you need all sockets to be bound to the same source IP, or...?) you could simply run this before using urllib2 normally, or (in more complex ways of course) run it at need just for those outgoing sockets you DO need to bind in a certain way (then each time restore socket.socket = true_socket to get out of the way for future sockets yet to be created). The second alternative adds its own complications to orchestrate properly, so I'm waiting for you to clarify whether you do need such complications before explaining them all.
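One possible shape for that second alternative, as a minimal sketch rather than a complete treatment: wrap the patch in a context manager so the real socket.socket is always restored. The name bound_sockets is made up for illustration. It assumes every socket created inside the block is an IPv4 socket, and it is not thread-safe, since other threads creating sockets during the block get bound too.

```python
import contextlib
import socket


@contextlib.contextmanager
def bound_sockets(source_ip):
    """Temporarily bind every newly created socket to source_ip.

    bound_sockets is a hypothetical helper name, not a standard API.
    """
    true_socket = socket.socket

    def bound_socket(*a, **k):
        sock = true_socket(*a, **k)
        # Port 0 lets the OS pick the local port.
        sock.bind((source_ip, 0))
        return sock

    socket.socket = bound_socket
    try:
        yield
    finally:
        # Restore the real constructor even if the block raised.
        socket.socket = true_socket
```

You would then put only the calls that need the source address inside a "with bound_sockets(...):" block and use urllib2 normally everywhere else.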
AKX's good answer is a variant on the "copy / paste / edit" alternative so I don't need to expand much on that -- note however that it doesn't exactly reproduce socket.create_connection in its connect method, see the source here (at the very end of the page) and decide what other functionality of the create_connection function you may want to embody in your copied/pasted/edited version if you decide to go that route.
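To make that concrete, here is a rough sketch of what a copied and edited create_connection might look like with the bind step added. It keeps the parts of the standard function that are easy to lose in a hand-written connect: trying each address returned by getaddrinfo in turn, applying the timeout, and re-raising the last error. The name create_bound_connection and its source_ip parameter are made up for illustration.

```python
import socket


def create_bound_connection(address, source_ip, timeout=None):
    """Connect to address (host, port) from source_ip.

    Hypothetical helper modelled on socket.create_connection.
    """
    host, port = address
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            if timeout is not None:
                sock.settimeout(timeout)
            # The step the standard function left no room for:
            # bind to the source IP before connecting.
            sock.bind((source_ip, 0))
            sock.connect(sockaddr)
            return sock
        except socket.error as exc:
            # Remember the failure and try the next candidate address.
            last_error = exc
            if sock is not None:
                sock.close()
    if last_error is not None:
        raise last_error
    raise socket.error("getaddrinfo returned no results")
```

A connect method in an HTTPConnection subclass could then assign the result of this function to self.sock instead of building the socket by hand.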
This seems to work.
    import urllib2, httplib, socket

    class BindableHTTPConnection(httplib.HTTPConnection):
        def connect(self):
            """Connect to the host and port specified in __init__."""
            self.sock = socket.socket()
            self.sock.bind((self.source_ip, 0))
            if isinstance(self.timeout, float):
                self.sock.settimeout(self.timeout)
            self.sock.connect((self.host, self.port))

    def BindableHTTPConnectionFactory(source_ip):
        def _get(host, port=None, strict=None, timeout=0):
            bhc = BindableHTTPConnection(host, port=port, strict=strict, timeout=timeout)
            bhc.source_ip = source_ip
            return bhc
        return _get

    class BindableHTTPHandler(urllib2.HTTPHandler):
        def http_open(self, req):
            return self.do_open(BindableHTTPConnectionFactory('127.0.0.1'), req)

    opener = urllib2.build_opener(BindableHTTPHandler)
    opener.open("http://google.com/").read() # Will fail, 127.0.0.1 can't reach google.com.
You'll need to figure out some way to parameterize "127.0.0.1" there, though.
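One way to parameterize it, sketched here for Python 3 rather than the Python 2 code above (that is an assumption about your environment): there urllib2 is urllib.request, httplib is http.client, and HTTPConnection accepts a source_address argument, so no connect override is needed. The handler name is reused from the code above; the source_ip constructor argument and the make_opener helper are additions for illustration.

```python
import http.client
import urllib.request


class BindableHTTPHandler(urllib.request.HTTPHandler):
    """HTTP handler whose outgoing connections bind to source_ip."""

    def __init__(self, source_ip):
        super().__init__()
        # Hypothetical parameter: the local address to send from.
        self.source_ip = source_ip

    def http_open(self, req):
        # Extra keyword arguments to do_open are passed on to the
        # connection class; port 0 lets the OS pick the local port.
        return self.do_open(
            http.client.HTTPConnection,
            req,
            source_address=(self.source_ip, 0),
        )


def make_opener(source_ip):
    """Build an opener that sends plain-HTTP requests from source_ip."""
    return urllib.request.build_opener(BindableHTTPHandler(source_ip))
```

Each opener then carries its own source address, so different parts of a program can send from different addresses without touching socket.socket. This sketch only covers http:// URLs; https would need a matching HTTPSHandler subclass.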