Recently I have been playing around with the HTTP Proxy in twisted. After much trial and error I think I finally I have something working. What I want to know though, is how, if it is possible, do I expand this proxy to also be able to handle HTTPS pages? Here is what I've got so far:
from twisted.internet import reactor
from twisted.web import http
from twisted.web.proxy import Proxy, ProxyRequest, ProxyClientFactory, ProxyClient
class HTTPProxyClient(ProxyClient):
def handleHeader(self, key, value):
print "%s : %s" % (key, value)
ProxyClient.handleHeader(self, key, value)
def handleResponsePart(self, buffer):
print buffer
ProxyClient.handleResponsePart(self, buffer)
class HTTPProxyFactory(ProxyClientFactory):
protocol = HTTPProxyClient
class HTTPProxyRequest(ProxyRequest):
protocols = {'http' : HTTPProxyFactory}
def process(self):
print self.method
for k,v in self.requestHeaders.getAllRawHeaders():
print "%s : %s" % (k,v)
print "\n \n"
ProxyRequest.process(self)
class HTTPProxy(Proxy):
requestFactory = HTTPProxyRequest
factory = http.HTTPFactory()
factory.protocol = HTTPProxy
reactor.listenSSL(8001, factory)
reactor.run()
As this code demonstrates, for the sake of example for now I am just printing out whatever is going through the connection. Is it possible to handle HTTPS with the same classes? If not, how should I go about implementing such a thing?
I'm not sure about twisted, but I want to warn you that if you implement a HTTPS proxy, a web browser will expect the server's SSL certificate to match the domain name in the URL (address bar). The web browser will issue security warnings otherwise.
There are ways around this, such as generating certificates on the fly, but you'd need the root certificate to be trusted on the browser.
If you want to connect to an HTTPS website via an HTTP proxy, you need to use the CONNECT
HTTP verb (because that's how a proxy works for HTTPS). In this case, the proxy server simply connects to the target server and relays whatever is sent by the server back to the client's socket (and vice versa). There's no caching involved in this case (but you might be able to log the hosts you're connecting to).
The exchange will look like this (client to proxy):
C->P: CONNECT target.host:443 HTTP/1.0
C->P:
P->C: 200 OK
P->C:
After this, the proxy simply opens a plain socket to the target server (no HTTP or SSL/TLS yet) and relays everything between the initial client and the target server (including the TLS handshake that the client initiates). The client upgrades the existing socket it has to the proxy to use TLS/SSL (by starting the SSL/TLS handshake). Once the client has read the '200' status line, as far as the client is concerned, it's as if it had made the connection to the target server directly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With