
How to write a proxy pool server in Python (when a request comes in, choose a proxy to fetch the URL content)?

I do not know the proper name for this kind of proxy server, so you're welcome to fix my question title.

When I search for "proxy server" on Google, I find many implementations such as maproxy or a-python-proxy-in-less-than-100-lines-of-code. Those proxy servers seem to just ask the remote server directly for a given URL.

I want to build a proxy server that contains a proxy pool (a list of HTTP/HTTPS proxies) and exposes only one IP address and one port for incoming requests. When a request comes in, it should choose a proxy from the pool, make the request through that proxy, and return the result.

For example, I have a VPS whose IP is '192.168.1.66'. I start the proxy server on this VPS with IP '127.0.0.1' and port '8080'.

I can then use this proxy as shown below.

import requests
url = 'http://www.google.com'
headers = {
    ...
}
proxies = {
    'http': 'http://192.168.1.66:8080'
}

r = requests.get(url, headers=headers, proxies=proxies)

I have seen some implementations like this:

from twisted.web import proxy, http
from twisted.internet import reactor
from twisted.python import log
import sys
log.startLogging(sys.stdout)

class ProxyFactory(http.HTTPFactory):
    protocol = proxy.Proxy

reactor.listenTCP(8080, ProxyFactory())
reactor.run()

It works, but it is so simple that I have no idea how it works or how to extend this code to use a proxy pool.

An example flow, from hidu/proxy-manager, which is written in Go:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
+ client (want visit http://www.baidu.com/)              +  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
                        |  
                        |  via proxy 127.0.0.1:8090  
                        |  
                        V  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
+                       +         proxy pool             +  
+ proxy manager listen  ++++++++++++++++++++++++++++++++++  
+ on (127.0.0.1:8090)   +  http_proxy1,http_proxy2,      +  
+                       +  socks5_proxy1,socks5_proxy2   +  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
                        |  
                        |  choose one proxy visit 
                        |  www.baidu.com  
                        |  
                        V  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
+        site:www.baidu.com                              +  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
asked Oct 15 '15 at 04:10 by Mithril



1 Answer

Your proxy pool concept is not hard to implement. If I understand correctly, you want to do the following.

  1. YOUR PROXY SERVER listens for requests on 192.168.1.66:8080.
  2. The CLIENT requests http://www.google.com.
  3. YOUR PROXY SERVER forwards the CLIENT's request to ANOTHER PROXY SERVER chosen from a list of such servers (the PROXY POOL).
  4. YOUR PROXY SERVER gets the response from ANOTHER PROXY SERVER and returns it to the CLIENT.

So I've written a simple proxy server using Flask and Requests.

from flask import Flask, Response, stream_with_context
import random
import requests

app = Flask(__name__)

@app.route('/p/<path:url>')
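def proxy(url):
    """Fetch `url` through a proxy from the pool.

    Request this endpoint like /p/www.google.com
    """
    # Pass the target on without a scheme; get_response() places it
    # behind the chosen proxy's own /p/ endpoint.
    r = get_response(url)

    # Stream the upstream body back to the client as it arrives.
    return Response(stream_with_context(r.iter_content(chunk_size=8192)),
                    content_type=r.headers.get('content-type'))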

def get_proxy():
    # This is your "Proxy Pool"
    proxies = [
        'http://proxy-server-1.com',
        'http://proxy-server-2.com',
        'http://proxy-server-3.com',
    ]

    return random.choice(proxies)

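def get_response(target_url):
    proxy_url = get_proxy()
    url = "{}/p/{}".format(proxy_url, target_url)
    # The line above generates a URL like http://proxy-server-1.com/p/www.google.com

    # stream=True defers the body download; the timeout avoids hanging on a dead proxy.
    return requests.get(url, stream=True, timeout=10)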

if __name__ == '__main__':
    app.run()

From here, you can start improving your proxy server.
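One possible direction, if the pool entries are ordinary HTTP forward proxies rather than copies of the Flask app above: a client configured with proxies={'http': ...}, as in the question, sends the absolute URL in the request line, so the server can read it from the request path and relay it through a pooled proxy. The following is only a minimal sketch using the standard library. The names PoolProxyHandler and make_pool_proxy and the pool URLs are hypothetical, and it assumes plain-HTTP GET requests only (no HTTPS CONNECT tunneling, no header forwarding).

```python
import random
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical upstream HTTP proxies (placeholders, not real hosts).
PROXY_POOL = [
    "http://proxy-server-1.com:3128",
    "http://proxy-server-2.com:3128",
]


class PoolProxyHandler(BaseHTTPRequestHandler):
    """Relays plain-HTTP GET requests through a randomly chosen pooled proxy."""

    def do_GET(self):
        # A proxy-aware client sends the absolute URL in the request line,
        # so self.path looks like "http://www.google.com/".
        if not self.path.startswith("http://"):
            self.send_error(400, "Bad Request", "expected an absolute http:// URL")
            return
        upstream = random.choice(self.server.proxy_pool)
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": upstream})
        )
        try:
            with opener.open(self.path, timeout=10) as resp:
                status = resp.status
                content_type = resp.headers.get(
                    "Content-Type", "application/octet-stream")
                body = resp.read()
        except Exception as exc:
            # urllib also raises on 4xx/5xx replies; this sketch maps every
            # upstream problem to 502 instead of passing the status through.
            self.send_error(502, "Bad Gateway",
                            "upstream {} failed: {}".format(upstream, exc))
            return
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_pool_proxy(host, port, pool):
    """Build (but do not start) a proxy server that draws from `pool`."""
    server = ThreadingHTTPServer((host, port), PoolProxyHandler)
    server.proxy_pool = list(pool)
    return server

# To run it: make_pool_proxy("0.0.0.0", 8080, PROXY_POOL).serve_forever()
```

With a server like this bound to an address the client can reach, the requests snippet from the question should work unchanged for http:// URLs. Buffering the whole body keeps the sketch short; a real server would stream it.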

A typical proxy pool or proxy manager checks the availability, speed, and other stats of its proxies and selects the best proxy for each request. Of course, this example handles only simple requests; you can add features to handle request arguments, methods, and protocols.
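As a rough illustration of the availability and speed bookkeeping described above, the sketch below keeps per-proxy stats and prefers the fastest healthy proxy. The ProxyPool class, its method names, the max_failures threshold, and the 0.7/0.3 smoothing weights are all assumptions made for this example, not part of any library.

```python
import random


class ProxyPool:
    """Tracks failures and latency per proxy and prefers healthy, fast ones."""

    def __init__(self, proxies, max_failures=3):
        self.max_failures = max_failures
        # proxy URL -> consecutive failure count and smoothed latency (seconds)
        self.stats = {p: {"failures": 0, "latency": None} for p in proxies}

    def available(self):
        """Return proxies that have not reached the failure threshold."""
        return [p for p, s in self.stats.items()
                if s["failures"] < self.max_failures]

    def choose(self):
        """Pick an untested proxy if any, else the one with the lowest latency."""
        candidates = self.available()
        if not candidates:
            raise RuntimeError("no healthy proxies left in the pool")
        untested = [p for p in candidates if self.stats[p]["latency"] is None]
        if untested:
            return random.choice(untested)
        return min(candidates, key=lambda p: self.stats[p]["latency"])

    def record_success(self, proxy, elapsed):
        """Reset the failure count and fold `elapsed` into the latency average."""
        s = self.stats[proxy]
        s["failures"] = 0
        if s["latency"] is None:
            s["latency"] = elapsed
        else:
            s["latency"] = 0.7 * s["latency"] + 0.3 * elapsed

    def record_failure(self, proxy):
        """Count one more consecutive failure for `proxy`."""
        self.stats[proxy]["failures"] += 1
```

In get_response, one could time each upstream request, call record_success or record_failure accordingly, and replace random.choice with pool.choose(). Always picking the fastest proxy concentrates load on it, so a weighted random choice may suit some workloads better.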

Hope this helps!

answered Oct 22 '22 at 00:10 by changhwan