
How can I proxy Scrapy requests with SOCKS5?

Question:

How can I proxy Scrapy requests through a SOCKS5 proxy?

I know I can use polipo to convert a SOCKS proxy into an HTTP proxy, but I would rather set a middleware or change something in scrapy.Request:

import scrapy

class BaseSpider(scrapy.Spider):
    """a base class that implements major functionality for crawling application"""
    start_urls = ('https://google.com',)  # trailing comma: a tuple, not a plain string

    def start_requests(self):

        proxies = {
            'http': 'socks5://127.0.0.1:1080',
            'https': 'socks5://127.0.0.1:1080'
        }

        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={'proxy': proxies} # proxy should be string not dict
            )

    def parse(self, response):
        # do ...
        pass

What should I assign to the proxies variable?

asked Nov 28 '19 09:11 by Phoenix


4 Answers

It is possible.

HTTP Proxy to Socks5

Install python-proxy

$ pip3 install pproxy

Run

$ pproxy -l http://:8181 -r socks5://127.0.0.1:9150 -vv
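Before starting a crawl it can help to confirm that the HTTP bridge is actually accepting connections. The following is a minimal sketch, assuming the bridge listens on 127.0.0.1:8181 as in the command above; the helper name proxy_is_listening is hypothetical and not part of pproxy or Scrapy.

```python
import socket


def proxy_is_listening(host: str = "127.0.0.1", port: int = 8181,
                       timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # A successful TCP connect only shows that a listener exists;
        # it does not prove that the listener speaks the HTTP proxy protocol.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("HTTP bridge reachable:", proxy_is_listening())
```

This only checks the local listening port, not the upstream SOCKS5 server behind it.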

Scrapy with HTTP Proxy

Create middleware (middlewares.py)

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = "http://127.0.0.1:8181"

Assign it to DOWNLOADER_MIDDLEWARES (settings.py)

DOWNLOADER_MIDDLEWARES = {
    'PROJECT_NAME_HERE.middlewares.ProxyMiddleware': 350
}
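If hard-coding the proxy address is undesirable, the same middleware could read it from the project settings. This is only a sketch: HTTP_BRIDGE_PROXY is a hypothetical setting name (not a Scrapy built-in), and the default value assumes the pproxy bridge from above.

```python
class ConfigurableProxyMiddleware:
    """Downloader middleware that sends requests through one HTTP proxy."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # HTTP_BRIDGE_PROXY is a made-up setting name used for illustration.
        url = crawler.settings.get("HTTP_BRIDGE_PROXY", "http://127.0.0.1:8181")
        return cls(url)

    def process_request(self, request, spider=None):
        # Keep a proxy that was already set on the request (for example
        # by the spider) and only fill in the default otherwise.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # continue normal request processing
```

It would be registered in DOWNLOADER_MIDDLEWARES the same way as ProxyMiddleware above.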

answered Oct 20 '22 13:10 by Almog


It is currently not possible. There is a feature request for it.
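Taking this answer at its word, a spider could fail early instead of sending a request with an unsupported proxy scheme. The sketch below assumes that only http and https proxy URLs are usable; the helper require_http_proxy is hypothetical and not part of Scrapy.

```python
from urllib.parse import urlparse

# Assumption taken from this answer: SOCKS schemes are not supported.
SUPPORTED_SCHEMES = {"http", "https"}


def require_http_proxy(proxy_url: str) -> str:
    """Return proxy_url unchanged, or raise if its scheme is unsupported."""
    scheme = urlparse(proxy_url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(
            f"unsupported proxy scheme {scheme!r}; "
            "put an HTTP-to-SOCKS bridge in front of the SOCKS server"
        )
    return proxy_url
```

For example, require_http_proxy("socks5://127.0.0.1:1080") raises ValueError, while an http:// URL is returned as-is and can be assigned to request.meta['proxy'].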

answered Oct 20 '22 13:10 by Gallaecio


The middleware could look like this:

class ProxyMiddleware(object):
    # Scrapy calls process_request(request, spider), so both parameters are needed
    def process_request(self, request, spider):
        request.meta['proxy'] = "socks5://127.0.0.1:1080"

Enable it in your settings.py file and see if it works.

answered Oct 20 '22 14:10 by Prithvi Singh


This may help: https://github.com/gregoriomomm/docker-multsocks...

It lets you connect with the standard HTTP proxy protocol and reach a SOCKS5 server with advanced route configuration, in a way that is as cross-platform as Docker itself. Not all software offers this for free, for example on Windows, where you can then simply configure a local HTTP proxy (see the configuration at the bottom).

It is also useful for applications, such as some old Java implementations, that can connect to SOCKS but cannot correctly pass the username and password to authenticate the SOCKS connection. In that case it can act as an unauthenticated SOCKS proxy chained to an authenticated one.

It is based on common Linux commands and can also be reproduced on Windows 10 by running the same commands in a Windows Subsystem for Linux (WSL) shell.

On Ubuntu you can install the tools directly:

sudo apt install tsocks nmap

# Once tsocks is installed and configured
echo "Starting http proxy!!!"
tsocks ncat -l --proxy-type http localhost 3128 & 

Example of the /etc/tsocks.conf file (replace the placeholders that start with "v"):

local = 9.0.0.0/255.0.0.0
local = 129.39.186.192/255.255.255.192

path {
reaches = 10.0.0.0/255.0.0.0
reaches = 158.98.181.232/255.255.255.248
reaches = 192.168.0.0/255.255.0.0
server = vSOCKS_HOST
server_port = vSOCKS_PORT
server_type = 5
default_user = vSOCKS_USERNAME
default_pass = vSOCKS_PASSWORD
fallback = yes
}
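Filling in the "v" placeholders can be scripted. The sketch below is one way to do it, using a shortened copy of the path block above; the host, port, and credentials are made-up example values, and render_tsocks_conf is a hypothetical helper.

```python
# Shortened copy of the path block above, with the same placeholders.
TEMPLATE = """\
path {
reaches = 10.0.0.0/255.0.0.0
server = vSOCKS_HOST
server_port = vSOCKS_PORT
server_type = 5
default_user = vSOCKS_USERNAME
default_pass = vSOCKS_PASSWORD
fallback = yes
}
"""


def render_tsocks_conf(template: str, values: dict) -> str:
    """Replace each placeholder name in template with its value."""
    # Longest names first, so one placeholder can never clobber
    # another that happens to start with the same characters.
    for name in sorted(values, key=len, reverse=True):
        template = template.replace(name, values[name])
    return template


if __name__ == "__main__":
    # Example values only; substitute your own SOCKS5 server details.
    print(render_tsocks_conf(TEMPLATE, {
        "vSOCKS_HOST": "socks.example.net",
        "vSOCKS_PORT": "1080",
        "vSOCKS_USERNAME": "alice",
        "vSOCKS_PASSWORD": "secret",
    }))
```

The rendered text can then be written to /etc/tsocks.conf or to the file mounted into the container.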

If you want to try the Docker version, just replace path/tsocks.conf with the path to your own file. The container exposes an HTTP proxy and an unauthenticated SOCKS5 proxy that route to your final SOCKS5 destination server (other options are available):

docker run -v path/tsocks.conf:/etc/tsocks.conf -p 3128:3128 -p 1080:1080  gregoriomomm/multsocks:latest 
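With the HTTP proxy listening on port 3128, Scrapy can be pointed at it without a custom middleware, because its built-in HttpProxyMiddleware picks up the standard http_proxy and https_proxy environment variables. A minimal sketch, assuming the proxy runs on 127.0.0.1:3128 as in the commands above (use_local_http_bridge is a hypothetical helper):

```python
import os
from urllib.request import getproxies


def use_local_http_bridge(proxy_url: str = "http://127.0.0.1:3128") -> dict:
    """Export the proxy for this process and return what urllib now sees."""
    for name in ("http_proxy", "https_proxy"):
        os.environ[name] = proxy_url
    # getproxies() reads the same environment variables, so its
    # result is a quick way to confirm the values are visible.
    return getproxies()


if __name__ == "__main__":
    print(use_local_http_bridge())
```

The variables have to be set before the crawler starts, for example at the top of the script that launches it, or exported in the shell before running scrapy crawl.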

https://github.com/gregoriomomm/tsocks contains a version of tsocks (http://tsocks.sourceforge.net/), the transparent SOCKS proxying library, with minor adjustments so that it compiles and works on Alpine 3.11. It also includes the same fallback option that the Ubuntu version provides.

answered Oct 20 '22 13:10 by Gregorio Momm