Question:
How can I proxy Scrapy requests with SOCKS5?
I know I can use polipo to convert a SOCKS proxy to an HTTP proxy.
But I want to set a middleware or make some changes in scrapy.Request.
import scrapy

class BaseSpider(scrapy.Spider):
    """A base class that implements major functionality for the crawling application."""
    # Trailing comma makes this a tuple; without it, this is a plain string.
    start_urls = ('https://google.com',)

    def start_requests(self):
        proxies = {
            'http': 'socks5://127.0.0.1:1080',
            'https': 'socks5://127.0.0.1:1080'
        }
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={'proxy': proxies}  # proxy should be a string, not a dict
            )

    def parse(self, response):
        # do ...
        pass
What should I assign to the proxies variable?
It is possible.
Install python-proxy:
$ pip3 install pproxy
Run it:
$ pproxy -l http://:8181 -r socks5://127.0.0.1:9150 -vv
Create a middleware (middlewares.py):
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = "http://127.0.0.1:8181"
Assign it to DOWNLOADER_MIDDLEWARES (settings.py):
DOWNLOADER_MIDDLEWARES = {
    'PROJECT_NAME_HERE.middlewares.ProxyMiddleware': 350
}
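A slightly more configurable variant of the middleware above, as a minimal sketch: it reads the proxy address from the project settings and leaves alone any request that already carries its own proxy. The class name ConfigurableProxyMiddleware and the setting name SOCKS_BRIDGE_PROXY are hypothetical, chosen only for illustration; the default value matches the pproxy listener started above.

```python
class ConfigurableProxyMiddleware:
    """Route requests through a local HTTP proxy that forwards to SOCKS5."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # SOCKS_BRIDGE_PROXY is a hypothetical setting name, not a Scrapy built-in.
        proxy_url = crawler.settings.get(
            "SOCKS_BRIDGE_PROXY", "http://127.0.0.1:8181"
        )
        return cls(proxy_url)

    def process_request(self, request, spider):
        # Keep a proxy that was set explicitly on this request.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # let Scrapy continue handling the request
```

To try it, register ConfigurableProxyMiddleware in DOWNLOADER_MIDDLEWARES in place of ProxyMiddleware and, optionally, set SOCKS_BRIDGE_PROXY in settings.py.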
It is currently not possible. There is a feature request for it.
The middleware could look like this:
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = "socks5://127.0.0.1:1080"
Make it available in your settings.py file and see if it works.
Check this out in case it helps: https://github.com/gregoriomomm/docker-multsocks...
It provides a way (as multiplatform as Docker is) to expose a standard HTTP proxy that reaches a SOCKS5 server with advanced route configuration. This is not readily provided for free by all software; on Windows, for example, you can configure a simple HTTP proxy locally (see the configuration at the bottom).
It can also be used for many other applications, such as some old Java implementations that can connect to SOCKS but cannot correctly pass the username and password to authenticate the SOCKS connection. In that case it can act as an unauthenticated SOCKS proxy chained to an authenticated one.
It is based on common Linux commands and can also be reproduced on Windows 10 by running the same commands in a shell under the Windows Subsystem for Linux (WSL).
On Ubuntu you can just install it:
sudo apt install tsocks nmap
# Once you have tsocks installed and configured
echo "Starting http proxy!!!"
tsocks ncat -l --proxy-type http localhost 3128 &
Example of the /etc/tsocks.conf file (replace the variables that start with "v"):
local = 9.0.0.0/255.0.0.0
local = 129.39.186.192/255.255.255.192
path {
    reaches = 10.0.0.0/255.0.0.0
    reaches = 158.98.181.232/255.255.255.248
    reaches = 192.168.0.0/255.255.0.0
    server = vSOCKS_HOST
    server_port = vSOCKS_PORT
    server_type = 5
    default_user = vSOCKS_USERNAME
    default_pass = vSOCKS_PASSWORD
    fallback = yes
}
If you want to try the Docker version, just change path/tsocks.conf to your own file. It will load an HTTP, SOCKS, and SOCKS5 unauthenticated route to your final SOCKS5 destination server (there are other options as well):
docker run -v path/tsocks.conf:/etc/tsocks.conf -p 3128:3128 -p 1080:1080 gregoriomomm/multsocks:latest
https://github.com/gregoriomomm/tsocks hosts a version of tsocks (http://tsocks.sourceforge.net/), the transparent SOCKS proxying library, with minor adjustments so that it compiles and works on Alpine 3.11. It also includes the same fallback option as the Ubuntu version.
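Before starting a crawl, it can help to confirm that the local HTTP proxy is actually accepting connections. The following is a minimal sketch using only the Python standard library. The function name proxy_is_listening is hypothetical, and the default port 3128 is the one published in the commands above. It only checks that the TCP port is open, not that traffic is really forwarded to the SOCKS5 server.

```python
import socket


def proxy_is_listening(host="127.0.0.1", port=3128, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if proxy_is_listening():
        print("HTTP proxy port is open")
    else:
        print("nothing is listening on 127.0.0.1:3128")
```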