I want to crawl a site, however cloudflare was getting in the way. I was able to get the servers IP, so cloudflare won't bother me.
How can I utilize this in the requests library?
For example, I want to go directly to
www.example.com/foo.php
, but in requests it will resolve the IP on the cloudflare network instead of the one I want it to use. How can I make it use the one I want it to use?
I would of sent in a request so the real IP with the host set as the www.example.com, but that will just give me the home page. How can I visit other links on the site?
You will have to set a custom header host
with value of example.com
, something like:
requests.get('http://127.0.0.1/foo.php', headers={'host': 'example.com'})
should do the trick. If you want to verify that then type in the following command (requires netcat): nc -l -p 80
and then run the above command. It will produce output in the netcat window:
GET /foo.php HTTP/1.1
Host: example.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.6.2 CPython/3.4.3 Windows/8
You'd have to tell requests
to fake the Host
header, and replace the hostname in the URL with the IP address:
requests.get('http://123.45.67.89/foo.php', headers={'Host': 'www.example.com'})
The URL 'patching' can be done with the urlparse
library:
parsed = urlparse.urlparse(url)
hostname = parsed.hostname
parsed = parsed._replace(netloc=ipaddress)
ip_url = parsed.geturl()
response = requests.get(ip_url, headers={'Host': hostname})
Demo against Stack Overflow:
>>> import urlparse
>>> import socket
>>> url = 'http://stackoverflow.com/help/privileges'
>>> parsed = urlparse.urlparse(url)
>>> hostname = parsed.hostname
>>> hostname
'stackoverflow.com'
>>> ipaddress = socket.gethostbyname(hostname)
>>> ipaddress
'198.252.206.16'
>>> parsed = parsed._replace(netloc=ipaddress)
>>> ip_url = parsed.geturl()
>>> ip_url
'http://198.252.206.16/help/privileges'
>>> response = requests.get(ip_url, headers={'Host': hostname})
>>> response
<Response [200]>
In this case I looked up the ip address dynamically.
I think the best way to send https requests to a specific IP is to add a customized resolver to bind domain name to that IP you want to hit. In this way, both SNI and host header are correctly set, and certificate verification can always succeed as web browser.
Otherwise, you will see various issue like InsecureRequestWarning
, SSLCertVerificationError
, and SNI is always missing in Client Hello
, even if you try different combination of headers and verify arguments.
requests.get('https://1.2.3.4/foo.php', headers= {"host": "example.com", verify=True)
In addition, I tried
requests_toolbelt
pip install requests[security]
forcediphttpsadapter
all solutions mentioned here using requests with TLS doesn't give SNI support
None of them set SNI when hitting https://IP directly.
# mock /etc/hosts
# lock it in multithreading or use multiprocessing if an endpoint is bound to multiple IPs frequently
etc_hosts = {}
# decorate python built-in resolver
def custom_resolver(builtin_resolver):
def wrapper(*args, **kwargs):
try:
return etc_hosts[args[:2]]
except KeyError:
# fall back to builtin_resolver for endpoints not in etc_hosts
return builtin_resolver(*args, **kwargs)
return wrapper
# monkey patching
socket.getaddrinfo = custom_resolver(socket.getaddrinfo)
def _bind_ip(domain_name, port, ip):
'''
resolve (domain_name,port) to a given ip
'''
key = (domain_name, port)
# (family, type, proto, canonname, sockaddr)
value = (socket.AddressFamily.AF_INET, socket.SocketKind.SOCK_STREAM, 6, '', (ip, port))
etc_hosts[key] = [value]
_bind_ip('example.com', 443, '1.2.3.4')
# this sends requests to 1.2.3.4
response = requests.get('https://www.example.com/foo.php', verify=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With