I used the Mechanize module a while ago and am now trying to use the Requests module.
(Python mechanize doesn't work when HTTPS and proxy authentication are both required.)
I have to go through a proxy server to access the Internet.
The proxy server requires authentication. I wrote the following code.
    import requests
    from requests.auth import HTTPProxyAuth

    proxies = {"http": "192.168.20.130:8080"}
    auth = HTTPProxyAuth("username", "password")

    r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
The above code works well when the proxy server requires basic authentication.
Now I want to know what I have to do when the proxy server requires digest authentication.
HTTPProxyAuth does not seem to work with digest authentication (r.status_code returns 407).
To use a proxy with Requests in Python, first import the requests package. Next, create a proxies dictionary that maps each protocol (http and https) to the proxy URL. Finally, set a url variable to the page you want to fetch.
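The steps above can be sketched roughly as follows. The helper name build_proxies and its parameters are hypothetical, the address is the one used in the question, and the sketch assumes the proxy itself is reached over plain HTTP. The actual requests.get call is left commented out because it needs the third-party requests package and a reachable proxy.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a proxies dict mapping each protocol to the proxy URL.

    Hypothetical helper for illustration; assumes the proxy is reached
    over plain HTTP even when the target URL is HTTPS.
    """
    credentials = ""
    if username is not None and password is not None:
        # Percent-encode so characters like '@' or ':' cannot break the URL.
        credentials = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"http://{credentials}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("192.168.20.130", 8080)
url = "http://www.google.co.jp/"

# With requests installed and the proxy reachable:
# import requests
# r = requests.get(url, proxies=proxies, timeout=10)
# print(r.status_code)
```

Embedding credentials in the proxy URL like this only covers basic authentication; it does not help with a proxy that demands digest.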
To authenticate, the client typically sends its credentials in the Authorization header (Proxy-Authorization when the credentials are meant for a proxy) or in a custom header defined by the server. Replace "user" and "pass" with your own username and password. If the credentials are accepted, the response status is 200; otherwise the server returns an error status such as 401 or 403, or 407 when a proxy rejects them.
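For the basic scheme, the header value is simply the Base64 encoding of "user:pass". A minimal sketch, assuming UTF-8 credentials; the helper name basic_auth_header and its for_proxy flag are made up for illustration:

```python
import base64


def basic_auth_header(username, password, for_proxy=False):
    """Build the header for HTTP Basic authentication (hypothetical helper)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    # A proxy reads Proxy-Authorization; an origin server reads Authorization.
    name = "Proxy-Authorization" if for_proxy else "Authorization"
    return {name: f"Basic {token}"}


headers = basic_auth_header("user", "pass")
proxy_headers = basic_auth_header("user", "pass", for_proxy=True)
```

Base64 is an encoding, not encryption, so basic credentials are readable by anyone who can see the request unless the connection itself is encrypted.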
Digest authentication is a method of authentication in which a request from a potential user is received by a network server and then sent to a domain controller. The domain controller sends a special key, called a digest session key, to the server that received the original request.
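At the HTTP level, which is what a proxy answering 407 speaks, digest authentication is a challenge-response exchange: the server sends a nonce, and the client sends back a hash built from the credentials, the nonce and the request, so the password itself never travels over the wire. A minimal sketch of that calculation, assuming algorithm MD5 and qop=auth as described in RFC 2617; the function names are hypothetical and the inputs are the sample values from the RFC's example:

```python
import hashlib


def md5_hex(text):
    """Hex MD5 of a string (the 'H' function of RFC 2617 for algorithm=MD5)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' field of a digest header (MD5, qop=auth only)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked for
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


response = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
```

Because the nonce comes from the server's challenge, the client cannot build this header before it has received a 401 or 407 at least once, which is why embedding credentials in the proxy URL is not enough for digest.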
In most cases there is no need to implement your own!
Requests has built-in support for proxies, including basic authentication:
    proxies = {'https': 'https://user:password@proxyip:port'}
    r = requests.get('https://url', proxies=proxies)
See the docs for more.
If you need digest authentication, HTTPDigestAuth may help.
Or you might need to extend it, as yutaka2487 did below.
Note: you must use the IP address of the proxy server, not its name!
I wrote a class that can be used for proxy authentication (based on digest auth).
I borrowed almost all of the code from requests.auth.HTTPDigestAuth.
    # Note: written against the old (pre-1.0) Requests API, which provided
    # Request.send(anyway=True) and request.response.
    import requests
    import requests.auth


    class HTTPProxyDigestAuth(requests.auth.HTTPDigestAuth):
        def handle_407(self, r):
            """Takes the given response and tries digest-auth, if needed."""
            num_407_calls = r.request.hooks['response'].count(self.handle_407)

            s_auth = r.headers.get('Proxy-authenticate', '')

            if 'digest' in s_auth.lower() and num_407_calls < 2:
                self.chal = requests.auth.parse_dict_header(s_auth.replace('Digest ', ''))

                # Consume content and release the original connection
                # to allow our new request to reuse the same one.
                r.content
                r.raw.release_conn()

                # A 407 challenge is answered with Proxy-Authorization
                # (Authorization is only read by the origin server).
                r.request.headers['Proxy-Authorization'] = self.build_digest_header(r.request.method, r.request.url)
                r.request.send(anyway=True)
                _r = r.request.response
                _r.history.append(r)

                return _r

            return r

        def __call__(self, r):
            # Reuse the previous nonce, if any, to avoid an extra round trip.
            if self.last_nonce:
                r.headers['Proxy-Authorization'] = self.build_digest_header(r.method, r.url)
            r.register_hook('response', self.handle_407)
            return r
Usage:
    proxies = {
        "http": "192.168.20.130:8080",
        "https": "192.168.20.130:8080",
    }
    auth = HTTPProxyDigestAuth("username", "password")

    # HTTP
    r = requests.get("http://www.google.co.jp/", proxies=proxies, auth=auth)
    r.status_code  # 200 OK

    # HTTPS
    r = requests.get("https://www.google.co.jp/", proxies=proxies, auth=auth)
    r.status_code  # 200 OK
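To see what the handle_407 step in the class has to work with, here is a small standard-library sketch that splits a Proxy-Authenticate digest challenge into its parameters, roughly what parse_dict_header does there. The function name parse_digest_challenge is hypothetical, and the header value is made up for illustration rather than captured from a real proxy.

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_digest_challenge(header_value):
    """Split a 'Digest ...' challenge into a dict of its parameters.

    Returns None when the header does not use the Digest scheme.
    """
    scheme, _, params = header_value.strip().partition(" ")
    if scheme.lower() != "digest":
        return None
    # parse_http_list splits on commas outside quotes;
    # parse_keqv_list turns key="value" items into a dict and strips quotes.
    return parse_keqv_list(parse_http_list(params))


# Made-up header value, not taken from a real proxy.
sample = 'Digest realm="proxy", nonce="abc123", qop="auth", algorithm=MD5'
challenge = parse_digest_challenge(sample)
```

The realm, nonce and qop values from this dictionary are the inputs the digest header for the retried request is built from.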