Trying to use Python 3 urlopen
On recent (>= Vista) Windows machines I get "SSL: CERTIFICATE_VERIFY_FAILED" errors when calling urllib.request.urlopen on many HTTPS sites (on some build machines even https://www.google.com/, but curiously never on https://www.microsoft.com/).
>>> import urllib.request
>>> urllib.request.urlopen("https://www.google.com/")
Traceback (most recent call last):
File "C:\Python35\lib\urllib\request.py", line 1254, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "C:\Python35\lib\http\client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "C:\Python35\lib\http\client.py", line 1151, in _send_request
self.endheaders(body)
File "C:\Python35\lib\http\client.py", line 1102, in endheaders
self._send_output(message_body)
File "C:\Python35\lib\http\client.py", line 934, in _send_output
self.send(msg)
File "C:\Python35\lib\http\client.py", line 877, in send
self.connect()
File "C:\Python35\lib\http\client.py", line 1260, in connect
server_hostname=server_hostname)
File "C:\Python35\lib\ssl.py", line 377, in wrap_socket
_context=self)
File "C:\Python35\lib\ssl.py", line 752, in __init__
self.do_handshake()
File "C:\Python35\lib\ssl.py", line 988, in do_handshake
self._sslobj.do_handshake()
File "C:\Python35\lib\ssl.py", line 633, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c
:645)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python35\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "C:\Python35\lib\urllib\request.py", line 466, in open
response = self._open(req, data)
File "C:\Python35\lib\urllib\request.py", line 484, in _open
'_open', req)
File "C:\Python35\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "C:\Python35\lib\urllib\request.py", line 1297, in https_open
context=self._context, check_hostname=self._check_hostname)
File "C:\Python35\lib\urllib\request.py", line 1256, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certifica
te verify failed (_ssl.c:645)>
Most infuriatingly, this happens almost exclusively on the build/CI servers, and the errors often disappear once I start investigating the issue (e.g. after checking connectivity to the given site, which responds correctly when opened in a browser):
>>> import urllib.request
>>> urllib.request.urlopen("https://www.google.com/")
<http.client.HTTPResponse object at 0x0000000002D930B8>
I have seen many suggestions to disable certificate validation by messing with SSL contexts, but I'd like to avoid that: I want to keep my HTTPS security intact!
What could be the cause of this issue? How can I fix it?
Unfortunately, it's a sad story still without a happy ending, and is detailed in https://bugs.python.org/issue20916.
Python 3.3 added the cadefault parameter to urllib.request.urlopen, defaulting to True (https://bugs.python.org/issue14780), which made HTTPS requests validate the server certificates using the system certificates store by default.
Python 3.4 made SSLContext.set_default_verify_paths kind-of-work on Windows (https://bugs.python.org/issue19292), enabling Python to use the Windows certificate store.
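In practice, the combination means that a plain urlopen call verifies the server against whatever the Windows certificate store happens to contain. As a minimal sketch (my own example, assuming Python 3.4+ on Windows, not code from the bug reports), you can build the same kind of default context by hand and see how many certificates it actually loaded:
import ssl
import urllib.request

# create_default_context() enables certificate and hostname verification and,
# on Windows, loads its trust roots from the system certificate store.
context = ssl.create_default_context()
print(context.cert_store_stats())  # e.g. {'x509': 28, 'crl': 0, 'x509_ca': 28}

# Roughly equivalent to a plain urlopen(...) call; it will fail with
# CERTIFICATE_VERIFY_FAILED if the needed root is missing from the store.
urllib.request.urlopen("https://www.google.com/", context=context)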
Previously, Microsoft pushed root certificate updates through Windows Update, which ensured that the system root certificates store was always up to date (as long as the user installed the updates). So far, so good.
However, since Windows Vista, Windows is bundled with just a few "core" certificates in the store (fewer than 20, IIRC), and whenever the CryptoAPI is asked to validate a certificate for which it cannot find a trusted root in the local store, the Microsoft servers are contacted to check whether they have a trusted root for it. If so, the root certificate is provided and gets automatically installed into the system certificates store.
Unfortunately, Python doesn't use Windows SChannel/CryptoAPI, so it cannot benefit from this automatic mechanism; instead, it asks for all the certificates in the system certificates store and tries to use them. This means that all it gets is the handful of certificates shipped with Windows, the manually-installed certificates, plus whatever certificates happened to have been installed automatically, typically while browsing the Internet with Internet Explorer or Edge.
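If you want to see what Python is actually working with, here is a small diagnostic sketch of mine (Windows-only, Python 3.4+) that enumerates the system stores Python reads; on a fresh machine or build server the counts are strikingly small:
import ssl

# ssl.enum_certificates() lists the certificates in a Windows system store;
# "CA" and "ROOT" are the stores Python loads its default trust roots from.
for store in ("CA", "ROOT"):
    certs = ssl.enum_certificates(store)
    print(store, len(certs), "certificates")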
This makes the issue particularly insidious: which sites exhibit the problem varies between machines (depending mostly on their browsing history!), and the problem generally disappears (for that site, and for all sites that chain to the same root certificate) as soon as you check whether you can reach the site from a browser that uses SChannel. For this reason, fresh Windows installations, build machines and servers in general (which see little interactive Internet browsing) are particularly prone to the problem, while developers may never encounter it on their "normal" desktop machines.
How to fix this? Unfortunately, there's no simple solution.
- For simple cases, such as a CI server where some tests need to access a few specific domains that pretty much never change, a trivial workaround is to open Internet Explorer and load a page from each such domain. This fetches the needed root certificate into the local certificates store, and Python won't have problems with it until it expires (note: we are talking about the root certificate here, which generally lasts many years). On modern Windows versions that ship by default a curl build using SChannel as its SSL backend, curl can be used for the same purpose.
- You can disable certificate validation outright; this has already been covered in many different answers, such as this one. However, it is generally undesirable, as you give up the MITM protection provided by SSL.
- You may manually install all the currently trusted root certificates into the Windows certificate store; here is a site that explains how (disclaimer: the procedure described looks sensible, but I never tried it). Unfortunately, it's a manual procedure that you would need to repeat periodically to pick up new root certificates.
- You may install the certifi package, which provides its own certificate store (IIRC, a copy of the Mozilla certificate store); you can then use it like this:
import certifi
import urllib.request

# certifi.where() returns the path of certifi's bundled CA file, which is
# used for verification instead of the (possibly sparse) Windows store.
r = urllib.request.urlopen("https://www.google.com/", cafile=certifi.where())
This is the road taken by the popular requests module, which indeed generally works "out of the box". Unfortunately, this is yet another certificate store that has to be kept updated, so you have to make sure to periodically update the certifi package through pip or however you installed it (a context-based variant of the same approach is sketched below).
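Note that in later Python versions the cafile argument of urlopen has been deprecated in favor of passing an ssl.SSLContext; an equivalent, context-based sketch of the same certifi approach (the URL is just the example from the question) would be:
import ssl

import certifi
import urllib.request

# Build a verifying context backed by certifi's CA bundle instead of the
# Windows store, and hand it to urlopen explicitly.
context = ssl.create_default_context(cafile=certifi.where())
r = urllib.request.urlopen("https://www.google.com/", context=context)
requests wires certifi in internally, which is why a plain requests.get("https://www.google.com/") typically works without any extra configuration.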
Many thanks to the author of this blog article, which was the first I managed to find that explained this issue correctly.