
urllib.request.urlopen: SSL: CERTIFICATE_VERIFY_FAILED Error on Windows >=Vista (7/8/10/Server 2008) on Python >=3.4

Using Python 3's urllib.request.urlopen on recent (>=Vista) Windows machines, I get "SSL: CERTIFICATE_VERIFY_FAILED" errors on many HTTPS sites (on some build machines even https://www.google.com/, but curiously never on https://www.microsoft.com/).

>>> import urllib.request
>>> urllib.request.urlopen("https://www.google.com/")
Traceback (most recent call last):
  File "C:\Python35\lib\urllib\request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python35\lib\http\client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "C:\Python35\lib\http\client.py", line 1151, in _send_request
    self.endheaders(body)
  File "C:\Python35\lib\http\client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "C:\Python35\lib\http\client.py", line 934, in _send_output
    self.send(msg)
  File "C:\Python35\lib\http\client.py", line 877, in send
    self.connect()
  File "C:\Python35\lib\http\client.py", line 1260, in connect
    server_hostname=server_hostname)
  File "C:\Python35\lib\ssl.py", line 377, in wrap_socket
    _context=self)
  File "C:\Python35\lib\ssl.py", line 752, in __init__
    self.do_handshake()
  File "C:\Python35\lib\ssl.py", line 988, in do_handshake
    self._sslobj.do_handshake()
  File "C:\Python35\lib\ssl.py", line 633, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python35\lib\urllib\request.py", line 466, in open
    response = self._open(req, data)
  File "C:\Python35\lib\urllib\request.py", line 484, in _open
    '_open', req)
  File "C:\Python35\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Python35\lib\urllib\request.py", line 1297, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Python35\lib\urllib\request.py", line 1256, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)>

Most infuriatingly, this happens almost exclusively on the build/CI servers, and the errors often disappear as soon as I try to investigate the issue (e.g. by checking the connectivity to the given site, which responds correctly when tried through a browser):

>>> import urllib.request
>>> urllib.request.urlopen("https://www.google.com/")
<http.client.HTTPResponse object at 0x0000000002D930B8>

I have seen many suggestions to disable certificate validation by messing with SSL contexts, but I'd like to avoid this - I want to keep my HTTPS security intact!

What could be the cause of this issue? How can I fix it?

Matteo Italia asked Mar 06 '23 13:03


1 Answer

Unfortunately, it's a sad story still without a happy ending, and is detailed in https://bugs.python.org/issue20916.

Python 3.3 added the cadefault parameter to urllib.request.urlopen (https://bugs.python.org/issue14780), which lets HTTPS requests validate the server certificates against the system certificate store; certificate verification then became the default behaviour in Python 3.4.3 (PEP 476).

Python 3.4 made SSLContext.set_default_verify_paths kind-of-work on Windows (https://bugs.python.org/issue19292), enabling Python to use the Windows certificate store.

In older Windows versions, Microsoft pushed root certificate updates through Windows Update, which ensured that the system root certificate store was always up to date (as long as the user installed the updates). So far, so good.

However, since Windows Vista, Windows is bundled with just a few "core" certificates in the store (fewer than 20, IIRC), and whenever the CryptoAPI is asked to validate a certificate for which it cannot find a trusted root in the local store, the Microsoft servers are contacted to check whether they have a trusted root for it. If so, the root certificate is provided and gets automatically installed into the system certificate store.

Unfortunately, Python doesn't use Windows SChannel/CryptoAPI, so it cannot benefit from this automatic mechanism; instead, it asks for all the certificates in the system certificates store and tries to use them - but this means that all it is getting is the handful of certificates shipped with Windows, the manually-installed certificates, plus all the certificates that happened to have been installed automatically, typically when browsing the Internet with Internet Explorer or Edge.
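A rough way to see what Python actually has to work with on a given machine is to ask the ssl module what it loaded. The sketch below is only illustrative: describe_trust_store is a hypothetical helper name, and since ssl.enum_certificates exists on Windows only, that part is guarded by a platform check.

```python
import ssl
import sys


def describe_trust_store():
    """Summarise the CA certificates visible to Python's default SSL context."""
    # create_default_context() loads the system's default trust store.
    context = ssl.create_default_context()
    # cert_store_stats() returns counts under the keys 'x509', 'crl', 'x509_ca'.
    summary = dict(context.cert_store_stats())
    if sys.platform == "win32":
        # Windows only: enumerate the certificates currently present in the
        # system "ROOT" store, i.e. what Python can use without SChannel.
        summary["windows_root_store"] = len(ssl.enum_certificates("ROOT"))
    return summary


if __name__ == "__main__":
    print(describe_trust_store())
```

Running this on a failing build machine and on a developer desktop, then comparing the counts, may help confirm that the two machines really have different sets of roots installed.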

This makes the issue particularly insidious: the sites that exhibit the problem vary between machines (depending mostly on their browsing history!), and the problem generally disappears (for that site, and for all sites that depend on the same root certificate) as soon as you check whether you can connect to the site through a browser that uses SChannel. For this reason, new Windows installations, build machines and servers in general (which do not see much interactive Internet browsing) are particularly subject to this problem, while developers may never encounter it on their "normal" desktop machines.
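Because the failure comes and goes, it can help to make a CI script tell a missing root certificate apart from an ordinary network error. The following is a minimal sketch, not an official API: is_cert_verify_failure is a hypothetical helper name, and the string check is a fallback for older Python versions that raise a plain ssl.SSLError, as in the 3.5 traceback in the question.

```python
import ssl
import urllib.error


def is_cert_verify_failure(exc):
    """Return True if `exc` looks like a certificate verification failure."""
    # urlopen wraps the underlying error in URLError.reason.
    reason = getattr(exc, "reason", exc)
    # Python 3.7+ raises a dedicated subclass for verification failures.
    cert_error = getattr(ssl, "SSLCertVerificationError", None)
    if cert_error is not None and isinstance(reason, cert_error):
        return True
    # Older versions raise a plain SSLError; fall back to the message text.
    return isinstance(reason, ssl.SSLError) and "CERTIFICATE_VERIFY_FAILED" in str(reason)
```

A build script could catch urllib.error.URLError around its urlopen call and, when this helper returns True, log a hint that the machine's root store is probably missing the relevant root rather than just reporting a generic connection failure.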


How to fix this? Unfortunately, there's no simple solution.

  • for simple cases, such as a CI server whose tests need to access a few specific domains that pretty much never change, a trivial workaround is to open Internet Explorer and load a page on each of those domains. This makes Windows fetch the needed root certificate into the local certificate store, and Python won't have problems with it until it expires (note: we are talking about the root certificate here, which is generally valid for many years); on modern Windows versions, which ship by default a curl build that uses SChannel as its SSL backend, curl can be used for this as well

    (Screenshot showing the workaround in action: first Python fails to connect due to an SSL error, then a curl request is made, then Python works correctly.)

  • you can disable certificate validation altogether; this has already been covered in many other answers. However, this is generally undesirable, as you are giving up the MITM protection provided by SSL;

  • you may manually install all the currently trusted root certificates to the Windows certificate store; here is a site that explains how (disclaimer: the explained procedure looks sensible, but I never tried it); unfortunately, it's a manual procedure and you would need to repeat it periodically to make sure you get the new root certificates;

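  • you may install the certifi package, which provides its own certificate store (IIRC it's a copy of the Mozilla certificate store); you can then use it like this (the bundle is passed through an SSL context, because the cafile argument of urlopen is deprecated and no longer exists in recent Python versions):

      import ssl
      import urllib.request
      import certifi
      # build a verifying SSL context backed by certifi's CA bundle
      context = ssl.create_default_context(cafile=certifi.where())
      r = urllib.request.urlopen(url_website, context=context)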
    

This is the road taken by the popular requests module, which indeed generally works "out of the box"; unfortunately, this is yet another certificate store, which has to be kept updated, so you have to make sure to periodically update the certifi package through pip or however you installed it.
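The first workaround in the list above (letting an SChannel-based client touch the site before Python does) could be scripted on a build machine roughly as follows. This is only a sketch under assumptions: build_warmup_command and warm_up_and_open are hypothetical names, and it presumes that a curl.exe using the SChannel backend is on the PATH, as on recent Windows versions.

```python
import os
import subprocess
import urllib.request


def build_warmup_command(url, curl="curl.exe"):
    """Build a curl invocation that requests only the headers of `url`."""
    return [curl, "--silent", "--show-error", "--head",
            "--output", os.devnull, url]


def warm_up_and_open(url, timeout=30):
    """Let curl validate `url` first, then open it with urllib."""
    # check=False: a failing warm-up should not hide the real urlopen result.
    # If curl is not installed, subprocess.run raises FileNotFoundError.
    subprocess.run(build_warmup_command(url), check=False, timeout=timeout)
    return urllib.request.urlopen(url, timeout=timeout)
```

Certificate validation stays enabled on the Python side; the curl call only gives Windows a chance to fetch and install the missing root certificate beforehand.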


Many thanks to the author of this blog article, which was the first one I managed to find that explained this issue correctly.

Matteo Italia answered Mar 11 '23 21:03