I'm getting a "WindowsError: [Error 5] Access is denied" message when reading a website with urllib2.
from urllib2 import urlopen, Request
from bs4 import BeautifulSoup
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
req = Request('https://' + url, headers=hdr)
soup = BeautifulSoup( urlopen( req ).read() )
The full traceback is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 431, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 449, in _open
'_open', req)
File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1240, in https_open
context=self._context)
File "C:\Python27\lib\urllib2.py", line 1166, in do_open
h = http_class(host, timeout=req.timeout, **http_conn_args)
File "C:\Python27\lib\httplib.py", line 1258, in __init__
context = ssl._create_default_https_context()
File "C:\Python27\lib\ssl.py", line 440, in create_default_context
context.load_default_certs(purpose)
File "C:\Python27\lib\ssl.py", line 391, in load_default_certs
self._load_windows_store_certs(storename, purpose)
File "C:\Python27\lib\ssl.py", line 378, in _load_windows_store_certs
for cert, encoding, trust in enum_certificates(storename):
WindowsError: [Error 5] Access is denied
I've tried running the script from a command prompt with admin privileges, as suggested here, but it does not fix the problem.
Any suggestions on how to resolve this error?
It looks like this is a Windows certificate store inconsistency. httplib, which is called internally by urllib2, recently changed from performing no server certificate validation to enforcing server certificate validation by default. You will therefore encounter this problem in any Python script that is based on urllib or httplib and runs within your user profile.
That said, something seems to be very wrong with your Windows certificate store. httplib fails for you while trying to enumerate certificates in the named certificate store CA (certification authority; it shows up as Intermediate Certification Authorities in certmgr.msc) but succeeds for ROOT, which is the normal trusted root certificate store (see comments to question). I'd therefore suggest checking all the certificates under Intermediate Certification Authorities in certmgr for recently added certificates, and/or checking the Windows log for general errors.
What is going on in your case is that urllib2 internally calls httplib, which then tries to set up a default SSL context with certificate validation enforced. As part of this it enumerates the trusted certificate anchors of your system by calling ssl.enum_certificates. This function is implemented in C as _ssl_enum_certificates_impl and internally calls the WinAPI functions CertOpenSystemStore and CertEnumCertificatesInStore. For the certificate store location CA, one of these two WinAPI calls fails with an access-denied error.
If you want to debug this further, you can also try to invoke the WinAPI function CertOpenSystemStore manually with the LPCTSTR 'CA' as an argument and debug it from that side, try other Windows certificate store management tools, and/or call Microsoft support for assistance.
There are also indications that others have had similar problems with that API call; search Google for: access denied CertOpenSystemStore
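Before dropping down to the WinAPI, it may be quicker to check from Python which store actually fails. The following is a minimal sketch, not part of the original answer: the helper name probe_cert_stores and its enumerate_store parameter are made up for illustration, and it assumes a Python build that ships ssl.enum_certificates (Windows only, 2.7.9 or later).

```python
import ssl


def probe_cert_stores(store_names=("CA", "ROOT"), enumerate_store=None):
    """Try to enumerate each named Windows certificate store.

    Returns a dict that maps each store name to the number of certificates
    found, or to the exception raised while enumerating that store.
    """
    if enumerate_store is None:
        # ssl.enum_certificates only exists on Windows builds of Python.
        enumerate_store = getattr(ssl, "enum_certificates", None)
        if enumerate_store is None:
            raise RuntimeError("ssl.enum_certificates is only available on Windows")
    results = {}
    for name in store_names:
        try:
            results[name] = len(list(enumerate_store(name)))
        except OSError as exc:  # WindowsError is a subclass of OSError
            results[name] = exc
    return results
```

On the affected machine, print(probe_cert_stores()) should show a certificate count for ROOT and the access-denied error for CA if the diagnosis above is right.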
If you just want to make it work without fixing the root cause, you could try the following workaround. It temporarily patches _windows_cert_stores to not include the broken CA cert store, or alternatively disables the trust-anchor loading logic completely. (All other ssl.SSLContext invocations in the current process will be patched as well.)
Note that this effectively disables server certificate verification.
import ssl
from urllib2 import urlopen, Request

# Patch the default store list to only include "ROOT", as "CA" is broken for you.
ssl.SSLContext._windows_cert_stores = ("ROOT",)
# Alternative: turn load_default_certs into a no-op instead.
#ssl.SSLContext.load_default_certs = lambda s, x: None
ctx = ssl.create_default_context()  # new SSL context
ctx.check_hostname = False          # do not verify hostnames
ctx.verify_mode = ssl.CERT_NONE     # do not verify certificates
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
req = Request('https://' + url, headers=hdr)  # url as defined in the question
x = urlopen(req, context=ctx).read()
ssl.SSLContext._windows_cert_stores = ("ROOT", "CA")  # undo the patch
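Because the patch is global to the process, it may be worth making sure it is undone even if the request raises. One possible way to do that is a small context manager; this is only a sketch, the name windows_cert_stores is hypothetical, and it relies on the same private _windows_cert_stores attribute as the workaround above, which could change between Python versions.

```python
import contextlib
import ssl


@contextlib.contextmanager
def windows_cert_stores(stores=("ROOT",)):
    """Temporarily replace the store list read by SSLContext.load_default_certs."""
    missing = object()
    # Remember whatever value is currently set (a private attribute of ssl).
    original = getattr(ssl.SSLContext, "_windows_cert_stores", missing)
    ssl.SSLContext._windows_cert_stores = tuple(stores)
    try:
        yield
    finally:
        # Restore the previous state even if the body raised.
        if original is missing:
            del ssl.SSLContext._windows_cert_stores
        else:
            ssl.SSLContext._windows_cert_stores = original
```

The context would then be created inside a with windows_cert_stores(): block, and the original store list is back in place as soon as the block exits.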
I hope this information helps you resolve the issue. Good luck.
There are several potential problems with using the Windows certificate store. (I've found that when running your code from a service account without a full user profile, this is nearly impossible.) The reasons are somewhat complex, but not worth discussing further because there is an easier solution. Turning off SSL validation, as already suggested, is one workaround, but probably not the best if you care about the validity of the certificates presented.
Just avoid this altogether by using a self-contained cert store. For Python this is the certifi package, which is kept up to date. It is easily accessed through the Python requests package. Both should be readily available for most common Python distributions.
import requests
from bs4 import BeautifulSoup
url = "www.google.com"
hdr = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
r = requests.get('https://' + url, headers=hdr, verify=True)
soup = BeautifulSoup(r.text, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
Note that requests.get() will throw an exception on invalid addresses, unreachable sites, and failed certificate verification, so you want to be prepared to catch these. When a site was contacted successfully and the certificate was validated, but the page wasn't found (a 404 error, for example), you won't get an exception. You should therefore also check that r.status_code == 200 after making the request. (30x redirects are handled automatically, so you won't see those as status codes unless you tell it not to follow them.) This checking is omitted from the example code for clarity.
Note also that you don't explicitly reference the certifi module here. requests will use it if installed. If not installed, requests will use a more limited built-in set of root CAs.
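For completeness, here is a rough sketch of what the omitted checking could look like. The helper name fetch_html, its get parameter, and the timeout value are all made up for illustration. It relies on the fact that requests.exceptions.RequestException derives from IOError, so a single except clause covers SSL failures, connection errors, timeouts, and invalid URLs.

```python
def fetch_html(url, headers=None, get=None, timeout=10):
    """Return the page text, or None if the request failed or was not a 200."""
    if get is None:
        import requests  # third-party; imported lazily so a stub can be passed in
        get = requests.get
    try:
        r = get(url, headers=headers, timeout=timeout, verify=True)
    except IOError as exc:
        # requests.exceptions.RequestException is a subclass of IOError.
        print("request failed: %r" % (exc,))
        return None
    if r.status_code != 200:
        # Reached the server and validated the certificate, but no usable page.
        print("unexpected status code: %d" % r.status_code)
        return None
    return r.text
```

The return value can then be handed to BeautifulSoup as in the example above. Whether to return None or re-raise is a matter of taste; re-raising is probably the better choice in anything larger than a script.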