(Python 3.4.2) Would anyone be able to help me fetch https pages with urllib? I've spent hours trying to figure this out.
Here's what I'm trying to do (pretty basic):
import urllib.request
url = "".join((baseurl, other_string, midurl, query))
response = urllib.request.urlopen(url)
html = response.read()
Here's my error output when I run it:
File "./script.py", line 124, in <module>
response = urllib.request.urlopen(url)
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 455, in open
response = self._open(req, data)
File "/usr/lib/python3.4/urllib/request.py", line 478, in _open
'unknown_open', req)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 1244, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: 'https>
I've also tried using data=None to no avail:
response = urllib.request.urlopen(url, data=None)
I've also tried this:
import urllib.request, ssl
https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)
resp = opener.open(url)
html = resp.read().decode('utf-8')
print(html)
The same kind of error occurs with this second script: it is raised on the "resp = ..." line and complains that 'https' is an unknown url type.
Python was compiled with SSL support on my computer (Arch Linux). I've tried reinstalling python3 and openssl a few times, but that doesn't help. I haven't tried to uninstall python completely and then reinstall because I would also need to uninstall a lot of other programs on my computer.
Anyone know what's going on?
-----EDIT-----
I figured it out, thanks to help from Andrew Stevlov's answer. My url had a ":" in it, and I guess urllib didn't like that. I replaced it with "%3A" and now it's working. Thanks so much guys!!!
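For anyone hitting the same thing, here is a minimal sketch of the fix described in the edit above: percent-encoding the variable part of the URL instead of replacing ":" by hand. The values of baseurl, other_string, midurl and query are made-up placeholders (the real ones aren't shown in the question), and passing safe="" is just one reasonable choice.

```python
from urllib.parse import quote, urlsplit

# Hypothetical stand-ins for the pieces joined in the question.
baseurl = "https://example.com/"
other_string = "search"
midurl = "?q="
query = "title:python"

# Percent-encode only the query value; safe="" also encodes "/" and ":".
url = "".join((baseurl, other_string, midurl, quote(query, safe="")))
print(url)

# Sanity check before calling urlopen: the scheme should be exactly "https".
scheme = urlsplit(url).scheme
print(scheme)
```

If the printed scheme is anything other than plain https (for example, it has a stray quote or space in front), urlopen will report it as an unknown url type.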
Double-check your compilation options; it looks like something is wrong with your box.
At least the following code works for me:
from urllib.request import urlopen
resp = urlopen('https://github.com')
print(resp.read())
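A quick way to check whether the interpreter really has SSL support is sketched below. It assumes only the standard library: urllib.request defines HTTPSHandler only when the ssl module is available, so if either check fails, the build is the problem rather than the URL.

```python
import urllib.request

# The ssl module only imports if Python was built against OpenSSL.
try:
    import ssl
    has_ssl = True
    print(ssl.OPENSSL_VERSION)
except ImportError:
    has_ssl = False

# HTTPSHandler is only defined when HTTPS connections are supported.
has_https_handler = hasattr(urllib.request, "HTTPSHandler")
print(has_ssl, has_https_handler)
```

If both values come back True, SSL support is fine and the URL string itself is the next thing to inspect.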
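This may help. Note that it turns off certificate verification, so use it only for testing:
import ssl
import urllib.request
# Build a context that skips hostname and certificate checks (insecure).
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()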