
urllib cannot read https

(Python 3.4.2) Would anyone be able to help me fetch https pages with urllib? I've spent hours trying to figure this out.

Here's what I'm trying to do (pretty basic):

import urllib.request
url = "".join((baseurl, other_string, midurl, query))
response = urllib.request.urlopen(url)
html = response.read()

Here's my error output when I run it:

  File "./script.py", line 124, in <module>
    response = urllib.request.urlopen(url)
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 455, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 478, in _open
    'unknown_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1244, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: 'https>

I've also tried using data=None to no avail:

response = urllib.request.urlopen(url, data=None)

I've also tried this:

import urllib.request, ssl
https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)
resp = opener.open(url)
html = resp.read().decode('utf-8')
print(html)

A similar error occurs with this script: it fails on the "resp = ..." line and complains that 'https' is an unknown url type.

Python was compiled with SSL support on my computer (Arch Linux). I've tried reinstalling python3 and openssl a few times, but that doesn't help. I haven't tried to uninstall python completely and then reinstall because I would also need to uninstall a lot of other programs on my computer.

Anyone know what's going on?

-----EDIT-----

I figured it out, thanks to help from Andrew Svetlov's answer. My URL had a ":" in it, and I guess urllib didn't like that. I replaced it with "%3A" and now it's working. Thanks so much guys!!!
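For anyone landing here with the same problem, here is a minimal sketch of that percent-encoding step. It assumes the ":" sits in the query part of the URL. The values of base and query are hypothetical, since the real baseurl/midurl/query strings from my script are not shown here.

```python
from urllib.parse import quote, urlsplit

# Hypothetical pieces; the real values from the script are not shown in the question.
base = "https://example.com/search?q="
query = "tag:python"

# quote() percent-encodes reserved characters such as ":".
# safe="" means "/" is encoded as well (by default it is left alone).
encoded_query = quote(query, safe="")
url = base + encoded_query

print(encoded_query)
print(url)

# Sanity check: the scheme should be exactly "https", with no stray characters.
print(urlsplit(url).scheme)
```

Encoding only the variable parts (rather than the whole URL) keeps the "://" after the scheme intact.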

GreenRaccoon23 asked Nov 29 '14 23:11


2 Answers

Double-check your compilation options; it looks like something is wrong with your box.

At least the following code works for me:

from urllib.request import urlopen
resp = urlopen('https://github.com')
print(resp.read())
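
If you want to check the box without touching the network, something like the sketch below might help. It relies on two things: urllib.request only defines HTTPSHandler when SSL support is available, and Request(...).type shows the scheme urllib will try to dispatch on. The quoted URL at the end is a made-up example. It is only a guess, based on the 'https in the error message, that a stray quote could have ended up at the start of the URL string.

```python
import urllib.request

# HTTPSHandler is only defined when Python's http.client has SSL support.
has_https = hasattr(urllib.request, "HTTPSHandler")
print("HTTPS handler available:", has_https)

if has_https:
    import ssl
    # Shows which OpenSSL build this interpreter is linked against.
    print("OpenSSL:", ssl.OPENSSL_VERSION)

# Request.type is the scheme that urllib uses to pick a handler.
good = urllib.request.Request("https://github.com")
print(repr(good.type))

# Hypothetical bad input: a literal quote character at the start of the string.
# The scheme then includes the quote, and no handler is registered for it.
bad = urllib.request.Request("'https://github.com")
print(repr(bad.type))
```

If the handler is available and the scheme printed for your real URL is not exactly 'https', the problem is in how the URL string is built rather than in the Python build.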
Andrew Svetlov answered Sep 28 '22 08:09


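This may help: you can tell urllib to ignore SSL certificate errors.

import ssl
import urllib.request

# Build a context that skips certificate and hostname checks (insecure; testing only)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()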
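
To see what those two settings actually change, here is a small sketch comparing the default context with the relaxed one. It makes no network calls. The variable names verified and unverified are just for illustration.

```python
import ssl

# Default context: server certificates and hostnames are verified.
verified = ssl.create_default_context()

# Relaxed context, as in the snippet above: verification switched off.
unverified = ssl.create_default_context()
unverified.check_hostname = False      # has to be disabled before setting CERT_NONE
unverified.verify_mode = ssl.CERT_NONE

print(verified.verify_mode, verified.check_hostname)
print(unverified.verify_mode, unverified.check_hostname)
```

Note that this only affects certificate verification. An "unknown url type" error is raised before any TLS connection is attempted, so a relaxed context probably won't change that particular failure.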
Mohamed Mahdi answered Sep 28 '22 10:09