<pre class="prettyprint"><code># -*- coding: utf-8 -*- # Python3 import urllib import urllib.request as url_req opener = url_req.build_opener() url='http://zh.wikipedia.org/wiki/'+"毛泽东" opener.open(url).read() # opener.open(url.encode("utf-8")).read() # # doesn't work either </code></pre> When I run it, it complains that: <code>UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-12: ordinal not in range(128)</code> But I can't use <code>.encode()</code> either as it will complain: <pre class="prettyprint"><code>Traceback (most recent call last): File "t.py", line 8, in <module> opener.open(url.encode("utf-8")).read() File "/usr/local/Cellar/python3/3.2.2/lib/python3.2/urllib/request.py", line 360, in open req.timeout = timeout AttributeError: 'bytes' object has no attribute 'timeout' </code></pre> Anyone knows how to deal with that?

The fantastic requests library does this for you out of the box: <pre class="prettyprint"><code>>>> url='http://zh.wikipedia.org/wiki/'+"毛泽东" >>> import requests >>> r = requests.get(url) >>> len(r.content) 818747 </code></pre>

How to deal with unicode string in URL in python3?

Tags:

python

python-3.x

unicode

# -*- coding: utf-8 -*- 
# Python3
import urllib
import urllib.request as url_req
opener = url_req.build_opener()
url='http://zh.wikipedia.org/wiki/'+"毛泽东"
opener.open(url).read()
# opener.open(url.encode("utf-8")).read()
# # doesn't work either

When I run it, it complains that:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-12: ordinal not in range(128)

But I can't use .encode() either as it will complain:

Traceback (most recent call last):
  File "t.py", line 8, in <module>
    opener.open(url.encode("utf-8")).read()
  File "/usr/local/Cellar/python3/3.2.2/lib/python3.2/urllib/request.py", line 360, in open
    req.timeout = timeout
AttributeError: 'bytes' object has no attribute 'timeout'

Anyone knows how to deal with that?

812

asked Aug 05 '12 17:08

Hanfei Sun

2 Answers

You could use urllib.parse.quote() to encode the path section of URL.

#!/usr/bin/env python3
from urllib.parse   import quote
from urllib.request import urlopen

url = 'http://zh.wikipedia.org/wiki/' + quote("毛泽东")
content = urlopen(url).read()

122

answered Sep 28 '22 04:09

jfs

The fantastic requests library does this for you out of the box:

>>> url='http://zh.wikipedia.org/wiki/'+"毛泽东"
>>> import requests
>>> r = requests.get(url)
>>> len(r.content)
818747

answered Sep 28 '22 05:09

jterrace

Related questions
                            
                                Is there a Python equivalent to `perl -pi -e`?
                            
                                What is the correct (or best) way to subclass the Python set class, adding a new instance variable?
                            
                                Equivalent for inject() in Python?
                            
                                In Django, how do I clear a sessionkey?
                            
                                Python: HTTP Post a large file with streaming
                            
                                best way to implement a deck for a card game in python
                            
                                Using Django view variables inside templates
                            
                                Fastest way to sort in Python
                            
                                Does anyone have good examples of using mutagen to write to files? [closed]
                            
                                python function call with variable
                            
                                Why doesn’t SQLite3 require a commit() call to save data?
                            
                                python, numpy boolean array: negation in where statement
                            
                                Find phase difference between two (inharmonic) waves
                            
                                Match any unicode letter?
                            
                                Django Boolean Queryset Filter Not Working
                            
                                Does Python optimize function calls from loops?
                            
                                my matplotlib title gets cropped
                            
                                financial python library that has xirr and xnpv function?
                            
                                wxPython WebView example
                            
                                Uncaught ReferenceError: django is not defined

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With