How to fetch a non-ascii url with urlopen?

Tags: python, unicode, urllib2, urlopen, non-ascii-characters

I need to fetch data from a URL containing non-ASCII characters, but urllib2.urlopen refuses to open the resource and raises:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0131' in position 26: ordinal not in range(128)
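The error is an ASCII encoding failure on the URL text itself. As a rough illustration (Python 3, a made-up URL, and not urllib2's exact internal call path), the same codec error appears as soon as such a string is forced into ASCII:

```python
# Made-up example URL containing U+0131 (dotless i), the character named in the error.
url = "http://example.org/a\u0131b/"


def first_non_ascii(text):
    """Return (index, character) of the first character ASCII cannot encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the "position" reported in the error message
        return exc.start, text[exc.start]
    return None


print(first_non_ascii(url))  # (20, 'ı')
```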

I know the URL is not standards-compliant, but I have no way to change it.

How can I access, from Python, a resource pointed to by a URL containing non-ASCII characters?

Edit: In other words, can urlopen open a URL like the following, and if so, how?

http://example.org/Ñöñ-ÅŞÇİİ/

asked Dec 08 '10 16:12

onurmatik

1 Answer

Strictly speaking URIs can't contain non-ASCII characters; what you have there is an IRI.

To convert an IRI to a plain ASCII URI:

- non-ASCII characters in the hostname part of the address have to be encoded using the Punycode-based IDNA algorithm;
- non-ASCII characters in the path, and most of the other parts of the address, have to be encoded using UTF-8 and %-encoding, as per Ignacio's answer.
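Applied separately to a sample host and path (the same ones used in the example below), the two rules look roughly like this in Python 3; this is a minimal sketch and the string values are just sample input:

```python
from urllib.parse import quote

host = "www.a\u0131b.com"  # sample hostname containing U+0131
path = "/a\u0131b"          # sample path containing U+0131

# Hostname: IDNA, which Punycode-encodes each non-ASCII label with an "xn--" prefix.
ascii_host = host.encode("idna").decode("ascii")

# Path: UTF-8 encode, then %-escape every byte outside the unreserved set
# (here only the two UTF-8 bytes of U+0131; "/" is kept via safe="/").
ascii_path = quote(path.encode("utf-8"), safe="/")

print(ascii_host)  # www.xn--ab-hpa.com
print(ascii_path)  # /a%C4%B1b
```

quote() emits uppercase hex digits; per RFC 3986 these are equivalent to the lowercase escapes produced by the Python 2 function below.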

So:

```python
import re, urlparse

def urlEncodeNonAscii(b):
    # %-escape every byte in 0x80-0xFF, i.e. the UTF-8 bytes of non-ASCII characters
    return re.sub('[\x80-\xFF]', lambda c: '%%%02x' % ord(c.group(0)), b)

def iriToUri(iri):
    parts = urlparse.urlparse(iri)
    return urlparse.urlunparse(
        # part 1 is the netloc: IDNA-encode it; UTF-8 + %-encode all other parts
        part.encode('idna') if parti == 1 else urlEncodeNonAscii(part.encode('utf-8'))
        for parti, part in enumerate(parts)
    )

>>> iriToUri(u'http://www.a\u0131b.com/a\u0131b')
'http://www.xn--ab-hpa.com/a%c4%b1b'
```
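The function above is Python 2 (urlparse and byte strings). In Python 3 the module is urllib.parse and urllib2.urlopen became urllib.request.urlopen. A possible port, sketched under a few assumptions: the names are hypothetical, it also splits any user:pass@ prefix and :port suffix off the network location so that only the hostname is IDNA-encoded, and IPv6 literals are assumed to be ASCII and are left as-is.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def pct_encode_non_ascii(text):
    """%-escape the UTF-8 bytes of every non-ASCII character; leave ASCII untouched."""
    return re.sub(
        r"[^\x00-\x7f]",
        lambda m: "".join("%%%02X" % b for b in m.group(0).encode("utf-8")),
        text,
    )


def iri_to_uri(iri):
    scheme, netloc, path, query, fragment = urlsplit(iri)

    # Separate "user:pass@" and ":port" so only the hostname goes through IDNA.
    userinfo, at, hostport = netloc.rpartition("@")
    host, colon, port = hostport.rpartition(":")
    if not (colon and port.isdigit()):
        # No numeric port suffix (this also covers a bare "[::1]").
        host, colon, port = hostport, "", ""
    if not host.isascii():
        host = host.encode("idna").decode("ascii")
    netloc = pct_encode_non_ascii(userinfo) + at + host + colon + port

    return urlunsplit((
        scheme,
        netloc,
        pct_encode_non_ascii(path),
        pct_encode_non_ascii(query),
        pct_encode_non_ascii(fragment),
    ))


print(iri_to_uri("http://www.a\u0131b.com/a\u0131b"))
# http://www.xn--ab-hpa.com/a%C4%B1b
```

The result can then be handed to urlopen, e.g. urllib.request.urlopen(iri_to_uri(u'http://example.org/Ñöñ-ÅŞÇİİ/')). Like the original, it only escapes non-ASCII characters, so existing %xx escapes and reserved ASCII characters pass through unchanged.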

(Technically this still isn't quite good enough in the general case because urlparse doesn't split away any user:pass@ prefix or :port suffix on the hostname. Only the hostname part should be IDNA encoded. It's easier to encode using normal urllib.quote and .encode('idna') at the time you're constructing a URL than to have to pull an IRI apart.)
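Following that advice, encoding each piece while the URL is being built might look like this in Python 3, where urllib.quote lives at urllib.parse.quote. This is a minimal sketch; build_url is a hypothetical helper, and it assumes the host and path segments are already available separately:

```python
from urllib.parse import quote


def build_url(host, path_segments, scheme="http"):
    """Assemble an ASCII-only URL from Unicode pieces (hypothetical helper)."""
    # Hostname: IDNA-encode (a no-op for plain ASCII names such as "example.org").
    ascii_host = host.encode("idna").decode("ascii")
    # Path: quote() UTF-8 encodes each segment and %-escapes it; safe="" also escapes "/".
    path = "/".join(quote(segment, safe="") for segment in path_segments)
    return "%s://%s/%s" % (scheme, ascii_host, path)


# The URL from the question; the trailing "" segment keeps the final slash.
print(build_url("example.org", ["\u00d1\u00f6\u00f1-\u00c5\u015e\u00c7\u0130\u0130", ""]))
# http://example.org/%C3%91%C3%B6%C3%B1-%C3%85%C5%9E%C3%87%C4%B0%C4%B0/
```

Because nothing has to be parsed back out of an IRI, the user:pass@ and :port ambiguity never arises.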

answered Oct 08 '22 12:10

bobince

Related questions

Python def function: How do you specify the end of the function?
Tracking progress of joblib.Parallel execution
How to show a pandas dataframe into a existing flask html table?
Convert numpy type to python
Error 111 connecting to localhost:6379. Connection refused. Django Heroku
Python-redis keys() returns list of bytes objects instead of strings
Fastest way to swap elements in Python list
Problems using psycopg2 on Mac OS (Yosemite)
python 3 try-except all with error [duplicate]
Is close() necessary when using iterator on a Python file object [duplicate]
How to create a delayed queue in RabbitMQ?
Get a list of all installed applications in Django and their attributes
how to add annotate data in django-rest-framework queryset responses?
python: scatter plot logarithmic scale
Page not found 404 Django media files
Selenium testing without browser
Check if all values of iterable are zero
Python pandas: how to remove nan and -inf values
How can I create stacked line graph with matplotlib?
Most Pythonic way to concatenate strings
