
urllib.quote() throws KeyError

To encode a URI, I used urllib.quote("schönefeld"), but when the string contains non-ASCII characters, it throws:

KeyError: u'\xe9'
Code: return ''.join(map(quoter, s))

My input strings are köln, brønshøj, schönefeld etc.

When I just print the statements on Windows (Python 2.7, PyScripter IDE), it works. On Linux, however, it raises the exception (though I guess the platform shouldn't matter).

This is what I am trying:

from commands import getstatusoutput
from urllib import quote

queryParams = "schönefeld"
cmdString = "http://baseurl" + quote(queryParams)
print getstatusoutput(cmdString)

Looking into the cause: inside urllib.quote(), the exception is actually thrown at return ''.join(map(quoter, s)).

The code in urllib is:

def quote(s, safe='/'):
    if not s:
        if s is None:
            raise TypeError('None object cannot be quoted')
        return s
    cachekey = (safe, always_safe)
    try:
        (quoter, safe) = _safe_quoters[cachekey]
    except KeyError:
        safe_map = _safe_map.copy()
        safe_map.update([(c, c) for c in safe])
        quoter = safe_map.__getitem__
        safe = always_safe + safe
        _safe_quoters[cachekey] = (quoter, safe)
    if not s.rstrip(safe):
        return s
    return ''.join(map(quoter, s))

The exception comes from ''.join(map(quoter, s)): the quoter function is called for every element of s, and the resulting list is then joined with '' and returned.

For the non-ASCII character è, the equivalent key would be %E8, which is present in the _safe_map variable. But when I call quote('è'), it searches for the key \xe8, so the key is not found and the exception is thrown.

So I added s = [el.upper().replace("\\X","%") for el in s] before the call to ''.join(map(quoter, s)), inside a try-except block. Now it works fine.

But I am worried about whether what I have done is the correct approach, or whether it will create other issues. Also, I have 200+ Linux instances, so it would be very hard to deploy this fix to all of them.

Garfield asked Feb 27 '13 15:02


3 Answers

You are trying to quote Unicode data, so you need to decide how to turn that into URL-safe bytes.

Encode the string to bytes first. UTF-8 is often used:

>>> import urllib
>>> urllib.quote(u'sch\xe9nefeld')
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py:1268: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1268, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe9'
>>> urllib.quote(u'sch\xe9nefeld'.encode('utf8'))
'sch%C3%A9nefeld'

However, the encoding depends on what the server will accept. It's best to stick to the encoding the original form was sent with.
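
As a rough sketch of the encode-then-quote approach applied to the city names from the question: quote_text is a hypothetical helper name, and UTF-8 is only an assumed default, so use whichever encoding the server expects. The import fallback is there only so the same snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def quote_text(text, encoding="utf-8"):
    """Percent-encode text after explicitly encoding it to bytes.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    if not isinstance(text, bytes):
        # Unicode text: decide on an encoding before quoting.
        text = text.encode(encoding)
    return quote(text)


for city in (u"k\xf6ln", u"br\xf8nsh\xf8j", u"sch\xf6nefeld"):
    print(quote_text(city))
# k%C3%B6ln
# br%C3%B8nsh%C3%B8j
# sch%C3%B6nefeld
```

Passing a different encoding changes the escapes: quote_text(u"k\xf6ln", "latin-1") gives 'k%F6ln', which is why the choice has to match what the server decodes.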

Martijn Pieters answered Nov 02 '22 12:11


I resolved the issue by simply converting the string to unicode.

Here is the snippet:

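try:
    # Pure-ASCII strings decode cleanly and are left unchanged.
    unicode(mystring, "ascii")
except UnicodeError:
    # Otherwise assume the bytes are UTF-8 and decode them.
    mystring = unicode(mystring, "utf-8")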

Detailed description of solution can be found at http://effbot.org/pyfaq/what-does-unicodeerror-ascii-decoding-encoding-error-ordinal-not-in-range-128-mean.htm

Garfield answered Nov 02 '22 14:11


I had exactly the same error as @underscore, but in my case the problem was that map(quoter, s) looked up the key u'\xe9', which was not in _safe_map. However, \xe9 was, so I solved the issue by replacing u'\xe9' with \xe9 in s.

Moreover, shouldn't the return statement be inside the try/except? I also had to change this to completely solve the problem.
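
To illustrate the lookup failure described here, below is a simplified model of the kind of table Python 2.7's urllib.quote consults. It is not the library's code, and build_safe_map and quote_bytes are hypothetical names. The table is keyed by single bytes, so a unicode character is never found, while the same text encoded to bytes is.

```python
# Simplified model of the lookup table used by Python 2.7's urllib.quote.
# Not the library's actual code; all names here are illustrative.
ALWAYS_SAFE = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               b"abcdefghijklmnopqrstuvwxyz"
               b"0123456789_.-")


def build_safe_map(safe=b"/"):
    """Map every single byte to itself (if safe) or to its %XX escape."""
    table = {}
    for i in range(256):
        byte = bytes(bytearray([i]))
        if byte in ALWAYS_SAFE + safe:
            table[byte] = byte.decode("ascii")
        else:
            table[byte] = "%{0:02X}".format(i)
    return table


def quote_bytes(data, safe=b"/"):
    """Percent-encode a byte string by looking up each byte in the table."""
    table = build_safe_map(safe)
    return "".join(table[bytes(bytearray([b]))] for b in bytearray(data))


table = build_safe_map()

try:
    table[u"\xe9"]                       # unicode key: not in the table
except KeyError:
    print("unicode key not found")

print(table[b"\xe9"])                    # byte key: %E9
print(quote_bytes(u"sch\xe9nefeld".encode("utf-8")))  # sch%C3%A9nefeld
```

In this model the fix is on the caller's side: encode the text to bytes before the lookup, rather than patching the table or the installed urllib module.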

Sebastien answered Nov 02 '22 14:11