Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Encoding characters with ISO 8859-1 in Python

With ord(ch) you can get a numerical code for character ch up to 127. Is there any function that returns a number from 0-255, so to cover also ISO 8859-1 characters?
Edit: Follows my last version of code and error I get

#!/usr/bin/python
# coding: iso-8859-1

import sys
reload(sys)
sys.setdefaultencoding('iso-8859-1')
print sys.getdefaultencoding()  # prints "iso-8859-1" 

def char_code(c):
    return ord(c.encode('iso-8859-1'))
print char_code(u'à')

I get an error: TypeError: ord() expected a character, but string of length 2 found

like image 953
Drimades Boy Avatar asked Aug 20 '15 16:08

Drimades Boy


1 Answers

When you're starting with a Unicode string, you need to encode rather than decode.

>>> def char_code(c):
        return ord(c.encode('iso-8859-1'))

>>> print char_code(u'à')
224

For ISO-8859-1 in particular, you don't even need to encode it at all, since Unicode uses the ISO-8859-1 characters for its first 256 code points.

>>> print ord(u'à')
224

Edit: I see the problem now. You've given a source code encoding comment that indicates the source is in ISO-8859-1. However, I'll bet that your editor is actually working in UTF-8. The source code will be mis-interpreted, and the single-character string you think you created will actually be two characters. Try the following to see:

print len(u'à')

If your encoding is correct, it will return 1, but in your case it's probably 2.

like image 101
Mark Ransom Avatar answered Sep 24 '22 23:09

Mark Ransom