Latin-1 and the unicode factory in Python

Tags: python, unicode

I have a Python 2.6 script that is gagging on special characters, encoded in Latin-1, that I am retrieving from a SQL Server database. I would like to print these characters, but I'm somewhat limited because I am using a library that calls the unicode factory, and I don't know how to make Python use a codec other than ascii.

The script is a simple tool to return lookup data from a database without having to execute the SQL directly in a SQL editor. I use the PrettyTable 0.5 library to display the results.

The core of the script is this bit of code. The tuples I get from the cursor contain integer and string data, and no Unicode data. (I'd use adodbapi instead of pyodbc, since adodbapi would return Unicode, but it gives me other problems.)


import pyodbc
from prettytable import PrettyTable

x = pyodbc.connect(cxnstring)
r = x.cursor()
r.execute(sql)

t = PrettyTable(columns)
for rec in r:
    t.add_row(rec)
r.close()
x.close()

t.set_field_align("ID", 'r')
t.set_field_align("Name", 'l')
print t

But the Name column can contain characters that fall outside the ASCII range. I'll sometimes get an error message like this, in line 222 of prettytable.pyc, when it gets to the t.add_row call:


UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 12: ordinal not in range(128)

This is the code around line 222 in prettytable.py. Its call to unicode is the source of my problems, not just in this script but in other Python scripts that I have written.


for i in range(0,len(row)):
    if len(unicode(row[i])) > self.widths[i]:   # This is line 222
        self.widths[i] = len(unicode(row[i]))
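To see why line 222 fails, here is a minimal sketch that reproduces the error without pyodbc or PrettyTable. It assumes the Name value arrives as a Latin-1 byte string; the value b'Mart\xednez' is a made-up example. In Python 2, calling unicode() on a byte string without an encoding argument decodes it with the default codec (normally ascii), so the explicit .decode('ascii') below fails the same way. The code avoids Python 2-only syntax, so it should run under 2.6 as well as 3.x.

```python
# Hypothetical value: a Latin-1 byte string such as a cursor might return
# for the Name column (byte 0xed is the letter i-acute in Latin-1).
name = b'Mart\xednez'

# In Python 2, unicode(name) with no encoding argument decodes with the
# default codec (normally 'ascii'); spelling that out reproduces the error.
try:
    name.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii failed: %s' % exc)

# Telling Python the real encoding of the bytes works.
decoded = name.decode('latin-1')
print(repr(decoded))
```

The fix, in other words, is to hand the library text that is already Unicode, so its unicode() call has nothing left to decode.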

Please tell me what I'm doing wrong here. How can I make unicode work without hacking prettytable.py or any of the other libraries that I use? Is there even a way to do this?

EDIT: The error occurs not at the print statement, but at the t.add_row call.

EDIT: With Bastien Léonard's help, I came up with the following solution. It's not a panacea, but it works.


x = pyodbc.connect(cxnstring)
r = x.cursor()
r.execute(sql)

t = PrettyTable(columns)
for rec in r:
    urec = [s.decode('latin-1') if isinstance(s, str) else s for s in rec]
    t.add_row(urec)
r.close()
x.close()

t.set_field_align("ID", 'r')
t.set_field_align("Name", 'l')
print t.get_string().encode('latin-1')

I ended up having to decode on the way in and encode on the way out. All of this makes me hopeful that everybody ports their libraries to Python 3.x sooner rather than later!
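The decode-in, encode-out approach can be wrapped in two small helpers. This is only a sketch, not part of PrettyTable or pyodbc: decode_row and encode_for_output are hypothetical names, Latin-1 is assumed for both the database and the console, and the 'replace' error handler is an assumed choice so that characters Latin-1 cannot represent print as '?' instead of raising UnicodeEncodeError. Testing for bytes rather than str keeps it working on Python 2.6 (where bytes is an alias for str) and on 3.x.

```python
def decode_row(row, encoding='latin-1'):
    """Decode the byte-string fields of a row; leave ints and other values alone."""
    # 'bytes' is an alias for 'str' in Python 2.6+, so this also matches
    # the plain str values a Python 2 cursor returns.
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in row]


def encode_for_output(text, encoding='latin-1'):
    """Encode text for a Latin-1 console, replacing unmappable characters with '?'."""
    return text.encode(encoding, 'replace')


# Hypothetical row as a cursor might yield it: an ID and a Latin-1 name.
rec = (42, b'Mart\xednez')
urec = decode_row(rec)
print(urec)

# On the way out, a character outside Latin-1 (the euro sign) becomes '?'.
out = encode_for_output(u'Mart\xednez \u20ac')
```

In the script above, the loop body would become t.add_row(decode_row(rec)) and the last line print encode_for_output(t.get_string()).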

asked Jul 20 '09 20:07

eksortso

2 Answers

Add this at the beginning of the module:


# coding: latin1

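Or, because that declaration only affects how the module's own source text is read (string literals, for example) and not byte strings fetched at run time, decode the string to Unicode yourself.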
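A minimal sketch of what a coding declaration does and does not affect, written for Python 3: it compiles a tiny hypothetical module source, given as raw bytes, that starts with the same declaration. The declaration governs how the literal inside that source is read, while a byte string produced at run time (standing in for a database value) still needs an explicit decode.

```python
# Hypothetical module source as raw bytes: the coding declaration plus
# one literal whose single byte 0xe9 is e-acute in Latin-1.
source = b"# coding: latin1\nliteral = u'\xe9'\n"

namespace = {}
exec(compile(source, '<demo>', 'exec'), namespace)

# The declaration told the parser how to decode the literal in the source.
print(repr(namespace['literal']))

# Bytes that arrive at run time are not affected by any declaration;
# they still have to be decoded explicitly.
from_db = b'\xe9'
print(repr(from_db.decode('latin1')))
```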

[Edit]

It's been a while since I played with Unicode, but hopefully this example will show how to convert from Latin1 to Unicode:


>>> s = u'ééé'.encode('latin1') # a string you may get from the database
>>> s.decode('latin1')
u'\xe9\xe9\xe9'

[Edit]

Documentation:
http://docs.python.org/howto/unicode.html
http://docs.python.org/library/codecs.html

answered Oct 16 '22 00:10

Bastien Léonard

Maybe try to decode the latin1-encoded strings into unicode?


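# Build a list, not a generator (PrettyTable calls len(row)), and skip ints, which have no decode()
t.add_row([value.decode('latin1') if isinstance(value, str) else value for value in rec])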

answered Oct 16 '22 00:10

liori
