Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Writing unicode strings to Excel 2007

I am connecting to a MS SQL server using pyodbc. Furthermore, I am trying to write to an Excel 2007/10 .xlsx file using openpyxl.

This is my code (Python 2.7):

import pyodbc
from openpyxl import Workbook

cnxn = pyodbc.connect(host = 'xxx',database='yyy',user='zzz',password='ppp')
cursor = cnxn.cursor()

sql = "SELECT TOP 10   [customer clientcode] AS Customer, \
                [customer dchl] AS DChl, \
                [customer name] AS Name, \
                ...
                [name3] AS [name 3] \
        FROM   mydb \
        WHERE [customer dchl] = '03' \
        ORDER BY [customer id] ASC"

#load data
cursor.execute(sql)

#get colnames from openpyxl
columns = [column[0] for column in cursor.description]    

#using optimized_write cause it will be about 120k rows of data
wb = Workbook(optimized_write = True, encoding='utf-8')

ws = wb.create_sheet()
ws.title = '03'

#append column names to header
ws.append(columns)

#append rows to 
for row in cursor:
    ws.append(row)

wb.save(filename = 'test.xlsx')

cnxn.close()

This works, at least up until the point I encounter a customer with, for example, the name: "mún". My code does not fail, everything writes to Excel and all is well. That is until I actually open the Excel file- this causes an error saying that the file is corrupted and needs to be repaired. Upon repairing the file, all data is lost.

I know the code works for customers with regular names (only ASCII), it is as soon as there's an accented character or anything that the Excel file gets corrupted.

I have tried to print a single row (with a difficult cust. name). This is the result:

row is a tuple, and this one of the indices: 'Mee\xf9s Tilburg' So either writing the \xf9 (ú) character causes an error, or MS Excel cannot cope with it. I have tried various ways of encoding a row to unicode (unicode(row,'utf-8') or u''.join(row)) etc., though nothing works. Either I try something idiotic resulting in an error, or the Excel file still errors.

Any ideas?

like image 876
Rym Avatar asked Mar 08 '13 15:03

Rym


People also ask

How do I use unicode text in Excel?

To type it into a cell, hold down the alt key and type 0151 on the number pad. Another way to use Unicode in Excel is to use the Character Map. To open the Character Map, click on the start button and type Character Map. Click on the Character Map icon and select the font you want to use.

What is unicode text format in Excel?

What is the UNICODE Function? The UNICODE Function[1] is categorized under Excel Text functions. It will give the number (code point) for the first character of a supplied text string. The function was introduced in MS Excel 2013.


1 Answers

In the end I found two solutions:

First one was converting the row given by the cursor to a list, and decoding the elements within the list:

for row in cursor:
    l = list(row)
    l[5] = l[5].decode('ISO-8859-1')
    (do this for all neccesary cols)
    ws.append(l)

I figured this would have been hell, cause there were 6 columns needing conversion to unicode, and there were 120k rows, though everything went quite fast actually! In the end it became apparent that I could/should just cast the data in the sql statement to unicode ( cast(x as nvarchar) AS y) which made the replacements unnecessary. I did not think of this at first cause i thought that it was actually supplying the data in unicode. My bad.

like image 197
Rym Avatar answered Oct 03 '22 01:10

Rym