Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python xlrd convert xlsx to csv

Tags:

python

xlrd

I am trying to convert excel files to csv files using xlrd library.

But I got this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 2: ordinal not in range(128)

Can it be because the excel file is too large? Cause everything works fine with excel files that have small number of rows. But when I tried to convert an excel file that has nearly 2000 rows, I got this error.

[UPDATE]

This is the code:

filepath = './attachments'
wb = xlrd.open_workbook(os.path.join(filepath, 'result.xls'))
sheet = wb.sheet_by_index(0)
fp = open(os.path.join(filepath, 'result.csv'), 'wb')
wr = csv.writer(fp, quoting=csv.QUOTE_ALL)
for rownum in xrange(sheet.nrows):
  wr.writerow(sheet.row_values(rownum))
fp.close()

And this is the traceback:

Traceback (most recent call last):
  File "MethodTest.py", line 11, in <module>
    wr.writerow(sheet.row_values(rownum))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0142' in position 2: ordinal not in range(128)
like image 611
Cacheing Avatar asked Aug 07 '13 20:08

Cacheing


1 Answers

You are reading a Excel sheet with data outside of the ASCII range.

When writing unicode values to a CSV file, automatic encoding takes place, but for values outside the ASCII range of characters that fails. Encode explicitly:

for rownum in xrange(sheet.nrows):
      wr.writerow([unicode(val).encode('utf8') for val in sheet.row_values(rownum)])

On Python 3, the CSV module supports Unicode text, provided you provided the open() command with a suitable encoding and opened the file in text mode. While the default can be UTF-8, it is better to be explicit. The documentation recommends you use newline='' when opening a file for CSV writing:

fp = open(os.path.join(filepath, 'result.csv'), 'w', encoding='utf-8', newline='')
wr = csv.writer(fp, quoting=csv.QUOTE_ALL)
for rownum in range(sheet.nrows):
      wr.writerow(sheet.row_values(rownum))

You may need to pick a different encoding, depending on your needs.

It's also a good idea to use the file object as a context manager to make sure it is closed after the code exits (with or without an exception):

with open(os.path.join(filepath, 'result.csv'), 'w', encoding='utf-8', newline='') as fp:
    wr = csv.writer(fp, quoting=csv.QUOTE_ALL)
    for rownum in range(sheet.nrows):
          wr.writerow(sheet.row_values(rownum))
like image 122
Martijn Pieters Avatar answered Sep 21 '22 06:09

Martijn Pieters