Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django create CSV file that contains Unicode and can be opened directly with Excel

I want to create a CSV file through Django that contains unicode data (greek characters) and I want it to be opened directly from MS Excel. Elsewhere I had read about the unicodecsv library and I decided to use that. So, here's my view;

def get_csv(request, id):
    response = HttpResponse(mimetype='text/csv')
    response['Content-Disposition'] = 'attachment; filename=csv.csv'
    writer = unicodecsv.writer(response, encoding='utf-16"')
    writer.writerow(['Second row', 'A', 'B', 'C', '"Testing"', "ελληνικά"])
    return response

Now, except utf-16, I had really tried everything in the encoding parameter of the writer, including utf-8, utf-8-sig, utf-8-le, utf-16-le and maybe others. Everytime I opened the file with excel I always saw garbage where the greek characters should've been.

Notepad++ was able to open the file without problems. What am I doing wrong ?

Update: Here's what I tried after jd's answer:

import csv
response = HttpResponse(mimetype='text/csv')
response['Content-Disposition'] = 'attachment; filename=test.csv'
response.write(u'\ufeff'.encode('utf8'))
writer = csv.writer(response, delimiter=';' , dialect='excel')
writer.writerow(['Second row', 'A', 'B', 'C', '"Testing"', "ελληνικά"])
return response

Still no luck - now I also can see the BOM (as grabage) in Excel - I also tried using unicodecsv and some other options but once again nothign worked :(

Update 2: I tried this after dda's proposal:

writer = unicodecsv.writer(response, delimiter=';' , dialect='excel')
writer.writerow(codecs.BOM_UTF16_LE)
writer.writerow([ (u'ελληνικά').decode('utf8').encode('utf_16_le')])

Still no luck :( Here's the error I get:

UnicodeEncodeError at /csv/559
'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)

Update 3: I am going crazy. Why is it so difficult ??? Here's another try:

response.write(codecs.BOM_UTF16_LE)
writer = unicodecsv.writer(response, delimiter=';' ,  lineterminator='\n', dialect='excel',  )
writer.writerow('ελληνικ')
writer.writerow([ ('ελληνικά').decode('utf8').encode('utf_16_le')]) #A
writer.writerow([ ('ελληνικά2').decode('utf8').encode('utf_16_le'),  ('ελληνικά2').decode('utf8').encode('utf_16_le') ]) #B

and here's the contents of the Excel:

㯎㮵㯎㮻㯎㮻㯎㮷㯎㮽㯎㮹㯎઺ελληνικά딊묃묃뜃봃뤃먃갃㈃딻묃묃뜃봃뤃먃갃㈃

So I got some greek characters with the line #A. But line B, which is exaclty the same didn't produce me greek characters $^#$#^$#$#^ @@%$#^#^$#$ Pls hlep !

like image 339
Serafeim Avatar asked Jun 01 '12 07:06

Serafeim


1 Answers

With Python's csv module, you can write a UTF-8 file that Excel will read correctly if you place a BOM at the beginning of the file.

with open('myfile.csv', 'wb') as f:
    f.write(u'\ufeff'.encode('utf8'))
    writer = csv.writer(f, delimiter=';', lineterminator='\n', quoting=csv.QUOTE_ALL, dialect='excel')
    ...

The same should work with unicodecsv. I suppose you can write the BOM directly to the HttpResponse object, if not you can use StringIO to first write your file.

Edit:

Here's some sample code that writes a UTF-8 CSV file with non-ASCII characters. I'm taking Django out of the equation for simplicity. I can read that file in Excel.

# -*- coding: utf-8 -*-
import csv
import os
response = open(os.path.expanduser('~/utf8_test.csv'), 'wb')
response.write(u'\ufeff'.encode('utf8'))
writer = csv.writer(response, delimiter=';' , dialect='excel')
writer.writerow(['Second row', 'A', 'B', 'C', '"Testing"', u"ελληνικά".encode('utf8')])
response.close()
like image 162
jd. Avatar answered Nov 14 '22 21:11

jd.