
What's the best way to export UTF8 data into Excel?

So we have this web app where we support UTF8 data. Hooray UTF8. And we can export the user-supplied data into CSV no problem - it's still in UTF8 at that point. The problem is that when you open a typical UTF8 CSV in Excel, it reads it as ANSI-encoded text, and accordingly tries to read two-byte chars like ø and ü as two separate characters, so you end up with garbled output.

So I've done a bit of digging (the Intervals folks have an interesting post about it here), and there are some limited, if ridiculously annoying, options out there. Among them:

  • supplying a UTF-16 Little Endian TSV file which Excel will interpret correctly, but which won't support multi-line data
  • supplying the data in an HTML table with an Excel mime-type or file extension (not sure if this option supports UTF8)
  • there are some three or four ways to get XML data into the various recent versions of Excel, and those would support UTF8, in theory: SpreadsheetML, using custom XSLT, or generating the new Excel XML format via templating
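
To make the SpreadsheetML option concrete, here is a minimal sketch of what generating the Excel 2003 "XML Spreadsheet" flavour might look like in plain Ruby. The helper name `spreadsheet_ml` and the sample data are made up for illustration, every cell is written as a string, and this has not been checked against every Excel version, so treat it as a starting point rather than a finished exporter.

```ruby
# Hypothetical helper (not from any library): renders an array of rows as an
# Excel 2003 "XML Spreadsheet" (SpreadsheetML) document encoded as UTF-8.
def spreadsheet_ml(rows, sheet_name = 'Export')
  xml_rows = rows.map do |row|
    cells = row.map do |value|
      # String#encode(xml: :text) escapes &, < and > for use in element text.
      %(<Cell><Data ss:Type="String">#{value.to_s.encode(xml: :text)}</Data></Cell>)
    end
    "<Row>#{cells.join}</Row>"
  end

  # encode(xml: :attr) escapes the value and wraps it in double quotes.
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <?mso-application progid="Excel.Sheet"?>
    <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
              xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
      <Worksheet ss:Name=#{sheet_name.encode(xml: :attr)}>
        <Table>
    #{xml_rows.join("\n")}
        </Table>
      </Worksheet>
    </Workbook>
  XML
end

xml = spreadsheet_ml([%w[name city], ['Jørgen & co', 'Zürich']])
```

The result would typically be sent with an `.xml` or `.xls` file name; which one Excel accepts without a warning depends on the version, so that part is left open here.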

It looks like no matter what, I'm probably going to want to continue offering a plain-old CSV file for the folks who aren't using it for Excel anyway, and a separate download option for Excel.

What's the simplest way of generating that Just-For-Excel file that will correctly support UTF8, my dear Stack Overflowers? If that simplest option only supports the latest version of Excel, that's still of interest.

I'm doing this on a Rails stack, but I'm curious how the .NET-ers and folks on other frameworks handle this. I work in a few different environments myself, and this is definitely an issue that will be coming up again.

Update 2010-10-22: We had been using the Ruport gem in our time-tracking system Tempo to provide the CSV exports when I first posted this question. One of my coworkers, Erik Hollensbee, threw together a quick filter for Ruport to provide us with actual Excel XLS output, and I figured I'd share that here for any other Rubyists:

    require 'rubygems'
    require 'ruport'
    require 'spreadsheet'
    require 'stringio'

    Spreadsheet.client_encoding = "UTF-8"

    include Ruport::Data

    class Ruport::Formatter::Excel < Ruport::Formatter
      renders :excel, :for => Ruport::Controller::Table

      def output
        retval = StringIO.new

        if options.workbook
          book = options.workbook
        else
          book = Spreadsheet::Workbook.new
        end

        if options.worksheet_name
          book_args = { :name => options.worksheet_name }
        else
          book_args = { }
        end

        sheet = book.create_worksheet(book_args)

        offset = 0

        if options.show_table_headers
          sheet.row(0).default_format = Spreadsheet::Format.new(
            options.format_options ||
            {
              :color => :blue,
              :weight => :bold,
              :size => 18
            }
          )
          sheet.row(0).replace data.column_names
          offset = 1
        end

        data.data.each_with_index do |row, i|
          sheet.row(i+offset).replace row.attributes.map { |x| row.data[x] }
        end

        book.write retval
        retval.seek(0)
        return retval.read
      end
    end
Billy Gray asked Jan 16 '09 19:01




2 Answers

I found that if you set the charset encoding of the web page to utf-8, and then Response.BinaryWrite the UTF-8 Byte Order Mark (0xEF 0xBB 0xBF) at the top of the csv file, then Excel 2007 (not sure about other versions) will recognize it as utf-8 and open it correctly.
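
Since the question is about a Rails stack, here is a rough Ruby equivalent of the same trick: prepend the three BOM bytes to the CSV body before sending it. The helper name `csv_with_bom` and the sample rows are invented for illustration, and whether a given Excel version honours the BOM is the answer's observation, not something this sketch verifies.

```ruby
require 'csv'

# The UTF-8 byte order mark: bytes 0xEF 0xBB 0xBF.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: builds CSV from an array of rows and returns the raw
# bytes with the BOM in front, ready to be written to a file or response body.
def csv_with_bom(rows)
  body = CSV.generate { |csv| rows.each { |row| csv << row } }
  UTF8_BOM + body.encode('UTF-8').b
end

data = csv_with_bom([%w[name city], ['Jørgen', 'Zürich']])
```

In a Rails controller, something like `send_data data, type: 'text/csv; charset=utf-8'` would be the natural place to use it, mirroring the charset advice above.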

Andrew Csontos answered Sep 20 '22 00:09



After struggling with the same problem for a few hours, I found this excellent post on the subject: http://blog.plataformatec.com.br/2009/09/exporting-data-to-csv-and-excel-in-your-rails-app/

Quote:

So, these are the three rules for dealing with Excel-friendly-CSV:

  1. Use tabulations, not commas.
  2. Fields must NOT contain newlines.
  3. Use UTF-16 Little Endian to send the file to the user. And include a Little Endian BOM manually.
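
The three rules can be sketched in plain Ruby roughly as follows. The helper name `excel_tsv` is made up, and replacing embedded tabs and newlines with spaces is just one assumed way of satisfying rule 2; the quoted post does not prescribe how to handle them.

```ruby
# Hypothetical helper: builds a tab-separated export encoded as UTF-16LE,
# with a little-endian BOM in front.
def excel_tsv(rows)
  lines = rows.map do |row|
    # Rules 1 and 2: tabs separate the fields, so any tab or newline inside a
    # field is collapsed to a single space (an assumption, see the lead-in).
    row.map { |field| field.to_s.gsub(/[\t\r\n]+/, ' ') }.join("\t")
  end
  text = lines.join("\r\n") + "\r\n"
  # Rule 3: U+FEFF encoded as UTF-16LE yields the BOM bytes 0xFF 0xFE.
  ("\uFEFF" + text).encode('UTF-16LE')
end

bytes = excel_tsv([%w[name note], ['Jørgen', "line one\nline two"]])
```

The returned string is already in UTF-16LE, so it should be written in binary mode (or sent as-is) to avoid a second transcoding step.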

However, if you're using Ruby, your problem is solved: first, you have the FasterCSV gem.

But I ended up using the spreadsheet gem, which directly generates Excel spreadsheets (I have a link limitation, so just google spreadsheet + rubyforge). Brilliant!

Alexis Perrier answered Sep 22 '22 00:09
