How to encode a CSV file in Roo (Rails): invalid byte sequence in UTF-8

I am trying to upload a CSV file, but I get an "invalid byte sequence in UTF-8" error. I am using the 'roo' gem.

My code looks like this:

def upload_results_csv file

    spreadsheet = MyFileUtil.open_file(file)
    header = spreadsheet.row(1) # THIS LINE RAISES THE ERROR

    (2..spreadsheet.last_row).each do |i|
      row = Hash[[header, spreadsheet.row(i)].transpose]
      ...
      ...
    end
end

class MyFileUtil

  def self.open_file(file)
    case File.extname(file.original_filename)
      when ".csv" then
        Roo::Csv.new(file.path,csv_options: {encoding: Encoding::UTF_8})
      when ".xls" then
        Roo::Excel.new(file.path, nil, :ignore)
      when ".xlsx" then
        Roo::Excelx.new(file.path, nil, :ignore)
      else
        raise "Unknown file type: #{file.original_filename}"
    end
  end

end

I don't know how to fix the encoding of the CSV file. Please help!

Thanks

Junaid asked Mar 11 '14 12:03


1 Answer

To safely convert a string to UTF-8, you can do:

str.encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: '')
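As a rough illustration of what that call does (the sample string is made up for this sketch): because the source encoding is given as 'binary', every byte outside the ASCII range has no UTF-8 mapping and is replaced by the empty string, so the result is valid UTF-8 but non-ASCII characters are dropped. The last line assumes, purely for contrast, that the data is known to be ISO-8859-1; in that case a plain transcode keeps the character.

```ruby
# "café" as it would be stored in a Latin-1 (ISO-8859-1) file: \xE9 alone is not valid UTF-8.
raw = "caf\xE9, 42\n".dup.force_encoding(Encoding::UTF_8)
raw.valid_encoding?   # false, which is what leads to "invalid byte sequence in UTF-8"

# Treat the input as raw bytes and drop every byte that has no UTF-8 mapping.
clean = raw.encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: '')
clean                 # "caf, 42\n" (the accented character is lost)
clean.valid_encoding? # true

# Assumption: the file is actually ISO-8859-1. Transcoding then preserves the character.
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1).encode('utf-8')
```

If the real encoding of the upload is unknown, the lossy variant is the safer default; if it is known, transcoding from that encoding avoids losing data.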

Also see this blog post.

Since the roo gem only accepts filenames as constructor arguments, not plain IO objects, the only solution I can think of is to write a sanitized version to a tempfile and pass that to roo, along the lines of:

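require 'tempfile'

def upload_results_csv file
    # Tempfile.new expects a basename (optionally [prefix, suffix]), not a full path
    tmpfile = Tempfile.new(['upload', File.extname(file.original_filename)])
    # Treat the upload as raw bytes and drop everything that has no UTF-8 mapping
    tmpfile.write(File.read(file.path).encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: ''))
    tmpfile.rewind

    spreadsheet = MyFileUtil.open_file(tmpfile, file.original_filename)
    header = spreadsheet.row(1) # this line raised the error before sanitizing

    # ...
ensure
    # tmpfile is nil if Tempfile.new itself failed
    if tmpfile
      tmpfile.close
      tmpfile.unlink
    end
end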

You need to alter MyFileUtil as well, because the original filename needs to be passed down:

class MyFileUtil
  def self.open_file(file, original_filename)
    case File.extname(original_filename)
      when ".csv" then
        Roo::Csv.new(file.path,csv_options: {encoding: Encoding::UTF_8})
      when ".xls" then
        Roo::Excel.new(file.path, nil, :ignore)
      when ".xlsx" then
        Roo::Excelx.new(file.path, nil, :ignore)
      else
        raise "Unknown file type: #{original_filename}"
    end
  end
end
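The tempfile step can also be pulled out into a small block helper. This is only a sketch: `with_sanitized_copy` is a hypothetical name, not part of roo, and the demo uses a throwaway file instead of a real Rails upload so it runs with the standard library alone.

```ruby
require 'tempfile'

# Hypothetical helper (not part of roo): copies the file at `path` into a
# Tempfile with all bytes that have no UTF-8 mapping dropped, yields the
# copy's path, and deletes the copy when the block returns.
def with_sanitized_copy(path, extension = File.extname(path))
  tmpfile = Tempfile.new(['sanitized', extension])
  raw = File.binread(path)
  tmpfile.write(raw.encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: ''))
  tmpfile.flush # make sure the bytes are on disk before the path is opened elsewhere
  yield tmpfile.path
ensure
  tmpfile.close! if tmpfile # close and unlink; tmpfile is nil if Tempfile.new failed
end

# Demo: a throwaway CSV containing one byte that is invalid in UTF-8.
source = Tempfile.new(['upload', '.csv'])
File.binwrite(source.path, "name,score\ncaf\xE9,42\n")

copy_path = nil
sanitized = with_sanitized_copy(source.path) do |path|
  copy_path = path
  File.read(path, encoding: 'utf-8')
end
source.close!
```

In the upload method, the call would presumably look like `with_sanitized_copy(file.path, File.extname(file.original_filename)) { |path| ... }`, with all spreadsheet processing done inside the block, since the copy is removed as soon as the block returns. This kind of sanitizing only makes sense for CSV uploads; dropping bytes from binary formats such as .xls or .xlsx would corrupt them.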
Patrick Oscity answered Oct 07 '22 21:10
