
Rails Import CSV Error: invalid byte sequence in UTF-8

I'm getting the error invalid byte sequence in UTF-8 when trying to import a CSV file in my Rails application. Everything was working fine until I added a gsub method to compare one of the CSV columns to a field in my database.

When I import a CSV file, I want to check whether the address for each row is included in an array of different addresses for a specific client. I have a client model with an alt_addresses property which contains a few different possible formats for the client's address.

I then have a citation model (if you're familiar with local SEO you'll know this term). The citation model doesn't have an address field, but it has a nap_correct? field (NAP stands for "Name", "Address", "Phone Number"). If the name, address, and phone number for a CSV row match what I have in the database for that client, the nap_correct? field for that citation gets set to true.

Here's what the import method looks like in my citation model:

def self.import(file, client_id)
  @client = Client.find(client_id)
  CSV.foreach(file.path, headers: true) do |row|
    @row = row.to_hash
    @citation = Citation.new
    if @row["Address"]
      if @client.alt_addresses.include?(@row["Address"].to_s.downcase.gsub(/\W+/, '')) && self.phone == @row["Phone Number"].gsub(/[^0-9]/, '')
        @citation.nap_correct = true
      end
    end
    @citation.name = @row["Domain"]
    @citation.listing_url = @row["Citation Link"]
    @citation.save
  end
end

And then here's what the alt_addresses property looks like in my client model:

def alt_addresses
  address = self.address.downcase.gsub(/\W+/, '')
  address_with_zip = (self.address + self.zip_code).downcase.gsub(/\W+/, '')
  return [address, address_with_zip]
end

I'm using gsub to reformat the address column in the CSV as well as the field in my client database table so I can compare the two values. This is where the problem comes in. As soon as I added the gsub method I started getting the invalid byte-sequence error.

I'm using Ruby 2.1.3. I've noticed a lot of the similar errors I find searching Stack Overflow are related to an older version of Ruby.

asked Oct 10 '15 23:10 by Eli



2 Answers

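Specify the encoding with the encoding option. The value 'iso-8859-1:utf-8' tells CSV to read the file as ISO-8859-1 and transcode it to UTF-8, so it fits if your file was saved in that encoding:

CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # your code here
end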
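
As a rough illustration of what that option does, here is a self-contained sketch. The sample row, the temp file, and the normalized array are made up for the example; only the CSV.foreach call and the gsub pattern come from the question. It assumes the file really is ISO-8859-1, which may not be true for your data.

```ruby
require 'csv'
require 'tempfile'

# Build a small CSV whose bytes are ISO-8859-1 (Latin-1), not UTF-8.
# "\xE9" is the single Latin-1 byte for "e with acute accent".
# The row content is invented for illustration.
latin1_csv = "Address,Phone Number\n12 Caf\xE9 Street,555-0100\n".b

file = Tempfile.new(['citations', '.csv'])
file.binmode
file.write(latin1_csv)
file.close

# Read the bytes as ISO-8859-1 and transcode every field to UTF-8,
# so regex-based methods such as gsub see valid UTF-8 strings.
normalized = []
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  address = row['Address'].to_s
  normalized << address.downcase.gsub(/\W+/, '')
end

file.unlink
```

Each field arrives as a valid UTF-8 string, so the gsub call from the question can run on it without raising the byte-sequence error.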
answered Oct 13 '22 18:10 by K M Rakibul Islam


One way I've found to get around this is to "Save As" in OpenOffice or LibreOffice, click "Edit Filter Settings", make sure the character set is UTF-8, and save. Bottom line: use some external tool to convert the file to UTF-8 before loading it into Ruby. This issue can be a true f-ing labyrinth within Ruby alone.

A Unix tool called iconv can apparently do this sort of thing: https://superuser.com/questions/588048/is-there-any-tools-which-can-convert-any-strings-to-utf-8-encoded-values-in-linu
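
If you would rather do the same pre-conversion step from a script, a minimal Ruby sketch could look like the following. This swaps the external tool for Ruby's own String#encode; convert_to_utf8 is a hypothetical helper name, and the ISO-8859-1 default is an assumption about how the file was saved. If the guess is wrong, the output will be valid UTF-8 but contain the wrong characters.

```ruby
# Hypothetical helper: rewrite a file as UTF-8 before importing it.
# Assumes the source bytes are in source_encoding (ISO-8859-1 by default).
def convert_to_utf8(src_path, dest_path, source_encoding: 'ISO-8859-1')
  raw = File.binread(src_path)                 # read bytes without interpreting them
  utf8 = raw.force_encoding(source_encoding)   # label the bytes with their real encoding
            .encode('UTF-8')                   # transcode to UTF-8
  File.binwrite(dest_path, utf8)               # write the UTF-8 bytes unchanged
  dest_path
end
```

You would call it once on the uploaded file, then pass the returned path to CSV.foreach without any encoding option.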

answered Oct 13 '22 17:10 by boulder_ruby