I'm getting the error "invalid byte sequence in UTF-8" when trying to import a CSV file in my Rails application. Everything was working fine until I added a gsub method to compare one of the CSV columns to a field in my database.
When I import a CSV file, I want to check whether the address for each row is included in an array of different addresses for a specific client. I have a client model with an alt_addresses property which contains a few different possible formats for the client's address.
I then have a citation model (if you're familiar with local SEO you'll know this term). The citation model doesn't have an address field, but it has a nap_correct? field (NAP stands for "Name", "Address", "Phone Number"). If the name, address, and phone number for a CSV row match what I have in the database for that client, the nap_correct? field for that citation gets set to true.
Here's what the import method looks like in my citation model:
def self.import(file, client_id)
  @client = Client.find(client_id)
  CSV.foreach(file.path, headers: true) do |row|
    @row = row.to_hash
    @citation = Citation.new
    if @row["Address"]
      if @client.alt_addresses.include?(@row["Address"].to_s.downcase.gsub(/\W+/, '')) && self.phone == @row["Phone Number"].gsub(/[^0-9]/, '')
        @citation.nap_correct = true
      end
    end
    @citation.name = @row["Domain"]
    @citation.listing_url = @row["Citation Link"]
    @citation.save
  end
end
And then here's what the alt_addresses property looks like in my client model:
def alt_addresses
  address = self.address.downcase.gsub(/\W+/, '')
  address_with_zip = (self.address + self.zip_code).downcase.gsub(/\W+/, '')
  return [address, address_with_zip]
end
I'm using gsub to reformat the address column in the CSV as well as the field in my client database table so I can compare the two values. This is where the problem comes in. As soon as I added the gsub method I started getting the invalid byte sequence error.
I'm using Ruby 2.1.3. I've noticed a lot of the similar errors I find searching Stack Overflow are related to an older version of Ruby.
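Why does a UTF-8 invalid byte sequence error happen? Ruby's default encoding since 2.0 is UTF-8. This means that Ruby will treat any string you input as a UTF-8 encoded string unless you tell it explicitly that it's encoded differently. If the file was actually saved in another encoding, some of its bytes are not valid UTF-8, and methods that have to decode the string, such as gsub with a regex, raise this error.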
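As a rough illustration of that explanation, here is a minimal sketch that reproduces the error outside Rails. The sample bytes and variable names are made up for the example; it assumes the file was really ISO-8859-1, which is only one of several possibilities.

```ruby
# "café street" as ISO-8859-1 bytes: the é is the single byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
latin1_bytes = "caf\xE9 street".b

# Ruby assumes UTF-8 unless told otherwise, so tag the bytes that way.
mislabeled = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
mislabeled_valid = mislabeled.valid_encoding?

# gsub with a regex has to decode the string, so it raises here.
gsub_error = begin
  mislabeled.gsub(/\W+/, '')
  nil
rescue ArgumentError => e
  e.class
end

# Declaring the real source encoding and transcoding gives a usable string.
fixed = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
normalized = fixed.downcase.gsub(/\W+/, '')
```

This would also explain why the import worked before gsub was added: the invalid bytes were already in the strings, but nothing had tried to decode them yet.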
Illegal quoting on line: this related error is raised when there is an illegal character in the CSV file that you are trying to import. To fix it, make sure your CSV file is UTF-8 encoded. Sometimes this error is caused by a missing or stray quote.
Specify the encoding with the encoding option:
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  # your code here
end
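To see the encoding option working end to end, here is a small self-contained sketch. It writes a throwaway ISO-8859-1 CSV to a temp file to stand in for the upload, then reads it back with the same option. The normalize_address helper and the sample row are hypothetical; only the gsub pattern comes from the question.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: the same normalization the question applies with gsub.
def normalize_address(value)
  value.to_s.downcase.gsub(/\W+/, '')
end

# Build a small ISO-8859-1 CSV on disk to stand in for the uploaded file.
csv_file = Tempfile.new(['citations', '.csv'])
csv_file.binmode
csv_file.write("Domain,Address\nexample.com,12 Caf\u00E9 Road\n".encode(Encoding::ISO_8859_1))
csv_file.close

addresses = []
# 'iso-8859-1:utf-8' means: read the file as ISO-8859-1, hand back UTF-8 strings.
CSV.foreach(csv_file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  addresses << normalize_address(row['Address'])
end

csv_file.unlink
```

The part before the colon has to match how the file was really saved. If the uploads come from different sources, ISO-8859-1 is a guess rather than a guarantee.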
One way I've figured out to get around this is to "Save As" in OpenOffice or LibreOffice, click "Edit Filter Settings", make sure the character set is UTF-8, and save. Bottom line: use some external tool to convert the characters to UTF-8 before loading the file into Ruby. This issue can be a true labyrinth within Ruby alone.
A Unix tool called iconv can apparently do this sort of thing: https://superuser.com/questions/588048/is-there-any-tools-which-can-convert-any-strings-to-utf-8-encoded-values-in-linu
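If the conversion has to happen inside Ruby rather than in an external tool, one possible approach is to check whether the raw bytes are already valid UTF-8 and only transcode when they are not. This is a sketch, not a drop-in fix: to_utf8 is a hypothetical helper, and the ISO-8859-1 fallback is an assumption about where the files come from.

```ruby
# Hypothetical helper: return a valid UTF-8 copy of raw file bytes.
def to_utf8(raw, fallback: Encoding::ISO_8859_1)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Not valid UTF-8: reinterpret the bytes as the fallback encoding and
  # transcode, replacing anything that cannot be mapped.
  raw.dup.force_encoding(fallback)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
end

already_utf8 = to_utf8("caf\u00E9".b) # UTF-8 bytes pass through unchanged
from_latin1  = to_utf8("caf\xE9".b)   # ISO-8859-1 bytes get transcoded
```

The cleaned string could then be passed to CSV.parse with headers: true instead of calling CSV.foreach on the original path.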